dockerfile/examples/openssl/openssl-3.2.1-src/doc/designs/quic-design/quic-thread-assist.md

4.9 KiB

QUIC Thread Assisted Mode Synchronisation Requirements

In thread assisted mode, we create a background thread to ensure that periodic QUIC processing is handled in a timely fashion regardless of whether an application is frequently calling (or blocked in) SSL API I/O functions.

Part of the QUIC state comprises the TLS handshake layer. However, synchronising access to this is extremely difficult.

At first glance, one could synchronise handshake layer public APIs by locking a per-connection mutex for the duration of any public API call which we forward to the handshake layer. Since we forward a very large number of APIs to the handshake layer, this would require a very large number of code changes to add the locking to every single public HL-related API call.

However, on second glance, this does not even solve the problem, as applications existing usage of the HL APIs assumes exclusive access, and thus consistency over multiple API calls. For example:

x = SSL_get_foo(s);
/* application mutates x */
SSL_set_foo(s, x);

For locking of API calls the lock would only be held for the separate get and set calls, but the combination of the two would not be safe if the assist thread can process some event which causes mutation of foo.

As such, there are really only three possible solutions:

  • 1. Application-controlled explicit locking.

    We would offer something like SSL_lock() and SSL_unlock(). An application performing a single HL API call, or a sequence of related HL calls, would be required to take the lock. As a special exemption, an application is not required to take the lock prior to connection (specifically, prior to the instantiation of a QUIC channel and consequent assist thread creation).

    The key disadvantage here is that it requires more API changes on the application side, although since most HL API calls made by an application probably happen prior to initiating a connection, things may not be that bad. It would also only be required for applications which want to use thread assisted mode.

    Pro: Most “robust” solution in terms of HL evolution.

    Con: API changes.

  • 2. Handshake layer always belongs to the application thread.

    In this model, the handshake layer “belongs” to the application thread and the assist thread is never allowed to touch it:

    • SSL_tick() (or another I/O function) called by the application fully services the connection.

    • The assist thread performs a reduced tick operation which does everything except servicing the crypto stream, or any other events we may define in future which would be processed by the handshake layer.

    • This is rather hacky but should work adequately. When using TLS 1.3 as the handshake layer, the only thing we actually need to worry about servicing after handshake completion is the New Session Ticket message, which doesn't need to be acknowledged and isn't “urgent”. The other post-handshake messages used by TLS 1.3 aren't relevant to QUIC TLS:

      • Post-handshake authentication is not allowed;

      • Key update uses a separate, QUIC-specific method;

      • TLS alerts are signalled via CONNECTION_CLOSE frames rather than the TLS 1.3 Alert message; thus if a peer's HL does raise an alert after handshake completion (which would in itself be highly unusual), we simply receive a CONNECTION_CLOSE frame and process it normally.

    Thus so long as we don't expect our own TLS implementation to spontaneously generate alerts or New Session Ticket messages after handshake completion, this should work.

    Pro: No API changes.

    Con: Somewhat hacky solution.

  • 3. Handshake layer belongs to the assist thread after connection begins.

    In this model, the application may make handshake layer calls freely prior to connecting, but after that, ownership of the HL is transferred to the assist thread and may not be touched further. We would need to block all API calls which would forward to the HL after connection commences (specifically, after the QUIC channel is instantiated).

    Con: Many applications probably expect to be able to query the HL after connection. We could selectively enable some important post-handshake HL calls by specially implementing synchronised forwarders, but doing this in the general case runs into the same issues as option 1 above. We could only enable APIs we think have safe semantics here; e.g. implement only getters and not setters, focus on APIs which return data which doesn't change after connection. The work required is proportional to the number of APIs to be enabled. Some APIs may not have ways to indicate failure; for such APIs which we don't implement for thread assisted post-handshake QUIC, we would essentially return incorrect data here.

Option 2 has been chosen as the basis for implementation.