The Birth of an Apilator is part of a case study in modern client-server programming for the web. Part II focuses on high-performance networked I/O and on common HTTP issues with their proper solutions. Get and try Apilator yourself here!
NIO, TCP and Friends
Writing a TCP server with NIO in Java is neither a complex task nor a particularly novel one. We stick to the classic approach: one main thread responsible for reading from and writing to the sockets, and multiple worker threads, each picking up the next request as it becomes available. While this is easy at the TCP layer, putting a nearly full-featured HTTP implementation on top of it presents several major challenges to overcome.
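Before turning to those challenges, here is a minimal sketch of the single-selector loop plus worker hand-off described above. The class and method names are illustrative, not the actual Apilator code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;
import java.util.Iterator;

// Minimal single-selector accept/read loop; reads are handed off to workers.
public class NioLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // In the real server a worker thread would pick this up.
                    handleRead(key);
                }
            }
        }
    }

    static void handleRead(SelectionKey key) {
        // Worker threads read from the channel and parse HTTP here.
    }
}
```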
Most of the scenarios found in popular tutorials and books deal only with the simplest case, in which a GET request is sent and served. The GET method has the advantage that it carries no body (only headers), and in most cases the request fits into a single packet, so a single read from the network stream is enough to obtain the full request. In contrast, a POST request with an upload will usually span a number of network packets; because of the way NIO works, each read from the network channel yields only one piece of the request, leaving the server with two issues: to properly assemble the pieces, and to process them only once the request has been fully sent. While the former is relatively easy using attachments on channel keys in Java, the latter is much more problematic.
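A sketch of that attachment technique, assuming each SelectionKey carries a ByteArrayOutputStream that accumulates partial reads; the helper names and the completeness check are placeholders:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Accumulate each partial read into a per-connection buffer attached to the
// SelectionKey, then test whether the request is complete.
class PartialReadAccumulator {
    static void onReadable(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteArrayOutputStream acc = (ByteArrayOutputStream) key.attachment();
        if (acc == null) {
            acc = new ByteArrayOutputStream();
            key.attach(acc);
        }

        ByteBuffer buf = ByteBuffer.allocate(8192);
        int n = channel.read(buf);
        if (n == -1) {            // client closed (or half-closed) the connection
            channel.close();
            return;
        }
        acc.write(buf.array(), 0, n);

        byte[] soFar = acc.toByteArray();
        if (isCompleteRequest(soFar)) {   // placeholder; sketched in the next section
            key.attach(null);             // reset for the next request on this connection
            // dispatch soFar to a worker thread here
        }
    }

    static boolean isCompleteRequest(byte[] data) {
        return false; // placeholder
    }
}
```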
The HTTP Bumps
Historically, TCP does not provide a clear mechanism for the initiating side (the client) to inform the server that it has completed its transmission. This is left entirely to higher-level protocols. Oddly enough, HTTP does not provide such a mechanism either. TCP does have a workaround which, if implemented by both client and server, works fine: when the client is done sending, it half-closes the socket, effectively letting the server know it is done. The major drawback is that the TCP session cannot be reused, i.e. after the server responds it has to fully close the socket. Since we need to support regular browsers as clients, we need to support HTTP/1.1, and this means we need to support Keep-Alive, which is designed (and heavily used nowadays) precisely to reuse TCP sessions for multiple HTTP exchanges. Lacking TCP-based signalling for end-of-transmission and missing the same functionality in HTTP, we have no choice but to attempt processing a potentially incomplete request every time a new read arrives from the client socket.
Such processing inevitably causes computing overhead, so we need to do it as efficiently as possible: for a POST request that arrives in 10 chunks, our first 9 processing attempts are bound to find an incomplete request. To decide whether a request is complete we employ the following strategy:
- Try parsing the full headers. Headers are text only, one per line, and their end is signified by a blank line. If we cannot find a blank line in the input, the headers are still incomplete and we should wait for more data to come.
- If we have the full headers, read the value of the Content-Length header – it tells us how many bytes of body will follow the headers. We also count the size of the headers and apply simple math: if the size of the headers plus the value of Content-Length is more than we have received so far, wait for more input to arrive (a sketch of this check follows below).
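A sketch of this two-step completeness test over the accumulated bytes; method names are illustrative and the real parser may differ:

```java
import java.nio.charset.StandardCharsets;

// Locate the blank line that terminates the headers, then compare
// headers length + Content-Length against the bytes received so far.
class RequestCompleteness {
    static boolean isComplete(byte[] data) {
        int headerEnd = indexOf(data, "\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        if (headerEnd < 0) {
            return false;                       // headers not fully received yet
        }
        int bodyStart = headerEnd + 4;
        String headers = new String(data, 0, headerEnd, StandardCharsets.US_ASCII);
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("content-length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return data.length >= bodyStart + contentLength;
    }

    // Naive byte-level search; fine for a short pattern like CRLF CRLF.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```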
As HTTP headers are plain text, parsing them is not a complex task. Headers are parsed into a hash table with the header name as the key (converted to lowercase, because the HTTP standard says header names are case insensitive). Next, a check for cookies is performed: if the Cookie header (i.e. hash table key) is found, we split the supplied cookies (i.e. the corresponding hash table value) into name/value pairs and provide them as a separate hash table.
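A sketch of such header and cookie parsing, assuming the raw header block has already been extracted as a string; this is illustrative, not the actual Apilator parser:

```java
import java.util.HashMap;
import java.util.Map;

// Parse raw header lines into a lowercase-keyed map, then split the
// Cookie header into a separate name/value map.
class HeaderParser {
    static Map<String, String> parseHeaders(String rawHeaders) {
        Map<String, String> headers = new HashMap<>();
        for (String line : rawHeaders.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    static Map<String, String> parseCookies(Map<String, String> headers) {
        Map<String, String> cookies = new HashMap<>();
        String cookieHeader = headers.get("cookie");
        if (cookieHeader != null) {
            for (String pair : cookieHeader.split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
                }
            }
        }
        return cookies;
    }
}
```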
In the case of a POST request, we can move on to processing the POST body only after we have received all the required data (i.e. both tests above pass). If the data is sent as application/x-www-form-urlencoded, processing it is as easy as splitting it into name/value pairs and building a hash table; for multipart/form-data, however, we face another major challenge, because:
- The body may consist of an arbitrary number of sections, and their count is not known to us in advance, and
- Each section may be of arbitrary size, which is also unspecified, and
- The section body may be either text or binary, with only the CR/LF + boundary separator available to use as a delimiter, and
- The boundary, specified in the HTTP header, may be surrounded by an arbitrary number of arbitrary characters on the same line.
The solution we implement is to run two readers in parallel over the same input – one text and one binary. We read data from the text reader until we encounter a section that denotes binary content (if we never do, all the data is read from the text reader alone). Normally such a section is marked by a Content-Transfer-Encoding of “binary” or “8bit” in its headers, but as another inconsistency we observe that many browsers, such as Google Chrome and Mozilla Firefox, send binary data without specifying Content-Transfer-Encoding at all, thus violating RFC 2045. At the beginning of the binary content we skip, in the binary reader, the number of bytes read so far and read all remaining bytes into a temporary buffer. Inside this temporary buffer we seek the first occurrence of the boundary using an optimised search algorithm, the Knuth-Morris-Pratt pattern-matching algorithm. Once we know this offset, we copy the bytes between the start of our previous read and the offset into a second temporary buffer. This buffer now contains all our data, but also a terminating CR/LF followed by any number of arbitrary bytes that precede the boundary. To remove them, we reverse this second buffer and search its bytes for the LF/CR sequence (the binary data may contain legitimate CR/LF sequences that are part of the uploaded file, and we only want the last one, which was added as the binary data terminator). Once we know this second offset, we can calculate the actual ending position of the uploaded binary data and perform an actual read of the exact number of bytes from the binary reader. This approach is repeated until all segments of the form are exhausted.
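For reference, a generic Knuth-Morris-Pratt search over raw bytes might look like the sketch below; it is not the Apilator implementation, but it illustrates the boundary lookup described above:

```java
// KMP search over raw bytes, as used to locate the multipart boundary
// inside binary content.
class KmpByteSearch {
    static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) return 0;
        int[] failure = buildFailureTable(pattern);
        int j = 0;
        for (int i = 0; i < data.length; i++) {
            while (j > 0 && data[i] != pattern[j]) {
                j = failure[j - 1];
            }
            if (data[i] == pattern[j]) {
                j++;
                if (j == pattern.length) {
                    return i - pattern.length + 1;   // start offset of the match
                }
            }
        }
        return -1;                                    // boundary not found
    }

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix; lets the search skip re-examining bytes.
    private static int[] buildFailureTable(byte[] pattern) {
        int[] failure = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = failure[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }
}
```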
Having the Workers Work It
After the HTTP processing completes, the worker thread in the NIO server builds two objects, one filled with the input data and one empty, to be filled with the output data. It then checks the local part of the original URL to determine whether this is an API request or a request for static content. The assumption is that if the local part matches a certain configurable pattern, this is a request for static content; the remainder of the local part of the URL is mapped to a predefined document root and the retrieved content is served. Any other request is considered an API request.
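A sketch of that routing decision; the prefix, document root and class names below are assumptions purely for illustration:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Route a request: a configurable prefix marks static content,
// everything else is treated as an API call.
class Router {
    private static final String STATIC_PREFIX = "/static/";
    private static final Path DOCUMENT_ROOT = Paths.get("/var/www/apilator");

    static boolean isStaticRequest(String localPart) {
        return localPart.startsWith(STATIC_PREFIX);
    }

    static Path resolveStaticFile(String localPart) {
        // Map the remainder of the local part onto the document root.
        return DOCUMENT_ROOT.resolve(localPart.substring(STATIC_PREFIX.length())).normalize();
    }
}
```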
When an API request is detected, the local part of the URL is examined again and the string between the first and the second slash is extracted. It is then used as the name of the API endpoint to call. Once the API class is instantiated, the worker calls the method whose name matches the HTTP method (converted to lower case); thus, for a GET request, the .get() method will be called.
To give the end user maximum flexibility in building the API, it resides in a separate package from the TCP server and HTTP parser. This means that the worker only knows the name of the class to instantiate at run time, not at compile time, so it uses reflection to instantiate the proper class.
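A sketch of the reflective dispatch; the package name is an assumption, and the real API methods presumably receive the input and output data objects, which is omitted here for brevity:

```java
import java.lang.reflect.Method;

// Reflective endpoint dispatch: the endpoint name from the URL selects the class,
// the lowercased HTTP method selects the method to invoke.
class ApiDispatcher {
    static Object dispatch(String localPart, String httpMethod) throws Exception {
        // "/example/123" -> endpoint name "example"
        String[] parts = localPart.split("/");
        String endpoint = parts.length > 1 ? parts[1] : "";

        // Instantiate the API class, known only at run time.
        Class<?> apiClass = Class.forName("com.example.api." + endpoint);
        Object api = apiClass.getDeclaredConstructor().newInstance();

        // For a GET request this invokes the endpoint's .get() method.
        Method handler = apiClass.getMethod(httpMethod.toLowerCase());
        return handler.invoke(api);
    }
}
```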
Once the API method returns, the worker calls a predefined method in the reflected API class, e.g. named .onCompletion(). This method returns the output object filled with the proper data. The worker then builds a set of outbound HTTP headers, calculates the Content-Length, appends the body it has just retrieved and pushes the resulting byte buffer back to the main thread, which picks it up and sends it to the client.
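A sketch of assembling such a response buffer; the header set and helper names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Build the outbound response: status line, headers with a computed
// Content-Length, blank line, then the body returned by the API.
class ResponseBuilder {
    static ByteBuffer build(int status, String reason, byte[] body, String contentType) {
        String head = "HTTP/1.1 " + status + " " + reason + "\r\n"
                    + "Content-Type: " + contentType + "\r\n"
                    + "Content-Length: " + body.length + "\r\n"
                    + "Connection: keep-alive\r\n"
                    + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);

        ByteBuffer out = ByteBuffer.allocate(headBytes.length + body.length);
        out.put(headBytes).put(body);
        out.flip();                 // ready for the main thread to write to the socket
        return out;
    }
}
```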