HPACK: Header Compression format for HTTP/2

Uddeshya Singh
Geek Culture
Published in
9 min readJul 11, 2022

--

Recently, I have been exploring what makes HTTP/2 and HTTP/3 spookily faster than their 1.1 predecessors. While it’s been majorly clear that specific features like stream multiplexing are at the heart of it, there is a certain unspoken hero which we will be talking about in this post.

HPACK is the header compression technique used in HTTP/2, its counterpart for HTTP/3, QPACK is slightly different in terms of how it handles the dynamic tables (Don’t worry, I will be explaining all that in detail in the latter part of the blog).

Disclaimer : This blogpost is more or less a summary of RFC with some additional illustrations on top of it. Feel free to read the HPACK RFC[1] directly or the summary provided by Cloudflare[2] on the topic of how static + dynamic tables help in compression process. This post will be focussing on how header fields are represented across the wire in header field representations and how primitive data types are communicated amongst the headers and decoders.

Compression Process ⏮

An interesting thing to note in the RFC is that there is no concrete algorithm specification, hence giving a free hand to the implementations to figure out their own encoding process as long as they ensure the following properties during their operation —

  • An encoder must preserve the order of the header field representation inside the Header Block as received in the original header list and ensure the same order is followed by the decoder.
  • Decompression of header blocks should be dependent only on the dynamic table as an additional decoding context.
  • An encoder must maintain a couple of tables to associate header fields to indexes.
    - Static Table: Static Table is a table consisting of predefined headers fields like accept , allow , authorization etc and pseudo headers like :scheme and :status . You can check out the whole list here[3].
    - Dynamic Table: This is a dynamic table of header fields encountered in a FIFO order and indexed accordingly. It’s a strict memory-bound resource that is mutually agreed upon by the decoder and encoder.
  • In the header field representation, the header field name can be represented as is or can be referenced to an entry inside either of the above-mentioned tables while the header field values are represented literally (The literal representation of both value and field could be either direct or Huffman encoded)

Dynamic Tables 🏗

Dynamic tables are definitely worth elaborating about, these tables are initially empty and entries are added in first-in, first-out order. The newest entry in this table is in the lowest order while the oldest one will be in the highest order. Old entries are evicted in cases of size change command or to accommodate newer header fields.

It can contain duplicate entries and encoders will be deciding on how much memory is to be used while updating/creating the dynamic table.

Data Representation 📑

Integers

In HTTP protocols in general, integers are used to represent string lengths, or in this specific case, header field indexes. As per the RFC, integer representation can start anywhere within the octet but for optimized processing, each representation should finish at the end of an octet.

What precisely does it mean when we say “start anywhere in the middle”? As we know an octet has 8 bits, the protocol allows an implementation to start an integer at, let’s say, 2nd bit and the 0th and 1st bit can be empty (or can be set in any way) in the prefix.

Prefix Octet Example

An integer is represented by the prefix that fills the current octet + additional octets in case the integer is > 2 ^ N — 1

Easy enough no? Let’s try going through a couple of examples to understand the same.

Example- Encode 45 with N = 6

Since 45 < 63 ( 2^6–1), we could encode 45 into a binary where the first bit started from the 2nd index. This is what it means to “Start from anywhere”. Had N = 8 was set, we could start the integer from the first bit itself.

Example- Encode 458963 with N=6

Over here, as you can see, we are using 4 octets to encode such a large number. (1 Prefix, 3 additional). The interesting thing to notice here is, that the MSB (Most Significant Bits) of additional octets have all been set to 1 except the final octet.

The general pseudocode to encode an integer I on N bits is as follows :

String Literal Representations

In the tables and header representations, the Header field names and values can be represented as string literals. It’s encoded as a sequence of octets and includes a flag that mentions whether the string data is the string’s literal octets or actually, Huffman encoded octets.

Here, the H bit represents whether the data is Huffman Encoded or not, while StringLength as the name suggests, presents how long is the string octet’s length. Elaborating on a single-string literal example will not make sense until we go through how are these literals represented in the header fields, then, we can look at a couple of examples given in the Appendix [4].

Header Field Representations 💠

Detailed in RFC under “Binary Format”, we will now peek into how do these literal representations actually work. With different header field representations, one encoder implementation can give instructions to the decoder on what the representation actually means and whether should it save it into dynamic table or not, and should the size of the dynamic table be changed or not.

Indexed Header Field Representation

An Indexed header field starts with 1 1-bit patter and the rest of the octet is filled by the index of the matching header field (Yes, you guessed it right, it’s an integer with N=7). This representations identifies an entry in either the static or the dynamic table.

Literal Header Field Representations

A literal header field representation contains a literal header value. This could be either a string literal or a reference to an existing table entry (either static or dynamic doesn’t matter.)

Also, let’s abbreviate Literal Header Field to LHF for the time being (yes, I am lazy)

LHF with incremental indexing

This results in appending the header field to the “Decoded header list” and inserting it as a new entry into the dynamic table.

Literal Header Field with incremental indexing

Its binary representations start with a 01 2-bit pattern. If the header field name matches the header field name of an entry in a static/dynamic table, then the encoder can simply refer to the index of that entry and the index will be represented as an integer with a 6-bit prefix.

Otherwise, that prefix octet is set to 0 and 2 string values are sent each representing header field name and header field value, and it's later on stored inside the dynamic table upon successful decoding.

LHF without indexing

This results in appending the header field to the “Decoded header list” without any changes in the dynamic table.

Its binary representation starts with the 0000 4-bit pattern and the rest of the rules are similar to the ones you just read above for LHF with indexing so I’ll not be repeating them here.

Dynamic Table Size Update

It’s a simple octet starting with the 001 3-bit pattern followed by a new maximum size represented as an integer with N = 5. This indicates a size update to change the size of the dynamic table. By the way, it must be lower or equal to limit determined by HTTP/2 protocol during connection phase in the SETTINGS_HEADER_TABLE_SIZE parameter.

Header field representation implementation ✍️

Now, let’s visit an example, shall we? I’ll be picking the example given in appendix C.2.1 [5], which talks about the case of Literal Header Fields with incremental indexing in the scenario where header field was not present in any table beforehand.

We need to monitor how the decoder counterpart process the hexdump given in the picture. Let’s take it one by one, if you decode the 40 into an octet, it’ll be 01000000 which means it’s a command to the dynamic table in the decoder to index whatever it decodes and that there is no reference to any existing indexed value.0a is converted to 0000 1010 which means the following string is non Huffman encoded and is of length 10, then 63 75 73 74 6F 6D 2D 6B 65 79 corresponds to the header field name.

The following 0d is converted to 0000 1101 which again shows that the following string is Non-Huffman encoded and is of length 13, then 63 75 73 74 6F 6D 2D 68 65 61 64 65 72 corresponds to the header field value.

After this operation is completed, the dynamic table inside the decoder will have [1]custom-key: custom-header as a key value pair which it will be able to reference later on. I’d highly recommend this Cloudflare’s blog [2] to understand how is this key value pair later utilized.

But how much space is saved? 🤔

The entire point of this compression and caching tactic was to save space right? Let’s try and find out how much space is saved using a counter request, let’s assume the above example was to be encoded when the entry custom-key was already present in the dynamic table.

When the header is in dynamic table

3E , that’s all encoder needs to put in, this will be decoded as 1 0111110 which means this is an indexed header field with an index number 63 (Why 63? Because there are 62 static unmodifiable header list and dynamic header indexes start just after that).

So, in the first request, there were 16 * 13 = 208 bits used while in the second request with the same header only 8 bits were used. In this particular example, the subsequent cached request used only ~3.8% of memory initially required over the wire. However, the following conditions were assumed:

  1. The dynamic table does not drop the index
  2. Encoder uses the indexed representation instead of adding/duplicating the key-value pair.
  3. The length of the actual header was quite large, hence replacing it with an index made it efficient to communicate it across the wire.
  4. There is enough memory agreed upon between the decoder and encoder actually to add this key value pair into the dynamic table.

Conclusion

With this example, I can conclude that there is a 96% compression if the initial entry was non Huffman encoded and other conditions were ideal but this is a one-off example to particularly show how this calculation works, Cloudflare's blog[2] gives a detailed insight on a much larger sample set they have noticed a 76% compression in ingress headers. This was a fun-to-understand algorithm, and feel free to point out anything inaccurate in the post.

Until next time :D

Resources

--

--

Uddeshya Singh
Geek Culture

Software Engineer @Gojek | GSoC’19 @fossasia | Loves distributed systems, football, anime and good coffee. Writes sometimes, reads all the time.