Packet reordering: The key to efficient high-speed packet processing

August 19, 2022

Network data rates are expected to increase tremendously thanks to the capabilities opened up by 5G networks, the evolution to 5G Advanced and later 6G, and the availability of multi-hundred-gigabit network equipment. This shift toward faster network speeds will make realizing low-latency Internet services more challenging: web servers, data storage, and network functions virtualization (NFV) servers require many hardware and software optimizations to keep up with the higher request and packet rates.

Tackling these challenges requires paying more attention to the fast cache memories available on commodity hardware and exploiting them better. Previous blog posts showed that careful cache-aware memory management and whole-stack software optimizations can significantly improve the performance of high-speed NFV service chains, one of the fundamental building blocks of the Internet.

In continued joint research, Ericsson and KTH Royal Institute of Technology further investigated the correlation between the network traffic order and the benefits of cache memories.

Cache memories are used optimally when packets requiring the same type of processing (for example, packets belonging to the same flow) arrive as close as possible to each other in time and space. That way, those packets access the same set of instructions and data before they are evicted from the cache.

Real-world traffic, however, typically causes packets requiring the same type of processing to be interleaved by other packets.

  • For instance, packets belonging to a client’s flow often get spaced apart due to multiplexing of different traffic flows on the Internet.
  • Additionally, recent congestion control mechanisms also favor spreading per-flow packets to minimize network congestion.

Therefore, real-world traffic often lacks the traffic locality that is desirable for maximizing the benefits of cache memories. In fact, our analysis of a KTH campus trace demonstrates that more than 60 percent of the packets belonging to the same flow are interleaved with packets belonging to other flows, far from the ideal conditions for cache-optimized packet processing.
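To make this kind of measurement concrete, the sketch below counts the fraction of packets that do not directly follow another packet of their own flow. This is a simplified illustration, not necessarily the exact metric used in the KTH trace analysis; the `interleaved_fraction` helper and its definition of "interleaved" are assumptions made for this example.

```python
from collections import Counter

def interleaved_fraction(trace):
    """Fraction of multi-packet-flow packets whose predecessor in the
    trace belongs to a different flow.

    `trace` is a sequence of flow identifiers (e.g. 5-tuples), one per
    packet, in arrival order. A packet counts as interleaved when it
    does not arrive back to back with another packet of its own flow.
    """
    flow_counts = Counter(trace)
    # Only flows with more than one packet can exhibit interleaving.
    multi = sum(c for c in flow_counts.values() if c > 1)
    if multi == 0:
        return 0.0
    interleaved = sum(
        1 for prev, cur in zip(trace, trace[1:])
        if cur != prev and flow_counts[cur] > 1
    )
    return interleaved / multi

# Perfectly interleaved flows A and B: almost every packet is spaced
# apart from the rest of its flow.
print(interleaved_fraction(["A", "B", "A", "B", "A", "B"]))
# Back-to-back batches per flow score much lower.
print(interleaved_fraction(["A", "A", "A", "B", "B", "B"]))
```

A trace in which same-flow packets already arrive in runs scores near zero; round-robin multiplexing of flows drives the fraction toward one.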

To overcome this issue, we built a system that reorders packets within a small time frame to increase traffic locality. In more detail, it deinterleaves per-flow packets and builds batches of packets belonging to the same flow. Our research collaboration resulted in the conference publication Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets, which received the community award at NSDI '22 for sharing software and reproducible results.
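The core idea of such a reorder stage can be sketched as a per-flow buffer that releases a batch either when it is full or when its oldest packet has waited out a small delay budget. The `FlowBatcher` class, its parameters, and its API below are hypothetical names invented for this illustration; the actual system described in the paper works at line rate in the packet-processing framework, not in Python.

```python
import time
from collections import defaultdict, deque

class FlowBatcher:
    """Illustrative sketch of deliberate per-flow batching.

    Packets are buffered per flow and released as a batch once the flow
    has `batch_size` packets, or once its oldest packet has waited
    longer than `max_delay` seconds. Downstream processing then sees
    same-flow packets back to back, so the instructions and per-flow
    state they touch stay resident in the cache.
    """

    def __init__(self, batch_size=32, max_delay=0.001):
        self.batch_size = batch_size
        self.max_delay = max_delay
        self.queues = defaultdict(deque)  # flow id -> buffered packets
        self.deadlines = {}               # flow id -> release deadline

    def push(self, flow_id, packet, now=None):
        """Buffer one packet; return a full batch if one is ready."""
        now = time.monotonic() if now is None else now
        queue = self.queues[flow_id]
        if not queue:
            # First buffered packet of this flow starts the delay budget.
            self.deadlines[flow_id] = now + self.max_delay
        queue.append(packet)
        if len(queue) >= self.batch_size:
            return self._release(flow_id)
        return None

    def poll(self, now=None):
        """Release the batches whose delay budget has expired."""
        now = time.monotonic() if now is None else now
        expired = [f for f, d in self.deadlines.items() if d <= now]
        return [self._release(f) for f in expired]

    def _release(self, flow_id):
        batch = list(self.queues.pop(flow_id))
        self.deadlines.pop(flow_id, None)
        return batch
```

The `max_delay` budget is what makes the delay "deliberate" but bounded: a flow that never fills a batch still gets drained after at most one delay period, capping the latency cost paid for the locality gain.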