Scalable and Low Latency Lock-free Data Structures

Audience:
Topic:

Imagine that your program uses many threads, which insert and lookup millions times per second in a large data structure like std::map or std::unordered_map. Typically, you have to switch to a lock-free data structure for this task. Lock-free approaches perfectly scale data structures for multi-core systems, but hash tables and trees need some reorganization as more and more items are inserted and these reorganizations are hard to make lock-free. But the problem isn't only in contention and we also need to efficiently work with memory to develop a high performance data structure for the modern hardware. Cache conscious data structures address the problem by efficient usage of CPU caches and reducing the number of accesses to the main memory.

This talk makes a quick survey over several lock-free (primarily variations of hash table) and cache conscious (mostly trees) data structures. We'll discuss the best use cases for the data structures and their limitations. Besides performance in average cases, we'll also focus on worst case scenarios, which may introduce high tail latency on large busy systems. Tail, or high percentile, latency is a severe problem since it may reach significant values and the small percent of problem cases can be not so small in absolute values, e.g. if you service 1M users, then only 0.1% of them experiencing high processing time is a problem.

The main part of the talk is a step by step C++ implementation of Hash Trie, a hybrid lock-free cache conscious data structure, which provides good access time in average and worst case. We'll do microbenchmarks of the data structure and compare it with other data structures.

You'll learn about:
* when standard containers and locking mechanisms aren't enough
* several advanced data structures: split ordered lists and other variations of lock-free hash tables, tries (partricia trees) and hybrid data structures
* x86-64 memory ordering and cache hierarchy, operating system preemption and how to employ all the knowledge to implement a very fast data structure
* gotchas of data structures benchmarking, such as keys distribution, latency vs throughput, worst cases and so on
* an open source lock-free cache conscius Hash Trie implementation

Time:
Saturday, October 26, 2024 - 04:45