Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Real-world networks fail every now and then. Failures are common, yet resilience of distributed systems to network partitions is far from ubiquitous. Even though partition tolerance is an integral part of Brewer's CAP theorem, distributed systems, including those slated to fulfill the 'P' in CAP, often fail to deliver it at the desired level or deterministically enough. Unfortunately, this is not an exception; recent research shows it to be commonplace [1].
Galera [2] adds synchronous replication to MySQL/PXC (Percona XtraDB Cluster) [3] through the wsrep replication API. Synchronous replication requires not just a quorum but a consensus. In a noisy environment, a delayed consensus can delay a commit and thus add significant latency, or even partition the network into multiple non-primary components and, where one can still form, a single primary component. Hence, building resilience is not merely an expectation but a fundamental requirement.
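To make the primary/non-primary distinction concrete, here is a minimal sketch of a majority check. It assumes equally weighted nodes and no graceful departures, so quorum reduces to "strictly more than half of the last primary component"; the function name is hypothetical and this is not Galera's actual implementation.

```python
def is_primary(component_size: int, previous_primary_size: int) -> bool:
    """Return True if a partitioned component would keep quorum.

    Assumption: every node has equal weight and no node left gracefully,
    so the component needs a strict majority of the last primary component.
    """
    return 2 * component_size > previous_primary_size


# A 5-node cluster split 3/2: only the 3-node side can remain primary.
print(is_primary(3, 5), is_primary(2, 5))
# A 4-node cluster split 2/2: neither side holds a strict majority.
print(is_primary(2, 4))
```

Under these assumptions an even split leaves no primary component at all, which is one reason the outcome of a partition depends on how the noise divides the nodes.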
This talk is about testing the partition immunity of Galera. Docker and Netem/tc (traffic control) feature prominently. Netem is used to simulate real-world failure events (packet loss, delay, corruption, duplication, reordering, and others) and to model real-world failure distributions such as pareto, paretonormal, and uniform. Docker, or containers in general, make it possible to simulate multiple nodes that can be built at runtime, brought up and torn down easily, have their networks and flows altered as and when required, and be scaled horizontally quickly; performance was also a consideration in choosing containers over full virtualization and other options. Sysbench OLTP is used for load generation, though RQG (Random Query Generator) can also be used for advanced fuzz testing.
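As an illustration of the kind of noise Netem can inject, the following sketch builds a `tc qdisc add ... netem` command line from a few parameters. The helper name, the interface name `eth0`, and the numeric values are assumptions chosen for the example, not settings from the talk; the command only assembles the arguments and would still need to be run with sufficient privileges inside the target container's network namespace.

```python
def netem_command(dev, delay_ms=None, jitter_ms=None, distribution=None,
                  loss_pct=None, duplicate_pct=None, corrupt_pct=None,
                  reorder_pct=None):
    """Build an argument list for `tc qdisc add dev DEV root netem ...`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    args = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        args += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            args.append(f"{jitter_ms}ms")
            # A delay distribution only makes sense when jitter is given.
            if distribution is not None:
                args += ["distribution", distribution]
    if loss_pct is not None:
        args += ["loss", f"{loss_pct}%"]
    if duplicate_pct is not None:
        args += ["duplicate", f"{duplicate_pct}%"]
    if corrupt_pct is not None:
        args += ["corrupt", f"{corrupt_pct}%"]
    if reorder_pct is not None:
        # netem reorders by sending some packets ahead of the delayed ones,
        # so reordering needs a delay to be configured.
        if delay_ms is None:
            raise ValueError("reorder requires a delay")
        args += ["reorder", f"{reorder_pct}%"]
    return args


cmd = netem_command("eth0", delay_ms=100, jitter_ms=20,
                    distribution="pareto", loss_pct=5)
print(" ".join(cmd))
```

Returning an argument list rather than a string keeps the sketch usable with `subprocess.run` without shell quoting concerns.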
Salient observations to be discussed:
* Application of WAN segment-aware loss coefficients to virtual network interfaces.
* Varying reconciliation periods after network noise is withdrawn.
* Multi-node loss and short-lived noise bursts vis-à-vis single-node loss and a longer noise envelope.
* Full-duplex linking of containers with dnsmasq.
* Effects of non-network actors like slow/fast disks on fsync.
* Round-robin request distribution to nodes, with and without the nodes suffering network failures in the chain.
* Pre- and post-testing sanity tests.
* Log collection and analysis.
* Horizontal scaling of test nodes and issues with Docker/namespace.
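The round-robin item in the list above can be sketched as follows: requests cycle over the node list, and a flag decides whether nodes with an impaired network stay in the rotation. The node names and the helper are hypothetical, and the sketch says nothing about how the actual test harness routes load.

```python
from itertools import cycle, islice


def round_robin(nodes, impaired=(), include_impaired=True):
    """Yield target nodes in round-robin order.

    Hypothetical helper: `impaired` names the nodes whose network is being
    degraded; set include_impaired=False to keep them out of the chain.
    """
    pool = [n for n in nodes if include_impaired or n not in impaired]
    if not pool:
        raise ValueError("no nodes left to route to")
    return cycle(pool)


nodes = ["node1", "node2", "node3"]
# With the impaired node in the chain, every node receives requests.
print(list(islice(round_robin(nodes, impaired={"node2"}), 4)))
# Without it, load alternates between the remaining nodes.
print(list(islice(round_robin(nodes, impaired={"node2"},
                              include_impaired=False), 4)))
```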
To conclude, the ins and outs of partition tolerance testing of Galera with Docker and Netem will be discussed. Similar tools and frameworks such as Jepsen [4] will also be covered, and Galera/EVS (extended virtual synchrony) will be compared with other consensus protocols such as Paxos and Raft. Finally, one outcome of this testing, the addition of auto_evict to Galera, will be highlighted.
[1] https://queue.acm.org/detail.cfm?id=2655736
[2] http://galeracluster.com/products/
[3] http://www.percona.com/software/percona-xtradb-cluster
[4] https://github.com/aphyr/jepsen