CAP Theorem

by

Duct Tape Programmer


According to the CAP theorem, a distributed software system can simultaneously provide at most two of the following three guarantees (CAP):

  1. Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
  2. Availability: Every request receives a response, indicating success or failure. Availability is achieved by replicating the data across different servers.
  3. Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
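A minimal sketch of the Consistency guarantee above, with hypothetical class names: a write is acknowledged only after every replica has applied it, so any node can then serve a read and return the same value.

```python
# Hypothetical sketch: synchronous replication to all nodes before
# acknowledging a write, so all nodes see the same data at the same time.
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ConsistentStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Consistency: update every node before allowing further reads.
        for r in self.replicas:
            r.apply(key, value)
        return "ok"

    def read(self, key):
        # Any replica may serve the read; all hold the same data.
        return self.replicas[0].data.get(key)


store = ConsistentStore([Replica() for _ in range(3)])
store.write("x", 42)
assert all(r.data["x"] == 42 for r in store.replicas)
```

Real systems pay for this with latency (every write waits on all replicas) and, as the rest of the article argues, with availability once a replica becomes unreachable.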

Interesting Read: You Can’t Sacrifice Partition Tolerance

  • According to the above article, a distributed system must provide partition tolerance: to forgo it, the system would need to run on a network that is guaranteed to never drop a message, and no such network exists (at least as of now).

  • Choosing between Consistency and Availability:

    • Choosing Consistency Over Availability:

    If a system chooses to provide Consistency over Availability in the presence of partitions (again, read: failures), it will preserve the guarantees of its atomic reads and writes by refusing to respond to some requests. It may decide to shut down entirely (like the clients of a single-node data store), refuse writes (like Two-Phase Commit), or only respond to reads and writes for pieces of data whose “master” node is inside the partition component (like Membase).
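One common way a CP system refuses writes, sketched here with hypothetical names: a node only accepts a write when it can reach a majority (quorum) of the cluster, so the minority side of a partition rejects requests rather than diverge.

```python
# Illustrative sketch (not any specific database): a node on the
# minority side of a partition refuses writes to preserve consistency.
class CPNode:
    def __init__(self, peers_reachable, cluster_size):
        self.peers_reachable = peers_reachable  # peers this node can contact
        self.cluster_size = cluster_size
        self.data = {}

    def write(self, key, value):
        # This node counts as one vote; a strict majority is required.
        votes = self.peers_reachable + 1
        if votes <= self.cluster_size // 2:
            raise RuntimeError("no quorum during partition: write refused")
        self.data[key] = value
        return "ok"


# Minority side of an {A, B} / {C} split: C reaches 0 of its 2 peers.
minority = CPNode(peers_reachable=0, cluster_size=3)
# Majority side: A reaches B, so it holds 2 of 3 votes and accepts writes.
majority = CPNode(peers_reachable=1, cluster_size=3)
assert majority.write("x", 1) == "ok"
```

Calling `minority.write(...)` raises, which is exactly the "refuse writes" behavior described above.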

    • Choosing Availability Over Consistency:

    If a system chooses to provide Availability over Consistency in the presence of partitions (all together now: failures), it will respond to all requests, potentially returning stale reads and accepting conflicting writes. These inconsistencies are often resolved via causal ordering mechanisms like vector clocks and application-specific conflict resolution procedures. (Dynamo systems usually offer both of these; Cassandra’s hard-coded Last-Writer-Wins conflict resolution being the main exception.)
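The Last-Writer-Wins resolution mentioned above can be sketched in a few lines (values and timestamps here are illustrative): each write carries a timestamp, and when diverged replicas merge after the partition heals, the newer value simply overwrites the older one.

```python
# Sketch of Last-Writer-Wins (LWW) conflict resolution, the strategy
# the article attributes to Cassandra. Each value is a (timestamp, data)
# pair; on merge, the write with the higher timestamp wins.
def lww_merge(a, b):
    """Return the newer of two (timestamp, data) pairs."""
    return a if a[0] >= b[0] else b


# Two replicas accepted conflicting writes while partitioned:
replica_1 = (100, "cart = [book]")
replica_2 = (105, "cart = [book, pen]")

merged = lww_merge(replica_1, replica_2)
assert merged == (105, "cart = [book, pen]")
```

Note the cost: replica_1's write is silently discarded, which is why Dynamo-style systems often prefer vector clocks plus application-level merging over a hard-coded LWW rule.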

  • But Never Both:

    You cannot, however, choose both consistency and availability in a distributed system. As a thought experiment, imagine a distributed system which keeps track of a single piece of data using three nodes (A, B, and C) and which claims to be both consistent and available in the face of network partitions. Misfortune strikes, and that system is partitioned into two components: {A, B} and {C}. In this state, a write request arrives at node C to update the single piece of data.
    That node has only two options:
    1. Accept the write, knowing that neither A nor B will know about this new data until the partition heals.
    2. Refuse the write, knowing that the client might not be able to contact A or B until the partition heals.
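The thought experiment above can be rendered as a toy decision function (node names A, B, C come from the text; everything else is illustrative). Partitioned node C either accepts the write and sacrifices consistency, or refuses it and sacrifices availability; there is no third option.

```python
# Toy model of node C's dilemma during the {A, B} / {C} partition.
def handle_write(reachable_peers, value, prefer):
    """Decide what a partitioned node does with an incoming write.

    reachable_peers: peers this node can currently contact
    prefer: "availability" (AP choice) or "consistency" (CP choice)
    """
    partitioned = len(reachable_peers) == 0
    if not partitioned:
        return ("accepted", value)
    if prefer == "availability":
        # Accept, knowing A and B won't see this data until the
        # partition heals (stale reads elsewhere).
        return ("accepted_inconsistent", value)
    # Refuse, knowing the client cannot be served right now.
    return ("refused", None)


# Node C, cut off from both A and B:
assert handle_write([], 7, prefer="availability")[0] == "accepted_inconsistent"
assert handle_write([], 7, prefer="consistency")[0] == "refused"
```

Either branch gives up one of the two guarantees, which is the substance of the "but never both" claim.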
