
CAP Theorem Explained

July 4, 2026 · 10 min read

The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the latest write), Availability (every request gets a response), and Partition tolerance (the system works despite network failures between nodes). When a network partition occurs — and in distributed systems, they always eventually do — you must choose between consistency and availability.

This is not an abstract academic exercise. MongoDB defaults to CP (consistent, but may reject writes during a partition). Cassandra defaults to AP (always available, but may return stale data). PostgreSQL on a single server is CA until you add cross-region replication. Understanding CAP helps you pick the right database, design failover correctly, and answer system design interview questions with confidence.

CAP theorem: pick two of three

Consistency, Availability, and Partition Tolerance

Consistency in CAP means linearizability: once a write completes, every subsequent read from any node returns that value or a newer one. This is stronger than eventual consistency, where replicas may temporarily disagree. Availability means every request to a non-failing node gets a non-error response; in practice that also means within a reasonable time, with no timeouts or errors caused by the partition itself. Partition tolerance means the system continues operating when network links between nodes break.

In practice, partition tolerance is not optional for distributed systems. Networks fail, switches reboot, and availability zones go down. So the real trade-off is CP vs AP: during a partition, do you reject requests to preserve consistency, or accept them and reconcile later?
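To make the CP vs AP choice concrete, here is a minimal sketch of a single node deciding what to do with a write while it is cut off from part of the cluster. It is a toy model, not how any particular database is implemented: the Node class, the "CP"/"AP" mode labels, the Unavailable error, and the pending list are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class Unavailable(Exception):
    """Raised by a CP node that cannot reach a majority of the cluster."""


@dataclass
class Node:
    mode: str                # "CP" or "AP" (hypothetical labels for this sketch)
    cluster_size: int        # total number of nodes, including this one
    reachable_peers: int     # peers this node can currently talk to
    value: Optional[str] = None
    pending: List[str] = field(default_factory=list)  # writes to reconcile later

    def has_majority(self) -> bool:
        # The node counts itself plus the peers it can still reach.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def write(self, value: str) -> str:
        if self.mode == "CP" and not self.has_majority():
            # Preserve consistency: refuse rather than risk divergent state.
            raise Unavailable("no majority reachable; retry after the partition heals")
        self.value = value
        if not self.has_majority():
            # AP: accept the write now and remember that peers still need it.
            self.pending.append(value)
        return "ok"
```

In a five-node cluster, a node that can reach only one peer is on the minority side. In CP mode its write raises Unavailable; in AP mode the same write succeeds and is queued in pending for reconciliation once the partition heals.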


Quick reference

Consistency (CAP): all nodes see the same data at the same time — not the same as ACID consistency.
Availability: every request to a non-failing node gets a response — no guarantee it is the latest data.
Partition tolerance: system operates despite arbitrary message loss between nodes.
Network partitions are inevitable in distributed systems — design for them, do not assume they won't happen.
PACELC extends CAP: if Partition, choose A or C; Else, choose Latency or Consistency.

Remember this

Partition tolerance is mandatory in distributed systems — the real choice is consistency vs availability during a split.

CP Systems (Consistency + Partition Tolerance)

CP systems prioritize consistency over availability during a partition. When nodes cannot communicate, the minority partition stops accepting writes (or reads) to prevent divergent state. MongoDB with majority write concern, HBase, etcd, and ZooKeeper all lean CP. etcd is used by Kubernetes for exactly this reason — you want consistent cluster state, even if it means brief unavailability during leader election.

The cost is availability. During a partition, clients on the minority side get errors. For a banking ledger or inventory system, this is acceptable — returning stale balance data is worse than a temporary error. Design CP systems with retry logic and clear error messages so clients know to retry after the partition heals.
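The retry logic mentioned above might look something like the following sketch. It assumes the client library signals a partition-related failure with a distinct exception, here a hypothetical TemporarilyUnavailable; real drivers use their own error types. The backoff parameters are arbitrary defaults, and the sleep argument exists only so the delay can be stubbed out in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TemporarilyUnavailable(Exception):
    """Hypothetical error a CP store's client raises while a partition lasts."""


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on TemporarilyUnavailable with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except TemporarilyUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries coming from many clients.
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

Retrying is only safe when the operation is idempotent, or when the store can deduplicate it, because a write that timed out may still have been applied on the server.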

CP systems: consistent but may reject requests during partition

Quick reference

Examples: MongoDB (with majority writes), HBase, etcd, ZooKeeper, Consul.
During partition: minority partition rejects writes to prevent split-brain.
Best for: financial transactions, inventory, configuration stores, leader election.
Use quorum-based writes (majority of nodes must acknowledge) for strong consistency.
Clients need retry logic — CP systems return errors during partitions by design.

Remember this

Choose CP when stale or conflicting data is unacceptable — financial systems, config stores, and coordination services.

AP Systems (Availability + Partition Tolerance)

AP systems prioritize availability during a partition. Both sides continue accepting reads and writes, even if they cannot sync. When the partition heals, replicas reconcile using conflict resolution — last-write-wins, vector clocks, or application-level merge logic. Cassandra, DynamoDB, CouchDB, and Riak are AP systems designed for high write throughput across geographically distributed nodes.

The cost is consistency. A user might not see their own recent write if the next read hits a replica that has not yet received the update. For a social media feed, shopping cart, or analytics counter, this is usually fine. Design AP systems with idempotent writes, version vectors, and application-level conflict resolution rather than assuming the database will resolve conflicts for you.
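Version vectors (and the closely related vector clocks) let a replica tell whether one write supersedes another or whether the two happened independently on different sides of a partition. The sketch below shows only the comparison step, under the assumption that each clock is a plain mapping from replica id to a counter; the compare function and its return labels are hypothetical, not any database's API.

```python
from typing import Dict

VectorClock = Dict[str, int]  # replica id -> number of writes seen from that replica


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for clock a relative to b."""
    keys = a.keys() | b.keys()
    # A missing entry means "no writes seen from that replica", i.e. zero.
    a_behind = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    b_behind = any(b.get(k, 0) < a.get(k, 0) for k in keys)
    if a_behind and b_behind:
        return "concurrent"  # neither write saw the other: a genuine conflict
    if a_behind:
        return "before"      # b already includes everything in a
    if b_behind:
        return "after"       # a already includes everything in b
    return "equal"
```

When the result is "before" or "after", the newer version can safely replace the older one. Only "concurrent" needs a merge policy, whether that is last-write-wins, a CRDT, or logic in the application.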

AP systems: always available but may return stale data

Quick reference

Examples: Cassandra, DynamoDB, CouchDB, Riak, DNS.
During partition: all nodes accept requests — replicas diverge temporarily.
Best for: social feeds, shopping carts, analytics, session stores, DNS.
Use eventual consistency with tunable consistency levels (e.g. Cassandra QUORUM).
Implement conflict resolution: last-write-wins, CRDTs, or application merge logic.
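As one illustration of the CRDT option, here is a minimal grow-only counter (often called a G-Counter), which suits something like an analytics counter that only ever increases. Each replica increments its own slot, and merging takes the per-replica maximum. The GCounter class is a hypothetical sketch for this article, not a library API.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())
```

If replica "a" counts 3 events and replica "b" counts 2 while partitioned, merging in either direction, any number of times, leaves both at 5. No write is lost and no coordination is needed, which is why this style of data type fits AP systems.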

Remember this

Choose AP when uptime matters more than immediate consistency — social apps, carts, and analytics pipelines.

CA Systems and the Real World

CA systems are consistent and available but not partition tolerant — they assume the network never fails. A single-node PostgreSQL or MySQL instance is CA: ACID transactions, always available, no network between nodes to partition. The moment you add synchronous replication to a standby in another data center, you have a distributed system and must confront CAP.

Most production databases offer tunable consistency. PostgreSQL with synchronous replication leans CP: a commit waits for the standby to confirm, so writes stall if the standby is unreachable. PostgreSQL with asynchronous replication is closer to AP for reads: replicas keep answering but may lag behind the primary. DynamoDB lets you choose strong or eventual consistency per request. The CAP theorem is a lens for understanding trade-offs, not a rigid classification; real systems sit on a spectrum.
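A common way to reason about tunable consistency in quorum-replicated stores is the overlap rule: with N replicas, W acknowledgements required per write, and R replicas consulted per read, every read set shares at least one replica with every write set when R + W > N. The sketch below just encodes that arithmetic. The function names are hypothetical, and the overlap on its own does not guarantee linearizability in every system; treat it as a rule of thumb rather than a proof.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when any r replicas read must include one of any w replicas written."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n
```

With three replicas, majority reads and majority writes (W = 2, R = 2) overlap, as do W = 3 with R = 1. W = 1 with R = 1 does not, which is the fast but eventually consistent end of the spectrum.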

CA systems: consistent + available on a single node (no partition tolerance)

Quick reference

Single-node RDBMS (PostgreSQL, MySQL) is CA until you add replication.
Adding read replicas introduces eventual consistency for reads.
Multi-region active-active always requires AP or CP trade-offs.
DynamoDB: eventually consistent reads by default; strongly consistent reads are opt-in per request.
Design for the failure mode you cannot tolerate — not the one that is easiest to implement.

Remember this

Single-node databases are CA; the moment you distribute, CAP trade-offs become unavoidable — tune consistency to your use case.

Key takeaway

CAP theorem is a decision framework, not a rulebook. Every distributed database makes different trade-offs, and most let you tune consistency per operation. For interviews, explain the three properties, acknowledge that partitions are inevitable, and justify your CP or AP choice based on the business requirement — would stale inventory data cause a financial loss, or would a brief outage frustrate users more? That reasoning matters more than memorizing which database is which letter.
