Free lesson · 1 of 98 in the full path

CAP Theorem & Consistency Models

30 min read

It's 2 AM. Your Phone Won't Stop Buzzing.

You're the on-call engineer at a fintech startup. Fifteen minutes ago, the fiber link between your Mumbai data center and the Delhi replica went dark. Some contractor hit a cable while digging for a metro expansion, a classic India infrastructure moment.

Now your Slack is on fire. The payments database in Delhi can't reach Mumbai. Users in North India are trying to check their balances, and your system has exactly two choices:

  1. Refuse to answer. Show users an error page ("Service temporarily unavailable"). Safe, but your CEO's phone is already ringing.
  2. Answer with what Delhi knows. Maybe it's 30 seconds stale. Maybe a ₹5,000 transfer happened in Mumbai that Delhi doesn't know about yet.

This isn't a textbook problem. This is a 2 AM decision that affects real money and real people. And the framework that helps you think through it clearly is called the CAP Theorem.

Why Should You Care?

Three reasons:

  1. Every system design interview asks about CAP, directly or indirectly. If you can't explain it with clarity and nuance, you're not getting that L5+ offer.
  2. Every architecture decision you'll ever make for a distributed system involves picking a side of CAP. Even if you don't call it that.
  3. Production outages happen because engineers didn't understand the trade-off their database was making on their behalf. Knowing CAP means fewer 2 AM phone calls.

🟢 The Simple Version (Start Here If You're New)

Let's forget computers for a moment. Think about something simpler.

The Chai Stall Analogy

Imagine you and your friend run two chai stalls, one in Connaught Place (Delhi) and one in Bandra (Mumbai). You share a single menu and pricing. Every evening, you call each other and sync up: "Hey, we added masala chai for ₹30 today."

This works great. Until one day, the phone lines between Delhi and Mumbai go down. A storm, whatever. You can't talk to each other.

Now a customer walks into your Delhi stall and asks: "Do you have masala chai? How much?"

You have two options:

Option A: Be honest about your uncertainty. "I'm sorry, I can't confirm the latest menu right now. Our Delhi-Mumbai line is down. Come back in 30 minutes?" → The customer gets no chai. But you didn't lie to them about the price.

Option B: Serve what you know. "Yeah, masala chai is ₹30!" (But what if your Mumbai friend changed the price to ₹35 this morning and you don't know yet?) → The customer gets their chai. But maybe at the wrong price.

That's it. That's the entire CAP theorem. The phone line going down is a Partition. Option A is choosing Consistency over Availability. Option B is choosing Availability over Consistency.

The Three Letters

impossible in distributed systems C Consistency A Availability P Partition Tolerance

CP MongoDB, HBase

AP Cassandra, DynamoDB

CA Single-node PostgreSQL

The CAP Triangle: you can only guarantee two properties simultaneously in a distributed system

  • Consistency (C): Every read gets the freshest data. If someone just wrote ₹500 to Node A, and you read from Node B one millisecond later, you MUST see ₹500. Not the old value. Not "maybe later." Now.

  • Availability (A): Every request gets a response (not an error). The system never says "come back later." It always gives you something, even if that something might be slightly stale.

  • Partition Tolerance (P): The system keeps working even when the network between nodes is broken. Messages get lost, delayed, or dropped, and the system handles it gracefully instead of collapsing.

The Punchline

You can't have all three simultaneously in a distributed system. Pick two. That's the theorem. Eric Brewer proposed it in 2000, Seth Gilbert and Nancy Lynch proved it in 2002.

The Real Trade-off: Why "P" Isn't Optional

Here's where most people get confused. They think you're choosing between three pairs: CA, CP, or AP. Like picking from a menu.

That's wrong.

Network partitions aren't something you "opt into." They're physics. Fiber cables get cut. Routers crash. AWS regions have outages. In India especially, between data centers in Mumbai and Chennai, you'll see network blips weekly, major partitions every few months.

The real choice isn't "pick 2 out of 3." It's: when a partition inevitably happens, do you sacrifice Consistency or Availability?

That's it. CP or AP. That's the only decision you actually make.

(What about CA? CA systems exist (your single-node PostgreSQL on one server is technically CA). But it's not distributed. The moment you add a second node in another city, you're in partition territory.)

CAP Theorem in Reality Partitions WILL happen. Choose C or A during one. Client Node A(value=1) Node B(value=2) NETWORKPARTITION Choose Consistency (CP)Node A rejects write/readbecause it cannot sync with B."Error: System Unavailable" Choose Availability (AP)Node B accepts write/read.Data gets out of sync."Success: (stale/diverged)" Banking picks Consistency. Social media likes pick Availability. You cannot have both when the network breaks.
CAP Theorem in reality: when the partition hits, Node A can reject the request (CP) or Node B can answer with possibly-stale data (AP)
100%
CAP Theorem in Reality Partitions WILL happen. Choose C or A during one. Client Node A(value=1) Node B(value=2) NETWORKPARTITION Choose Consistency (CP)Node A rejects write/readbecause it cannot sync with B."Error: System Unavailable" Choose Availability (AP)Node B accepts write/read.Data gets out of sync."Success: (stale/diverged)" Banking picks Consistency. Social media likes pick Availability. You cannot have both when the network breaks.

If you read casual tech blogs, they present CAP like a restaurant menu: "Want CA? Use a relational database! Want AP? Use Cassandra!" That's the menu illusion. You don't get to order off this menu, because you don't get to decline partitions. Construction workers dig up fiber. Switches drop packets. Routers reboot. The only item you actually choose is what happens while the network is broken.

The Banking Scenario

Let's make this concrete. You're building the backend for a UPI payments app.

Setup:

  • Primary database in Mumbai (Node 1)
  • Replica in Delhi (Node 2)
  • Network between them just died

A user in Delhi opens the app and checks their balance.

CP: Consistency Wins Mumbai Node Balance: ₹5,000 Delhi Node Balance: ₹4,500 (?) ✕ PARTITION 👤 Delhi User "What's my balance?" ❌ ERROR 503 "Service Unavailable" AP: Availability Wins Mumbai Node Balance: ₹5,000 Delhi Node Balance: ₹4,500 (?) ✕ PARTITION 👤 Delhi User "What's my balance?" ✓ 200 OK "Balance: ₹4,500" (stale!)

During a network partition: CP returns an error to protect data integrity; AP returns stale data to stay responsive

If you chose CP (Consistency): Delhi Node thinks: "I can't reach Mumbai. The user might have received ₹500 in Mumbai seconds ago. I genuinely don't know the correct balance. Returning a wrong number for a financial app is dangerous. I'll return an error."

Result: User sees "Service temporarily unavailable." Frustrating, but safe.

If you chose AP (Availability): Delhi Node thinks: "I can't reach Mumbai, but I won't crash. I'll return the last balance I synced, ₹4,500. It might be slightly stale, but at least the app works."

Result: User sees ₹4,500. The app feels responsive, but the data might be wrong.

Real World: Who Picks What?

Here's the thing that took me years to internalize: the right choice depends entirely on what you're building.

CP Systems: "I'd Rather Be Down Than Wrong"

System Why CP?
UPI / IMPS payments You cannot show a user "payment successful" if the money didn't actually move. A wrong answer means someone loses real money.
Stock trading (Zerodha) If the system shows ₹100/share but the real price is ₹105, someone just lost ₹5 per share × 10,000 shares.
Inventory for flash sales (Amazon) If you say "In Stock" when it's not, you just oversold. Now you have angry customers and a logistics nightmare.
Aadhaar authentication The biometric match must be definitive. A false positive = identity fraud.

AP Systems: "I'd Rather Be Stale Than Dead"

System Why AP?
Zomato restaurant reviews Missing a review posted 5 seconds ago doesn't hurt anyone. But "Error 500" means lost customers.
Instagram/Twitter feeds Showing a feed that's 10 seconds behind is invisible to users. But downtime loses ad revenue.
Hotstar live comments Nobody will notice if a comment appears 3 seconds late. But if the comment system crashes, people leave.
Swiggy restaurant menus If the price is wrong by ₹5, you fix it later. If the app won't load, the user orders from Zomato.

The Pattern

Money, inventory, identity → CP.
Content, social, analytics → AP.

Memorize that. It answers 80% of interview questions about CAP.

⚠️ One more thing: CAP applies at the data-store level, not your entire architecture. A well-designed system is AP for the user's timeline (slightly old tweets are fine) and strictly CP for billing (double-charging a card is not). You tune the choice per feature, per business requirement. Not once for the whole company.

🟡 Going Deeper: Consistency Models

Okay, you understand the binary CP vs AP choice during a partition. But here's what nobody tells juniors: consistency isn't binary even during normal operations.

There's a whole spectrum between "perfectly consistent" and "eventually consistent." Understanding this spectrum is what separates mid-level engineers from senior architects.

Strong Consistency

Every read sees the latest write. Period. No exceptions. If I write balance = 5000 at timestamp T, any read after T, from any node, anywhere in the world, returns 5000.

How it works: The write doesn't succeed until ALL replicas confirm they have the new value. This is slow, but correct.

Used by: Google Spanner (globally consistent across continents using atomic clocks (yes, actual hardware clocks synced via GPS).

The cost: High latency on writes. If your replicas are in Mumbai and Singapore, every write must wait for a round-trip to Singapore (~60ms) before confirming.

Eventual Consistency

If you stop writing, eventually all replicas converge to the same value. "Eventually" might mean 50 milliseconds or 5 seconds; the system doesn't guarantee when.

How it works: Writes succeed immediately on one node. Background replication pushes changes to other nodes asynchronously.

Used by: Cassandra, DynamoDB (default mode), DNS.

The cost: You can read stale data. Two users reading the same key from different nodes at the same time might see different values.

Causal Consistency

A middle ground. If event B was caused by event A, then anyone who sees B must also see A. But unrelated events can appear in any order.

Example: If you post a comment, and then someone replies to your comment, causal consistency guarantees that anyone who sees the reply also sees your original comment. But two unrelated posts might appear in different orders for different users.

Used by: MongoDB (with causal sessions), CockroachDB.

Read-Your-Writes Consistency

A practical guarantee: after you write something, YOUR subsequent reads will always see that write. Other users might not see it immediately, but you will.

Example: You update your profile photo on Instagram. If you refresh the page, YOU see the new photo. Your friend in another city might still see the old photo for a few seconds.

Used by: Most modern systems as a minimum guarantee. Often implemented via session stickiness.

// Demonstrating eventual consistency problems
class EventuallyConsistentCache {
 private nodes: Map<string, { value: number; timestamp: number }>[] = [
   new Map(), // Node Delhi
   new Map(), // Node Mumbai
 ];

 // Write goes to one node immediately
 async write(key: string, value: number, nodeIndex: number): Promise<void> {
   this.nodes[nodeIndex].set(key, { value, timestamp: Date.now() });
   // Background replication: not instant!
   setTimeout(() => {
     this.nodes[1 - nodeIndex].set(key, { value, timestamp: Date.now() });
   }, Math.random() * 2000); // 0-2 seconds delay
 }

 // Read from a specific node: might be stale!
 read(key: string, nodeIndex: number): number | undefined {
   return this.nodes[nodeIndex].get(key)?.value;
 }
}

// The bug: User writes to Delhi, reads from Mumbai immediately
const cache = new EventuallyConsistentCache();
await cache.write('balance', 5000, 0); // Write to Delhi
const result = cache.read('balance', 1); // Read from Mumbai
// result might be undefined! Replication hasn't finished!

🔴 Architect's War Room

If you've been building systems for years, you already know the basics. This section is for the stuff that comes up in real architecture reviews and production incidents.

PACELC: The Real Framework

CAP is from 2000. It's useful but incomplete. It only tells you what happens during a partition. But partitions are rare, maybe a few minutes per year for most systems. What about the other 99.99% of the time?

In 2012, Daniel Abadi proposed the PACELC theorem:

Partition → trade Availability vs Consistency
Else (normal operation) → trade Latency vs Consistency

That "Else" part is what actually matters day-to-day. During normal operations:

  • Want strong consistency? → Every write must synchronously replicate to N nodes before acknowledging. This adds latency proportional to your worst-performing replica.
  • Want low latency? → Acknowledge writes immediately, replicate asynchronously. Fast, but now you have a consistency gap.

Real examples of PACELC trade-offs:

Database During Partition (PA/PC) During Normal (EL/EC)
DynamoDB PA (stays available) EL (defaults to fast, eventual reads)
MongoDB PC (errors on minority) EC (waits for majority acknowledgment)
Cassandra PA (accepts all writes) EL (tunable per query)
Spanner PC (halts writes) EC (uses TrueTime for global consistency)

Tunable Consistency: The Practitioner's Tool

Most real databases don't force a single CAP position. They let you tune it per operation.

Cassandra's Quorum System:

In Cassandra, with a replication factor of 3 (data copied to 3 nodes), you choose consistency per query:

  • ONE: Write/read succeeds after 1 node responds. Fast. Possibly stale.
  • QUORUM: Write/read succeeds after 2 of 3 nodes respond. Majority agreement. Good balance.
  • ALL: Write/read succeeds only when all 3 nodes respond. Strongest consistency. Slowest. One dead node = entire query fails.

The rule: if W + R > N, you get strong consistency (where W = write consistency, R = read consistency, N = replication factor).

QUORUM write (W=2) + QUORUM read (R=2) → W + R = 4 > 3 = N → Consistent!
ONE write (W=1) + ONE read (R=1) → W + R = 2 < 3 = N → Might read stale data!

DynamoDB's Strongly Consistent Reads:

DynamoDB is AP by default (fast, eventually consistent reads). But on any individual GetItem call, you can flip a flag:

// Default: eventually consistent (fast, might be stale)
const item = await dynamodb.get({ 
 TableName: 'accounts', 
 Key: { userId: '12345' } 
}).promise();

// Override: strongly consistent (slower, always fresh)
const freshItem = await dynamodb.get({ 
 TableName: 'accounts', 
 Key: { userId: '12345' },
 ConsistentRead: true  // ← costs 2x RCU, higher latency
}).promise();

At a fintech I worked with, the pattern was: eventual reads for the dashboard (showing transaction history, 200ms stale is fine), strongly consistent reads for the payment flow (checking balance before debiting, must be exact). Same database, different consistency guarantees for different operations.

The Jepsen Reality Check

Here's something that'll make you uncomfortable: most databases don't actually deliver the consistency they promise.

Kyle Kingsbury's Jepsen project systematically tests databases under partition conditions. The findings are terrifying:

  • MongoDB 3.6 claimed linearizability but lost acknowledged writes during partitions
  • Cassandra with QUORUM still had stale reads under certain network timing conditions
  • Redis Sentinel could lose writes during failover (Redis itself admits this in docs)
  • CockroachDB passed Jepsen testing, one of the few

The takeaway for architects: Don't trust the marketing. If your system handles money, run chaos engineering tests. Simulate partitions in staging. Use tools like Toxiproxy or AWS Fault Injection Simulator.

Multi-Region: When Your Partition Is the Pacific Ocean

For Indian companies going global, the real question becomes: how do you serve users in Singapore, the US, and India from the same database?

Option 1: Single-region primary, global replicas (AP pattern)

  • Primary in Mumbai. Read replicas in Singapore and US.
  • Writes always go to Mumbai. Reads can hit local replicas (fast, but stale).
  • Used by: Most Indian startups going international initially.
  • Problem: Write latency for international users (~200ms to Mumbai from Singapore, ~300ms from US).

Option 2: Multi-region active-active (complex AP)

  • Accept writes in any region. Resolve conflicts asynchronously.
  • Used by: Cassandra, DynamoDB Global Tables.
  • Problem: Conflict resolution. What if Mumbai and Singapore both update the same user's balance at the same time?

Option 3: Global consensus (CP pattern)

  • Google Spanner approach. Uses GPS-synced atomic clocks (TrueTime) to achieve global strong consistency.
  • Used by: Google Spanner, CockroachDB.
  • Problem: Expensive. Write latency of ~150ms even in the best case (waiting for consensus across continents).

Most Indian startups I've seen start with Option 1 (simple, works for 90% of traffic), then move critical paths (payments, inventory) to Option 3 when they can afford it.

The Decision Matrix

Scenario Choose Database Examples Indian Example
Payment ledgers, banking CP PostgreSQL (primary), MongoDB, Spanner UPI / Razorpay ledger
E-commerce inventory (flash sale) CP PostgreSQL with locks, Redis with Lua Flipkart Big Billion Day
User-facing content & feeds AP Cassandra, DynamoDB, ElasticSearch Zomato reviews, Insta feed
Session storage AP Redis, Memcached Swiggy login sessions
Analytics & metrics AP ClickHouse, TimescaleDB Hotstar viewership metrics
Configuration & feature flags CP etcd, Consul, ZooKeeper LaunchDarkly / internal config
Chat message delivery AP (with delivery guarantees) Cassandra + Kafka WhatsApp message store
Ride matching (Uber/Ola) AP with short TTL Redis Geo, Cassandra Ola driver locations

Common Mistakes

1. "We need strong consistency everywhere." No, you don't. I've seen teams put their entire application behind synchronous replication (the analytics dashboard, the user profile page, the notification bell) and then wonder why their P99 latency is 800ms. Be surgical. Strong consistency only for the data paths where incorrectness causes real harm.

2. "Eventual consistency means data loss." Eventual consistency doesn't mean data disappears. It means there's a window where different nodes have different versions. The data always converges. It's not data loss; it's data delay.

3. "CAP says we can only have 2, so our system is always broken." CAP trade-offs only apply during partitions. 99.9% of the time, your network is fine and you can have both consistency and availability. The theorem describes the degraded state, not the normal state.

4. "MongoDB is CP, so it's always consistent." MongoDB is CP by default configuration. But you can configure it to be AP-ish with writeConcern: { w: 1 } (don't wait for replicas). The classification is about default behavior, not about absolute capability.

5. "We'll just use Kafka as a database. Problem solved." Kafka is an AP commit log. It guarantees ordering and durability, but it's not a substitute for a database when you need strong consistency for point lookups. I've seen teams try to make Kafka their source of truth for account balances. It doesn't end well.

🧠 Key Takeaways

  • CAP = during a network partition, choose Consistency or Availability. That's the entire theorem in one sentence.
  • P is not optional. Networks fail. The real question is always: "When the network breaks, do we return errors or stale data?"
  • Money → CP. Content → AP. This heuristic is right 80% of the time.
  • PACELC is the better framework. It also covers normal operations: Latency vs Consistency.
  • Most databases are tunable. You're not locked into one mode. Configure consistency per-query based on the operation's risk profile.
  • Test your assumptions. Jepsen showed that many databases don't deliver what they promise. Run chaos tests.

Think About It

  1. UPI processes 10+ billion transactions per month. If you were designing UPI's database layer from scratch, would you use one CP database for everything? Or would you split the "payment execution" path (CP) from the "transaction history" path (AP)? How would you keep them in sync?

  2. Hotstar streams cricket to 25+ million concurrent viewers. The comment section shows live reactions. Would you make the comment system CP or AP? What if someone reports a comment as abusive? Should the "is this comment hidden?" flag be CP or AP?

  3. You're building a flash sale system for Myntra. 50,000 units of a product, 2 million users clicking "Buy Now" simultaneously. How do you prevent overselling (which requires CP-like guarantees) without making the system so slow that users get timeout errors?

Further Reading

Quiz available inside the full course after you request access.