The 30 Must-Know Concepts
The Meeting Where Everyone Spoke a Secret Language
Your first architecture review at the new job. You have written solid backend code for six years, so you walk in confident.
Eleven minutes in, you have heard: "the consumer group rebalances and we lose ordering", "that read is eventually consistent so the replica might lie to you", "just put a Bloom filter in front of it", "no, Redlock won't survive a GC pause".
You understood every individual English word. You understood none of the sentences. And the worst part: when the lead asked for your thoughts, you said the code looked clean. The code. In an architecture review.
Here is the secret nobody tells you. The people in that room are not smarter than you. They just have a map. Every scary term slots into a small number of territories, and once you know the territories, a new term stops being terrifying. It is just a new village in a region you already know.
The biggest mistake you can make right now is trying to learn these concepts one by one, in order, like reading a dictionary from A to Z. If you try to deeply understand Kafka before you know what a basic message queue is, you will drown. This lesson hands you the map first. Depth comes later. That is literally what the next 19 weeks are for.
Why Should You Care?
- Interviews test connections, not definitions. "Why does your caching choice affect your consistency story?" is a map question. Candidates with 30 isolated flashcards fail it. Candidates with a connected map answer it casually.
- Every later lesson lands softer. When you reach Week 6 and meet cache stampedes, your brain will already have a shelf labeled "caching problems" to put them on. Pre-built shelves are how fast learners learn fast.
- You stop being bluffable. Vendors, blog posts, and overconfident teammates throw jargon as authority. With the map, you can ask the one question that matters: which problem, in which territory, does this actually solve?
🟢 The Simple Version: The Map Has Four Territories
Think of the entire field as the Indian Railways network. One of the largest, most chaotic, yet remarkably functional systems in the world. Every system design concept you will ever meet does one of four jobs in that network.
Territory 1: Networking (the tracks and signals)
How data physically travels from A to B. If the tracks are broken, nothing else matters. In the railway, this is the steel tracks, the signals, and the routing that stops two trains from colliding.
Lives here: DNS (the phonebook turning swiggy.com into an IP), TCP vs UDP (guaranteed delivery vs throw-and-hope), load balancers (the traffic cops), CDNs (local warehouses so Mumbai users don't fetch files from Virginia), and proxies (who is being hidden, the client or the server?).
Territory 2: APIs & Communication (the languages and timetables)
If networking is the tracks, this territory is the station masters and the ticketing rules. Who boards, when, speaking which language.
Lives here: REST, GraphQL, gRPC (three dialects for asking services for things), API gateways and rate limiting (the bouncers at the door), webhooks, WebSockets and SSE (ways to push instead of poll), message queues (a waiting room where tasks sit until a worker is free), pub/sub (a broadcasting station, everyone interested listens), and idempotency (tapping "Pay" twice must not charge twice).
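The queue vs pub/sub distinction is easier to feel in code than in prose. Here is a minimal in-memory sketch, not how any real broker is built: `TaskQueue`, `Topic`, and the task and event names are all made up for illustration.

```python
from collections import deque


class TaskQueue:
    """Message queue: each task is handed to exactly one worker."""

    def __init__(self):
        self._tasks = deque()

    def publish(self, task):
        self._tasks.append(task)

    def consume(self):
        # Whichever worker asks first takes the task; nobody else sees it.
        return self._tasks.popleft() if self._tasks else None


class Topic:
    """Pub/sub: every subscriber receives its own copy of each event."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


# Queue: two workers compete, only one gets the task.
queue = TaskQueue()
queue.publish("resize-image-42")
worker_a_got = queue.consume()   # "resize-image-42"
worker_b_got = queue.consume()   # None, the task is already taken

# Pub/sub: three subscribers, all three hear the event.
restaurant, notifications, analytics = [], [], []
topic = Topic()
for inbox in (restaurant, notifications, analytics):
    topic.subscribe(inbox.append)
topic.publish("order-placed-1001")
```

Same `publish` call, opposite delivery promise. That one difference is usually what the "queue or pub/sub?" argument in a design review is about.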
Territory 3: Databases & Storage (the vaults and freight yards)
The heaviest territory. Where data lives and how it survives growth. In the railway, these are the yards where trains are parked, maintained, and logged.
Lives here: the eternal SQL vs NoSQL debate (rigid reliable ledgers vs flexible scale), ACID (the promises a transaction makes), indexing with B-trees and LSM trees (how databases find things fast), replication (copies for safety and read scale), sharding (chopping data when one machine is not enough), consistent hashing (the elegant math of deciding which shard), and caching (keeping hot answers close because computing them again is expensive).
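To see why consistent hashing gets called elegant, compare it with the naive `hash % number_of_shards` approach when a fourth shard is added. This is a rough sketch under assumptions: the `HashRing` class, the shard names, the 100 virtual nodes per shard, and the use of MD5 are illustrative choices, not a reference implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable hash works; MD5 is used here only for its even spread.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times to smooth out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["shard-a", "shard-b", "shard-c"])
after = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])

# Consistent hashing: only keys claimed by the new shard move.
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
moved_fraction = len(moved) / len(keys)

# Naive modulo sharding: going from 3 to 4 shards reshuffles most keys.
naive_moved = [k for k in keys if _hash(k) % 3 != _hash(k) % 4]
naive_fraction = len(naive_moved) / len(keys)
```

With these settings `moved_fraction` should land near a quarter, and every moved key ends up on `shard-d`. `naive_fraction` should sit near three quarters. That gap is the whole pitch.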
Territory 4: Distributed Systems (the coordination layer)
The art of making many machines act as one, and surviving the chaos when they disagree. In the railway, this is keeping the network running when the monsoon floods a track in Mumbai while a train derails outside Delhi.
Lives here: the CAP theorem (the forced choice when the network breaks), consistency models (how stale is acceptable?), consensus and Raft (electing a new leader when nodes die, which works only while a majority of the cluster is still alive), distributed locks (scarier than they look), sagas and distributed transactions (multi-step operations across services), and Bloom filters with their probabilistic cousins (fast approximate answers at planetary scale).
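"Fast approximate answers" sounds vague until you see how small a Bloom filter is. A minimal sketch, assuming a 1024-bit filter with three hash positions derived from SHA-256. The `BloomFilter` class, its parameters, and the usernames are illustrative, not taken from any particular library.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits & (1 << pos) for pos in self._positions(item))


seen = BloomFilter()
for username in ("asha", "ravi", "meera"):
    seen.add(username)

# Anything that was added always answers True (no false negatives).
known = [seen.might_contain(name) for name in ("asha", "ravi", "meera")]

# Unseen items usually answer False, but a rare True is allowed.
false_positives = sum(seen.might_contain(f"unseen-{i}") for i in range(1000))
```

The filter never stores the usernames themselves, only bits. That is why "just put a Bloom filter in front of it" is a cheap way to skip lookups for things that definitely do not exist.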
And two strips on the map's edge
- Scale patterns: replication, partitioning, fan-out. Recurring moves that show up in every territory.
- Operations: deployments, observability, circuit breakers. How you ship and run all of the above without losing your weekends.
🟡 Going Deeper: One Request Touches Half the Map
The territories are not isolated islands. Every real request stitches them together. Trace one Swiggy order and watch the map light up:
- Your phone asks DNS where api.swiggy.com lives (Networking)
- The request hits a load balancer, which picks a healthy server (Networking)
- The API gateway checks your auth token and your rate limit (APIs)
- The order service handles the REST call (APIs)
- The menu price comes from a Redis cache, one database read avoided (Databases)
- The order row is written to a sharded Postgres inside a transaction. ACID earning its keep (Databases)
- An "order placed" event lands on a queue for the restaurant app, notifications, and analytics. Three consumers, so this is pub/sub (APIs/Distributed)
- Payment runs with an idempotency key, because Jio networks make people double-tap (APIs/Distributed)
- If the payment gateway hangs, a circuit breaker fails fast instead of dragging the whole site down (Operations)
Nine concepts, one biryani. This is why the map matters. In an interview, "design Swiggy" really means: walk this path and defend each stop.
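The idempotency stop in the trace above is the one people most often hand-wave, so here is a minimal sketch of the idea. Everything in it is hypothetical: `PaymentService`, `charge`, and `amount_paise` are invented names, and the in-memory dictionary stands in for what would normally be a database table with a unique constraint on the key.

```python
import uuid


class PaymentService:
    """Hypothetical service; a real one would persist keys durably."""

    def __init__(self):
        self._results = {}   # idempotency key -> stored response
        self.charges = []    # one entry per time money actually moved

    def charge(self, idempotency_key: str, user: str, amount_paise: int) -> dict:
        # A retry with a known key returns the stored result, no new charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((user, amount_paise))
        response = {"status": "paid", "charge_id": len(self.charges)}
        self._results[idempotency_key] = response
        return response


payments = PaymentService()
key = str(uuid.uuid4())  # generated once by the client per checkout attempt

first = payments.charge(key, "user-7", 34900)
second = payments.charge(key, "user-7", 34900)  # the double-tap

charged_once = len(payments.charges) == 1       # True
```

The client owns the key, the server owns the memory of it. This sketch is single-threaded; with concurrent retries the check-then-write would have to be atomic, which is exactly where the Distributed territory walks in.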
The 30, As One Table
Your skim-first checklist. One line each. Definitions now, depth in the coming weeks.
| # | Concept | One-liner | Territory |
|---|---|---|---|
| 1 | DNS | Name to IP phonebook, cached everywhere with TTLs | Networking |
| 2 | TCP vs UDP | Guaranteed-ordered vs fast-and-lossy delivery | Networking |
| 3 | Load balancing | Spread traffic; detect and route around dead servers | Networking |
| 4 | CDN | Cache content near users; physics sets the latency floor | Networking |
| 5 | Proxy / reverse proxy | Hide the client / hide the servers | Networking |
| 6 | REST | Resources + verbs + status codes over HTTP | APIs |
| 7 | GraphQL | Client picks the response shape; server pays for it | APIs |
| 8 | gRPC | Binary, contract-first, internal service-to-service speed | APIs |
| 9 | API gateway | One front door: auth, routing, rate limits | APIs |
| 10 | Rate limiting | Token bucket / sliding window; protect the backend | APIs |
| 11 | AuthN vs AuthZ | Who are you vs what may you do; sessions vs JWT | APIs |
| 12 | Message queue | Task waiting room; one consumer wins each task | APIs |
| 13 | Pub/sub | Broadcast; every subscriber gets the event | APIs |
| 14 | WebSockets / SSE / polling | Three ways to get live updates, three cost profiles | APIs |
| 15 | Idempotency | Same request twice = same effect once | APIs |
| 16 | SQL vs NoSQL | Joins and transactions vs flexible horizontal scale | Databases |
| 17 | ACID | Atomic, consistent, isolated, durable. The transaction promise | Databases |
| 18 | Indexing | Pre-sorted lookup structures; every index taxes writes | Databases |
| 19 | B-tree vs LSM | Read-optimized vs write-optimized storage engines | Databases |
| 20 | Replication | Copies of data; lag is the price | Databases |
| 21 | Sharding | Split data across machines; shard key choice is destiny | Databases |
| 22 | Consistent hashing | Add or remove a node, move only about 1/N of the keys | Databases |
| 23 | Caching | Keep hot answers close; invalidation is the hard part | Databases |
| 24 | CAP theorem | During a partition: consistency or availability, pick one | Distributed |
| 25 | Consistency models | Strong, causal, read-your-writes, eventual | Distributed |
| 26 | Consensus (Raft) | Many machines agreeing on one truth, despite failures | Distributed |
| 27 | Distributed locks & transactions | Coordination across machines; sagas over 2PC | Distributed |
| 28 | Bloom filters & friends | Tiny memory, approximate answers, false positives allowed | Distributed |
| 29 | Observability | Logs, metrics, traces. Seeing inside the running system | Operations |
| 30 | Resilience patterns | Circuit breakers, retries, bulkheads. Failing gracefully | Operations |
Do not memorize this table. Bookmark it. Each row is a future lesson. Today's job is only this: none of these names should feel alien anymore.
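If one row deserves a peek under the hood today, row 10 is a good candidate, because a token bucket fits in twenty lines. This is a sketch under assumptions: the `TokenBucket` class, its parameter names, and the injectable `clock` are illustrative choices, and a production limiter would also need shared state across servers.

```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Top up for the time elapsed, never above capacity.
        self._tokens = min(
            self.capacity, self._tokens + (now - self._last) * self.refill_per_sec
        )
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


# A fake clock keeps the demo deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1.0, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(4)]        # [True, True, True, False]
fake_now[0] += 2.0                                # two seconds pass
after_wait = [bucket.allow() for _ in range(3)]   # [True, True, False]
```

Capacity sets how big a burst you tolerate, and the refill rate sets the sustained limit. Those two knobs are most of what "protect the backend" means in practice.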
🔴 Architect's Corner: How to Actually Use the Map
Concepts Cluster Into Recurring Arguments
Senior engineers do not think in 30 separate concepts. They think in a handful of arguments that repeat across territories:
- The freshness argument. Caching, replication lag, consistency models, CDN TTLs. All of them are the same fight: how stale is acceptable, in exchange for what gain? Learn it once and you will recognize it everywhere.
- The coordination argument. Distributed locks, consensus, transactions, idempotency. All of them ask: how do machines agree? And the recurring senior answer is to avoid needing agreement at all. Idempotency over locks. Sagas over 2PC.
- The capacity argument. Load balancing, sharding, consistent hashing, queues. The work does not fit one machine, so split it, then manage what the split breaks.
- The blast-radius argument. Circuit breakers, bulkheads, rate limits, dead letter queues. When something fails (not if, when), how far does the damage spread?
When an interviewer pushes you with "okay, now what if the cache dies?", they are testing whether you see the freshness and blast-radius arguments. Not whether you memorized Redis commands.
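One way to answer "what if the cache dies?" concretely is a circuit breaker around the cache read, with the database as a fallback. A minimal sketch, assuming a failure threshold of three and a 30-second cool-down. `CircuitBreaker`, `read_from_dead_cache`, and `read_from_database` are hypothetical names for illustration only.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()        # open: fail fast, skip the dependency
            self._opened_at = None       # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0               # a success closes the breaker
        return result


cache_calls = []


def read_from_dead_cache():
    cache_calls.append(1)
    raise ConnectionError("cache is down")


def read_from_database():
    return "price-from-db"


fake_now = [0.0]
breaker = CircuitBreaker(max_failures=3, reset_after=30.0, clock=lambda: fake_now[0])

answers = [breaker.call(read_from_dead_cache, read_from_database) for _ in range(10)]
attempts_before_open = len(cache_calls)   # 3, the other 7 calls skipped the cache
```

Ten requests were served and the dead cache was only bothered three times. The honest follow-up is the capacity argument: the database now carries traffic the cache used to absorb, so the fallback needs its own limits.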
Learn in Priority Tiers, Not Alphabetical Order
- High (learn cold, weeks 1 to 8): load balancing, caching, SQL vs NoSQL, sharding, replication, CAP, consistency models, consistent hashing, queues vs pub/sub, idempotency, REST. These appear in every interview and every real design.
- Medium (working knowledge, weeks 9 to 14): gRPC and GraphQL, rate limiting, auth flows, B-tree vs LSM, consensus, sagas, Bloom filters, observability, deployment patterns. You should explain these confidently. You do not need to implement them at 2 AM.
- Low (one-paragraph knowledge, on demand): vector clock internals, Paxos proofs, erasure coding, CRDTs, 3PC. Know the one-line use case and when to dig deeper. Finishing the path beats perfecting chapter 3.
Going deep on everything is how people quit in week 3. The tier list is a pacing strategy, not a ranking of importance.
The Map Is Also a Diagnosis Tool
Production incident triage is mostly territory identification:
- "Site slow for everyone" → start in Networking (LB health, DNS, CDN) before blaming code.
- "Slow for exactly one big customer" → Databases (hot shard? an index that does not fit their data shape?).
- "Payments occasionally duplicated" → Distributed (idempotency, retries, queue redelivery).
- "Fast in staging, dies at 6 PM" → the capacity argument (pool exhaustion, cache hit rate collapsing under real traffic).
Engineers without the map grep logs at random. Engineers with the map binary-search the territories.
Common Mistakes
1. "I'll learn each concept fully before moving to the next." Dictionary-style learning drowns you, because the concepts reference each other in a circle. Consistency needs replication, replication needs CAP, CAP needs consistency. Skim the full map first. Depth on the second pass. That is this entire path's design.
2. "More concepts = more senior." Seniority is connecting them. A mid-level engineer name-drops Kafka. A senior explains why a Postgres jobs table beats Kafka for this team at this scale. Interviewers downgrade jargon that does not connect to constraints.
3. "I know the definition, so I know the concept." The 4-column test from this path's tracker: can you define it, explain how it works, say when it breaks, and argue the tradeoff? Definitions are column one of four. Most people stop there, and that is why most people freeze in round two.
4. "New tech means new territories." The map is stable. Products churn. Kafka, RabbitMQ, SQS and NATS are all queue/pub-sub villages in the same region. When the next hot tool launches, your first question is "which village does this replace?", and suddenly the hype is readable.
🧠 Key Takeaways
- Four territories hold every concept: Networking (data travels), APIs (services talk), Databases (data lives), Distributed Systems (machines agree). Plus scale patterns and operations at the edges.
- One real request touches half the map. Interviews exploit this. Trace a Swiggy order and defend each stop.
- Concepts cluster into recurring arguments: freshness, coordination, capacity, blast radius. Learn the argument once and recognize it in every costume.
- Learn in priority tiers, not alphabetically. High tier cold, medium tier conversational, low tier on demand.
- The map doubles as incident triage. Identify the territory before grepping logs.
- Do not memorize the 30 today. Make them un-alien today. The next 19 weeks add the depth.
Think About It
1. Trace a UPI payment (you scan a QR, money moves, both phones buzz) through the map the way this lesson traced the Swiggy order. Which territories does it touch that the food order did not? Which concept does the "both phones buzz" part depend on?
2. A teammate proposes adding GraphQL because "REST is old". Using the territories and the recurring arguments, what are the first three questions you would ask before agreeing? Which argument (freshness, coordination, capacity, blast radius) does GraphQL actually affect?
3. You are told a system is "eventually consistent". List the other concepts on the map that this single phrase quietly commits the system to: replication choices, cache behavior, what the client must tolerate. Then explain the user-visible consequence to a product manager in one sentence.
Further Reading
- Designing Data-Intensive Applications - Martin Kleppmann: the book this entire map compresses. Read it across the 20 weeks, not in one go
- System Design Primer (GitHub): the most popular open map of the same territory, good for cross-checking your mental model
- The Architecture of Open Source Applications: the map's concepts living inside real codebases
- High Scalability - Real-Life Architectures: case studies that exercise multiple territories per article