What Kafka Is
Apache Kafka is a distributed event streaming platform used as:
- A message queue (async processing)
- A stream (real-time continuous data)
Summary
Kafka is a distributed append-only log built for high throughput, scalability, and durability.
Analogy: Kafka is like a shared ledger where events are written once and readers independently replay it at their own speed.
Motivating Example (Why Kafka Exists)
World Cup live stats:
- Goals, fouls, substitutions → events
- Events must be ordered per match
- Massive traffic spikes
- Consumers must scale horizontally
Problems Kafka solves:
- Ordering
- Scalability
- Fault tolerance
Core Kafka Concepts
Cluster & brokers
- cluster = multiple brokers
- broker = a server storing data & serving clients
Important
More brokers = more storage + more throughput.
Topics vs partitions
- topic: logical stream of messages
- partition: physical, ordered, immutable log
| concept | purpose |
|---|---|
| topic | data organization |
| partition | scale + ordering |
| broker | hosts partitions |
Important
Ordering is guaranteed only within a partition.
Producers & consumers
- producer → writes messages
- consumer → reads messages
- Kafka is data-agnostic (brokers treat messages as opaque bytes; schemas are the application's concern)
Message anatomy
A Kafka message contains:
- value (required)
- Key (optional, affects partitioning)
- Timestamp (optional)
- Headers (metadata)
Tip
The key is the most important design decision.
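The fields above can be sketched as a plain data structure (an illustration of the message model, not Kafka's actual wire format; field names are chosen for clarity):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KafkaMessage:
    # value is the only required field; Kafka treats it as opaque bytes
    value: bytes
    # key drives partitioning: same key -> same partition -> ordering
    key: Optional[bytes] = None
    # timestamp is set by the producer or the broker
    timestamp_ms: Optional[int] = None
    # headers carry metadata (e.g. trace IDs) without touching the value
    headers: dict = field(default_factory=dict)

msg = KafkaMessage(value=b'{"goal": "match-42"}', key=b"match-42")
```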
Partitioning Logic (Critical)
(flow) producer → hash(key) → partition → broker
Rules:
- Same key → same partition → ordered
- No key → round-robin → no ordering
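These rules can be sketched in pure Python (crc32 stands in for Kafka's murmur2 partitioner here; the hash differs, but the key-to-partition logic is the same shape):

```python
import random
import zlib  # crc32 as a stand-in hash; real Kafka uses murmur2

def choose_partition(key, num_partitions):
    if key is None:
        # no key: spread load evenly, give up per-key ordering
        return random.randrange(num_partitions)
    # same key always hashes to the same partition
    return zlib.crc32(key.encode()) % num_partitions

# deposit and withdraw for the same user land on the same partition
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
assert p1 == p2
```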
Ordering example (banking events):
- Same user performs `deposit(100)` and then `withdraw(50)`.
- These events must be processed in the same order to keep the balance correct.
- Use `userId` as the partition key so both events go to the same partition.
- Kafka guarantees order within a partition, so per-user balance updates remain consistent.
Summary
Partitioning controls ordering, parallelism, and hotspots.
Append-Only Log Model
Each partition:
- Append-only
- Immutable
- Sequential writes
- Offset-based reads
Benefits:
- Fast disk IO
- Easy replication
- Simple recovery
Important
Kafka is fast because it never updates data, only appends.
Offsets & Consumption
- Every message has an offset
- Consumers track progress via offsets
- Offsets are committed, not messages acknowledged
Important
Kafka tracks where you are, not what you processed.
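The append-only log and offset commits can be simulated in a few lines (an in-memory sketch; class names are illustrative):

```python
class Partition:
    """An append-only, offset-indexed log (in-memory sketch)."""
    def __init__(self):
        self.log = []               # never updated, only appended

    def append(self, msg):
        self.log.append(msg)        # offset = position in the log
        return len(self.log) - 1

class Consumer:
    def __init__(self):
        self.committed = 0          # where you are, not what you processed

    def poll(self, partition):
        return partition.log[self.committed:]

    def commit(self, offset):
        self.committed = offset

p = Partition()
for event in ["goal", "foul", "substitution"]:
    p.append(event)

c = Consumer()
batch = c.poll(p)                   # reads all three events
c.commit(c.committed + len(batch))  # commit an offset, not per-message acks
assert c.poll(p) == []              # nothing new past the committed offset
```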
Replication & Durability
Leader–follower model
- Each partition has:
- 1 leader
- N−1 followers
- Leader handles reads & writes
- Followers replicate passively
Summary
Replication protects against broker failure, not bad data.
Acknowledgements (acks)
`acks=all` → leader + all in-sync replicas confirm
Trade-off:
- Durability ↑
- Latency ↑
Replication factor
- Default = 3
- Survives broker failure
- Costs disk space
| factor | durability | storage |
|---|---|---|
| 1 | low | low |
| 3 | high | high |
Consumer model
Kafka uses pull-based consumption:
- Consumers poll at their own pace
- Prevents slow consumers from being overwhelmed
- Enables batching
Analogy: consumers sip from a river; Kafka never force-feeds them.
Consumer Groups
- Group = multiple consumers sharing work
- One partition → one consumer per group
Important
Same message can be read by multiple consumer groups.
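The one-consumer-per-partition rule can be sketched as a round-robin assignment (a simplification; real Kafka uses pluggable assignors such as range or cooperative-sticky):

```python
def assign(partitions, consumers):
    """Round-robin partition assignment within one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, part in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(part)
    return assignment

# 4 partitions shared by 2 consumers in the same group:
# each partition has exactly one consumer within the group,
# while a second group would get its own independent assignment
print(assign([0, 1, 2, 3], ["c1", "c2"]))
```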
Consumer Failure Handling
Offset recovery
- Offsets committed after processing
- Restart → resume from last committed offset
Rebalancing
- Partitions redistributed across alive consumers
Warning
Commit offsets after critical work, or you risk data loss.
Kafka: Queue vs Stream
| message queue | stream |
|---|---|
| async tasks | continuous processing |
| usually one consumer group | many consumer groups |
| pull + ack | replayable log |
Summary
Kafka itself is the same — consumer behavior changes.
When to Use Kafka
Use Kafka as a queue when:
- Async processing (YouTube transcoding)
- Ordering required (Ticketmaster queue)
- Producer & consumer must scale independently
Use Kafka as a stream when:
- Real-time analytics (ad clicks)
- Fan-out to many consumers (Facebook Live comments)
Scalability Basics
Message size
- Recommended < 1MB
- Kafka ≠ blob storage
Correct pattern:
- Blobs → S3
- Kafka → pointer (URL)
Warning
Storing large blobs in Kafka is an anti-pattern.
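The pointer pattern can be sketched with an in-memory stand-in for the blob store (the bucket and key names are invented for illustration):

```python
# blob_store stands in for S3; bucket and key names are made up
blob_store = {}

def upload_blob(key, data):
    blob_store[key] = data
    return f"s3://my-bucket/{key}"

video = b"\x00" * 10_000_000          # a 10 MB blob: too big for Kafka
url = upload_blob("videos/v1.mp4", video)

# the Kafka message carries only a small pointer to the blob
kafka_message = {"event": "video-uploaded", "url": url}
```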
Broker capacity (rough)
- ~1TB storage
- ~1M msgs/sec (hardware dependent)
If your workload is below these limits, scaling may not need discussion yet.
Scaling Strategies
Add brokers
- Increases capacity
- Requires enough partitions
Important
Under-partitioned topics cannot use new brokers.
Partitioning strategy (most important)
- Partition = hash(key) % partitions
- Good key → even distribution
- Bad key → hot partitions
Handling Hot Partitions
Example: viral ad in click aggregator
Solutions:
- No key → random distribution, no ordering
- Salting → adId + random
- Compound key → adId:userId or adId:region
- Backpressure → temporarily slow producers when lag and retries rise
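Salting can be sketched as appending a bounded random suffix to the hot key (`salt_buckets` is an assumed tuning knob, and crc32 stands in for Kafka's partitioner hash):

```python
import random
import zlib

def salted_partition(ad_id, num_partitions, salt_buckets=8):
    # one hot key is spread across up to salt_buckets partitions,
    # trading away strict per-adId ordering for even load
    salted_key = f"{ad_id}:{random.randrange(salt_buckets)}"
    return zlib.crc32(salted_key.encode()) % num_partitions

# the viral ad's clicks now land on several partitions, not one
hits = {salted_partition("viral-ad", 32) for _ in range(1000)}
assert len(hits) <= 8    # bounded by the number of salt buckets
```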
Summary
Hot partitions are solved by increasing key cardinality.
Retries & Errors
Producer retries
- Automatic retries
- Enable idempotence to avoid duplicates
Example (payments):
- Risky event design: `increaseBalanceBy(+1 INR)`
- Safer event design: `setBalanceTo(50 INR)` with a unique operation ID
If the producer sends the message, times out, and retries:
- `increaseBalanceBy(+1 INR)` may be applied twice (wrong balance)
- `setBalanceTo(50 INR)` applied twice still ends at 50 INR (same final state)
This is why idempotency matters: retries are expected, duplicates are possible, and business state must stay correct.
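The retry scenario can be run end to end with toy handlers (the operation-ID dedup set is an assumed consumer-side mechanism):

```python
def apply_increase(balance, amount):
    # not idempotent: every duplicate delivery changes state again
    return balance + amount

seen_ops = set()

def apply_set(balance, target, op_id):
    # idempotent: duplicates are detected by operation ID and ignored
    if op_id in seen_ops:
        return balance
    seen_ops.add(op_id)
    return target

# producer times out and retries -> consumer sees the event twice
b = 49
b = apply_increase(b, 1)
b = apply_increase(b, 1)              # duplicate applied: 51, wrong

b2 = 49
b2 = apply_set(b2, 50, op_id="op-7")
b2 = apply_set(b2, 50, op_id="op-7")  # duplicate ignored: stays 50
```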
Important
Idempotent producers are mandatory with retries.
Consumer retries
Kafka has no built-in consumer retry.
Pattern: main topic → retry topic → DLQ
Important behavior:
- Messages do not go to DLQ immediately on first failure.
- They are retried up to a configured max retries count (often with delay/backoff).
- Only after retries are exhausted are they moved to the DLQ.
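The main → retry → DLQ flow can be simulated in-process (real deployments use separate topics with delayed redelivery; `MAX_RETRIES` is an assumed config value):

```python
MAX_RETRIES = 3

main_topic = ["bad-msg", "good-msg"]
retry_topic, dlq = [], []

def process(msg):
    if msg == "bad-msg":
        raise ValueError("downstream failure")

for msg in main_topic:
    attempts = 0
    while True:
        try:
            process(msg)
            break
        except ValueError:
            attempts += 1
            if attempts >= MAX_RETRIES:
                dlq.append(msg)      # retries exhausted -> dead-letter queue
                break
            retry_topic.append(msg)  # would normally re-publish with backoff
```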
Tip
If retries are core, SQS may be simpler than Kafka.
Performance Optimizations
- Batching → fewer network calls
- Compression → less bandwidth
- Partition key → biggest performance lever
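The first two levers can be illustrated with a quick size check (a standalone sketch; gzip stands in for the producer's `compression.type` setting):

```python
import gzip
import json

# batching: 1000 small click events sent as one payload
events = [{"adId": "ad-1", "userId": f"u{i}"} for i in range(1000)]
batch = json.dumps(events).encode()

# compression: gzip here; Kafka also supports lz4, snappy, zstd
compressed = gzip.compress(batch)

assert len(compressed) < len(batch)   # repetitive JSON compresses well
```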
Summary
Partitioning beats hardware for Kafka performance.
Retention Policies
Messages are deleted by:
- `retention.ms` (default 7 days)
- `retention.bytes`
Important
Messages are not deleted after consumption.
Key takeaways
- “Kafka is always available, sometimes consistent”
- “Ordering is guaranteed per partition”
- “Offsets track progress, not acknowledgements”
- “Partitioning is the core scaling decision”
Final Takeaway
Summary
Kafka is a distributed, append-only log optimized for ordered, durable, high-throughput event processing. Focus on partitioning, hot partitions, durability trade-offs, and consumer behavior.