What Kafka Is

Apache Kafka is a distributed event streaming platform used as:

  • A message queue (async processing)
  • A stream (real-time continuous data)

Summary

Kafka is a distributed append-only log built for high throughput, scalability, and durability.

Analogy: Kafka is like a shared ledger where events are written once and readers independently replay it at their own speed.


Motivating Example (Why Kafka Exists)

World Cup live stats:

  • Goals, fouls, substitutions → events
  • Events must be ordered per match
  • Massive traffic spikes
  • Consumers must scale horizontally

Problems Kafka solves:

  • Ordering
  • Scalability
  • Fault tolerance

Core Kafka Concepts

Cluster & brokers

  • cluster = multiple brokers
  • broker = a server storing data & serving clients

Important

More brokers = more storage + more throughput.


Topics vs partitions

  • topic: logical stream of messages
  • partition: physical, ordered, immutable log

  concept     purpose
  topic       data organization
  partition   scale + ordering
  broker      hosts partitions

Important

Ordering is guaranteed only within a partition.


Producers & consumers

  • producer → writes messages
  • consumer → reads messages
  • Kafka is data-agnostic (messages are opaque bytes; schemas are the application's concern)

Message anatomy

A Kafka message contains:

  • value (required)
  • key (optional, affects partitioning)
  • timestamp (optional)
  • headers (optional metadata)
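For concreteness, those fields can be sketched as a Python dataclass. This is a toy shape for reasoning about records, not Kafka's wire format; the field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    """Toy shape of a Kafka message; field names are illustrative."""
    value: bytes                                 # required payload, opaque bytes to Kafka
    key: Optional[bytes] = None                  # optional; drives partition selection
    timestamp_ms: Optional[int] = None           # optional; event or broker time
    headers: dict = field(default_factory=dict)  # optional metadata; not used for routing

msg = Record(value=b'{"event": "goal", "match": 42}', key=b"match-42")
```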

Tip

The key is the most important design decision.


Partitioning Logic (Critical)

(flow) producer → hash(key) → partition → broker

Rules:

  • Same key → same partition → ordered
  • No key → round-robin (or sticky batching in newer clients) → no ordering guarantee

Ordering example (banking events):

  • Same user performs deposit(100) and then withdraw(50).
  • These events must be processed in the same order to keep balance correct.
  • Use userId as the partition key so both events go to the same partition.
  • Kafka guarantees order within a partition, so per-user balance updates remain consistent.
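The flow above can be sketched in a few lines, using a toy deterministic hash in place of Kafka's murmur2:

```python
NUM_PARTITIONS = 6

def partition_for(key: str) -> int:
    # hash(key) % partitions: the same key always maps to the same partition
    return sum(key.encode()) % NUM_PARTITIONS   # toy hash, deterministic

deposit = ("user-17", "deposit(100)")
withdraw = ("user-17", "withdraw(50)")

# Both events for user-17 land on one partition, so their order is preserved.
assert partition_for(deposit[0]) == partition_for(withdraw[0])
```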

Summary

Partitioning controls ordering, parallelism, and hotspots.


Append-Only Log Model

Each partition:

  • Append-only
  • Immutable
  • Sequential writes
  • Offset-based reads

Benefits:

  • Fast disk IO
  • Easy replication
  • Simple recovery
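A toy partition log makes the model concrete: appends go to the end and return an offset, and reads address records by offset:

```python
class PartitionLog:
    """Toy append-only log: writes go to the end, reads are by offset."""
    def __init__(self):
        self._entries = []

    def append(self, value) -> int:
        self._entries.append(value)      # sequential write, never updated in place
        return len(self._entries) - 1    # offset of the new record

    def read(self, offset: int):
        return self._entries[offset]     # offset-based read

log = PartitionLog()
o0 = log.append("deposit(100)")
o1 = log.append("withdraw(50)")
```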

Important

Kafka is fast because it never updates data, only appends.


Offsets & Consumption

  • Every message has an offset
  • Consumers track progress via offsets
  • Offsets are committed, not messages acknowledged
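A minimal sketch of offset-based progress: the consumer commits where it is, and after a restart it resumes from the last committed offset.

```python
messages = ["m0", "m1", "m2", "m3"]   # one partition's log
committed = 0                         # next offset to read (last commit + 1)

def consume_one():
    global committed
    msg = messages[committed]
    # ... process msg ...
    committed += 1                    # commit AFTER processing (at-least-once)
    return msg

consume_one()                         # processes "m0"
consume_one()                         # processes "m1"
# Simulated crash + restart: Kafka only knows the committed offset,
# so consumption resumes at offset 2, not at the beginning.
resumed_from = committed
```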

Important

Kafka tracks where you are, not what you processed.


Replication & Durability

Leader–follower model

  • Each partition has:
    • 1 leader
    • N−1 followers
  • Leader handles reads & writes
  • Followers replicate passively

Summary

Replication protects against broker failure, not bad data.


Acknowledgements (acks)

  • acks=all → leader + all replicas confirm

Trade-off:

  • Durability ↑
  • Latency ↑
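As a hedged sketch, this trade-off maps to producer configuration like the following, using librdkafka-style keys (the format accepted by clients such as confluent-kafka); the broker address is a placeholder:

```python
# Durability-first producer settings (sketch; keys follow librdkafka naming).
producer_config = {
    "bootstrap.servers": "broker1:9092",  # placeholder address
    "acks": "all",                # leader + all in-sync replicas must confirm
    "enable.idempotence": True,   # safe retries: broker de-duplicates resends
}
```

Pairing acks=all with idempotence is the usual durability-first combination; expect higher produce latency in exchange.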

Replication factor

  • Typical production value = 3 (the broker default is 1)
  • Survives broker failure
  • Costs disk space

  factor   durability   storage
  1        low          low
  3        high         high

Consumer model

Kafka uses pull-based consumption:

  • Consumers poll at their own pace
  • Prevents slow consumers from being overwhelmed
  • Enables batching

Analogy: consumers sip from a river; Kafka never force-feeds them.


Consumer Groups

  • Group = multiple consumers sharing work
  • One partition → one consumer per group
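The one-partition-one-consumer rule can be sketched as a toy round-robin assignment (Kafka's real assignors, e.g. range or cooperative-sticky, are more sophisticated):

```python
def assign(partitions: int, consumers: list) -> dict:
    """Spread partitions across group members; no partition is shared."""
    out = {c: [] for c in consumers}
    for p in range(partitions):
        out[consumers[p % len(consumers)]].append(p)  # round-robin spread
    return out

assignment = assign(6, ["c1", "c2", "c3"])
# Each partition appears exactly once across the whole group.
```

Re-running assign with a member removed models a rebalance: the orphaned partitions are redistributed across the survivors.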

Important

Same message can be read by multiple consumer groups.


Consumer Failure Handling

Offset recovery

  • Offsets committed after processing
  • Restart → resume from last committed offset

Rebalancing

  • Partitions redistributed across alive consumers

Warning

Commit offsets after critical work, or you risk data loss.


Kafka: Queue vs Stream

  message queue                stream
  async tasks                  continuous processing
  usually one consumer group   many consumer groups
  pull + ack                   replayable log

Summary

Kafka itself is the same — consumer behavior changes.


When to Use Kafka

Use Kafka as a queue when:

  • Async processing (YouTube transcoding)
  • Ordering required (Ticketmaster queue)
  • Producer & consumer must scale independently

Use Kafka as a stream when:

  • Real-time analytics (ad clicks)
  • Fan-out to many consumers (Facebook Live comments)

Scalability Basics

Message size

  • Recommended < 1MB
  • Kafka ≠ blob storage

Correct pattern:

  • Blobs → S3
  • Kafka → pointer (URL)

Warning

Storing large blobs in Kafka is an anti-pattern.


Broker capacity (rough)

  • ~1TB storage
  • ~1M msgs/sec (hardware dependent)

If your load fits within these limits, a detailed scaling discussion may not be needed.


Scaling Strategies

Add brokers

  • Increases capacity
  • Requires enough partitions

Important

Under-partitioned topics cannot use new brokers.


Partitioning strategy (most important)

  • Partition = hash(key) % partitions
  • Good key → even distribution
  • Bad key → hot partitions

Handling Hot Partitions

Example: viral ad in click aggregator

Solutions:

  • No key → random distribution, no ordering
  • Salting → adId + random
  • Compound key → adId:userId or adId:region
  • Backpressure → temporarily slow producers when lag and retries rise
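A toy simulation of the salting option (the hash, bucket count, and key names are illustrative):

```python
import random

random.seed(0)        # deterministic for the example
NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    return sum(key.encode()) % NUM_PARTITIONS   # toy hash standing in for murmur2

def salted_key(ad_id: str, buckets: int = 4) -> str:
    # Append a small random suffix so one hot adId fans out across
    # several partitions (sacrificing strict per-adId ordering).
    return f"{ad_id}:{random.randrange(buckets)}"

# Unsalted, every "ad-viral" event lands on a single partition:
assert len({partition_for("ad-viral") for _ in range(1000)}) == 1

# Salted, the same hot key is spread over multiple partitions:
hits = {partition_for(salted_key("ad-viral")) for _ in range(1000)}
```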

Summary

Hot partitions are solved by raising the effective key cardinality: spread one hot key across multiple partitions.


Retries & Errors

Producer retries

  • Automatic retries
  • Enable idempotence to avoid duplicates

Example (payments):

  • Risky event design: increaseBalanceBy(+1 INR)
  • Safer event design: setBalanceTo(50 INR) with a unique operation ID

If producer sends the message, times out, and retries:

  • increaseBalanceBy(+1 INR) may be applied twice (wrong balance)
  • setBalanceTo(50 INR) applied twice still ends at 50 INR (same final state)

This is why idempotency matters: retries are expected, duplicates are possible, and business state must stay correct.
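The timeout-and-retry scenario can be simulated in a few lines (balances and amounts are illustrative):

```python
# Why idempotent event design survives duplicate delivery from producer retries.
balance = 49

def increase_balance_by(amount):   # relative update: risky under retries
    global balance
    balance += amount

def set_balance_to(target):        # absolute update: idempotent under retries
    global balance
    balance = target

increase_balance_by(1)
increase_balance_by(1)             # duplicate from a retry
after_relative = balance           # 51: the duplicate double-counted

balance = 49
set_balance_to(50)
set_balance_to(50)                 # duplicate from a retry
after_absolute = balance           # 50: the duplicate was harmless
```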

Important

Idempotent producers are mandatory with retries.


Consumer retries

Kafka has no built-in consumer retry.

Pattern: main topic → retry topic → DLQ

Important behavior:

  • Messages do not go to DLQ immediately on first failure.
  • They are retried up to a configured max retries count (often with delay/backoff).
  • Only after retries are exhausted are they moved to the DLQ.
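A toy model of this behavior, with the topic hops collapsed into an in-process loop (the real pattern republishes to a retry topic, often with backoff):

```python
def handle(msg, process, max_retries=3):
    """Process msg; retry on failure, park in the DLQ only after retries run out."""
    dlq = []
    attempts = 0
    while attempts <= max_retries:
        try:
            process(msg)
            return "ok", attempts, dlq
        except Exception:
            attempts += 1       # in Kafka terms: republish to the retry topic
    dlq.append(msg)             # retries exhausted → dead-letter queue
    return "dead-lettered", attempts, dlq

def always_fails(_msg):
    raise RuntimeError("downstream unavailable")

status, attempts, dlq = handle("payment-123", always_fails)
```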

Tip

If retries are core, SQS may be simpler than Kafka.


Performance Optimizations

  • Batching → fewer network calls
  • Compression → less bandwidth
  • Partition key → biggest performance lever

Summary

Partitioning beats hardware for Kafka performance.


Retention Policies

Messages deleted by:

  • retention.ms (default 7 days)
  • retention.bytes

Important

Messages are not deleted after consumption.


Key takeaways

  • “Kafka is always available, sometimes consistent”
  • “Ordering is guaranteed per partition”
  • “Offsets track progress, not acknowledgements”
  • “Partitioning is the core scaling decision”

Final Takeaway

Summary

Kafka is a distributed, append-only log optimized for ordered, durable, high-throughput event processing. Focus on partitioning, hot partitions, durability trade-offs, and consumer behavior.