What Kafka Is

Apache Kafka is a distributed event streaming platform used as:

  • a message queue (async processing)
  • a stream (real-time continuous data)

Summary

Kafka is a distributed append-only log built for high throughput, scalability, and durability.

Analogy: Kafka is like a shared ledger where events are written once and readers independently replay it at their own speed.


Motivating Example (Why Kafka Exists)

World Cup live stats:

  • goals, fouls, substitutions → events
  • events must be ordered per match
  • massive traffic spikes
  • consumers must scale horizontally

Problems Kafka solves:

  • ordering
  • scalability
  • fault tolerance

Core Kafka Concepts

Cluster & brokers

  • cluster = multiple brokers
  • broker = a server storing data & serving clients

Important

More brokers = more storage + more throughput.


Topics vs partitions

  • topic: logical stream of messages
  • partition: physical, ordered, immutable log

  concept     purpose
  topic       data organization
  partition   scale + ordering
  broker      hosts partitions

Important

Ordering is guaranteed only within a partition.


Producers & consumers

  • producer → writes messages
  • consumer → reads messages
  • Kafka is data-agnostic (schema doesn’t matter)

Message anatomy

A Kafka message contains:

  • value (required)
  • key (optional, affects partitioning)
  • timestamp (optional)
  • headers (metadata)

Tip

The key is the most important design decision.


Partitioning Logic (Critical)

(flow) producer → hash(key) → partition → broker

Rules:

  • same key → same partition → ordered
  • no key → round-robin → no ordering

Summary

Partitioning controls ordering, parallelism, and hotspots.


Append-Only Log Model

Each partition:

  • append-only
  • immutable
  • sequential writes
  • offset-based reads

Benefits:

  • fast disk IO
  • easy replication
  • simple recovery

Important

Kafka is fast because it never updates data, only appends.
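A minimal in-memory sketch of the append-only model (real partitions are segment files on disk, but the contract is the same):

```python
class Partition:
    """In-memory sketch of one Kafka partition: an append-only,
    offset-indexed log."""

    def __init__(self):
        self._log = []

    def append(self, value) -> int:
        self._log.append(value)        # sequential write, never an update
        return len(self._log) - 1      # offset assigned to the new record

    def read(self, offset, max_records=100):
        # readers replay from any offset, at their own pace
        return self._log[offset : offset + max_records]

p = Partition()
assert p.append("goal") == 0
assert p.append("foul") == 1
assert p.read(1) == ["foul"]          # replay from offset 1
```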


Offsets & Consumption

  • every message has an offset
  • consumers track progress via offsets
  • offsets are committed, not messages acknowledged

Important

Kafka tracks where you are, not what you processed.


Replication & Durability

Leader–follower model

  • each partition has:
    • 1 leader
    • N−1 followers
  • leader handles reads & writes
  • followers replicate passively

Summary

Replication protects against broker failure, not bad data.


Acknowledgements (acks)

  • acks=all → leader + all replicas confirm

Trade-off:

  • durability ↑
  • latency ↑

Replication factor

  • common choice = 3 (the broker default is 1)
  • survives broker failure
  • costs disk space

  factor   durability   storage
  1        low          low
  3        high         high

Consumer model

Kafka uses pull-based consumption:

  • consumers poll at their own pace
  • prevents slow consumers from being overwhelmed
  • enables batching

Analogy: consumers sip from a river, Kafka never force-feeds.
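The pull model can be sketched as follows (a hypothetical PullConsumer, not a real client API; a real consumer's poll also handles fetching, heartbeats, and rebalances):

```python
class PullConsumer:
    """Sketch of pull-based consumption: the consumer asks for at most
    max_poll records only when it is ready; nothing is pushed to it."""

    def __init__(self, log, max_poll=2):
        self.log = log
        self.pos = 0
        self.max_poll = max_poll

    def poll(self):
        batch = self.log[self.pos : self.pos + self.max_poll]
        self.pos += len(batch)
        return batch

broker_log = ["e1", "e2", "e3"]       # records already in the partition
c = PullConsumer(broker_log)
assert c.poll() == ["e1", "e2"]       # first batch, at the consumer's pace
assert c.poll() == ["e3"]             # a slow consumer simply polls later
```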


Consumer Groups

  • group = multiple consumers sharing work
  • each partition → at most one consumer within a group

Important

Same message can be read by multiple consumer groups.
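A round-robin sketch of how partitions might be divided inside a group (real assignment is performed by the group coordinator using a pluggable strategy such as range or round-robin):

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch of group assignment: every partition ends up
    with exactly one consumer inside the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions shared by 2 consumers in one group
assert assign_partitions([0, 1, 2, 3], ["c1", "c2"]) == {
    "c1": [0, 2],
    "c2": [1, 3],
}
# a second group would get its own independent assignment and offsets,
# so both groups see every message
```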


Consumer Failure Handling

Offset recovery

  • offsets committed after processing
  • restart → resume from last committed offset

Rebalancing

  • partitions redistributed across alive consumers

Warning

Commit offsets after critical work, or you risk data loss.
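The commit-after-processing rule as a sketch; handle and the partition name clicks-0 are placeholders:

```python
committed = {"clicks-0": 0}   # last committed offset per partition (sketch)

def handle(record):
    pass                       # the critical work, e.g. a database write

def commit(partition, offset):
    committed[partition] = offset

def process_batch(partition, records, start_offset):
    """At-least-once sketch: commit only AFTER the work is done, so a
    crash mid-batch causes reprocessing rather than data loss."""
    for record in records:
        handle(record)
    commit(partition, start_offset + len(records))   # commit comes last

process_batch("clicks-0", ["a", "b", "c"], 0)
assert committed["clicks-0"] == 3   # resume point after a restart
```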


Kafka: Queue vs Stream

  message queue                stream
  async tasks                  continuous processing
  usually one consumer group   many consumer groups
  pull + ack                   replayable log

Summary

Kafka itself is the same — consumer behavior changes.


When to Use Kafka

Use Kafka as a queue when:

  • async processing (YouTube transcoding)
  • ordering required (Ticketmaster queue)
  • producer & consumer must scale independently

Use Kafka as a stream when:

  • real-time analytics (ad clicks)
  • fan-out to many consumers (Facebook Live comments)

Scalability Basics

Message size

  • recommended < 1MB
  • Kafka ≠ blob storage

Correct pattern:

  • blobs → S3
  • Kafka → pointer (URL)
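The pointer pattern, sketched with placeholder names (upload_to_s3 and my-bucket stand in for a real object-store client and bucket):

```python
def upload_to_s3(data: bytes, key: str) -> str:
    # stand-in for a real object-store client; bucket name is made up
    return f"s3://my-bucket/{key}"

def publish_video(video_bytes: bytes, video_id: str, send):
    """Pointer pattern sketch: the blob goes to object storage and only
    a small reference travels through Kafka."""
    url = upload_to_s3(video_bytes, video_id)
    event = {"video_id": video_id, "url": url}   # small pointer message
    send(topic="videos", value=event)
    return event

sent = []
evt = publish_video(b"<many MB of video>", "v1",
                    lambda topic, value: sent.append((topic, value)))
assert evt == {"video_id": "v1", "url": "s3://my-bucket/v1"}
assert sent[0][0] == "videos"
```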

Warning

Storing large blobs in Kafka is an anti-pattern.


Broker capacity (rough)

  • ~1TB storage
  • ~1M msgs/sec (hardware dependent)

If below this → scaling discussion may not be needed.


Scaling Strategies

Add brokers

  • increases capacity
  • requires enough partitions

Important

Under-partitioned topics cannot use new brokers.


Partitioning strategy (most important)

  • partition = hash(key) % partitions
  • good key → even distribution
  • bad key → hot partitions

Handling Hot Partitions

Example: viral ad in click aggregator

Solutions:

  • no key → random distribution, no ordering
  • salting → adId + random
  • compound key → adId:userId or adId:region
  • backpressure → slow producer
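The effect of salting can be sketched as below. The salt arithmetic is a simplification: hashing a combined adId:salt key has the same net effect of fanning one hot key out over several partitions:

```python
import hashlib

NUM_PARTITIONS = 8

def partition_of(key: str) -> int:
    # md5 stand-in for the client's murmur2 hash
    d = hashlib.md5(key.encode()).digest()
    return int.from_bytes(d[:4], "big") % NUM_PARTITIONS

# Plain key: every click for the viral ad lands on ONE partition.
hot = partition_of("ad-viral")

# Salted: fan the hot key out over 4 partitions. Consumers must
# re-aggregate across salts, and per-ad ordering is lost.
spread = {(hot + salt) % NUM_PARTITIONS for salt in range(4)}
assert len(spread) == 4
```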

Summary

Hot partitions are solved by increasing key cardinality — spreading one hot key's traffic across more partitions.


Retries & Errors

Producer retries

  • automatic retries
  • enable idempotence to avoid duplicates

Important

With retries enabled, make producers idempotent, or duplicates can appear.
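A config sketch in confluent-kafka (librdkafka) style; acks, enable.idempotence, and retries are real librdkafka settings, while the broker address is a placeholder:

```python
# Producer config sketch (confluent-kafka / librdkafka key names).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",               # leader + all in-sync replicas confirm
    "enable.idempotence": True,  # retries cannot create duplicates
    "retries": 5,                # automatic retry budget
}
# with confluent-kafka installed: Producer(producer_config)
```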


Consumer retries

Kafka has no built-in consumer retry.

Pattern: main topic → retry topic → DLQ
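The main → retry → DLQ flow can be sketched as follows (topic names and MAX_ATTEMPTS are illustrative):

```python
MAX_ATTEMPTS = 3   # illustrative retry budget

def route_failed(record, error, send):
    """Retry-topic sketch: a failed record goes back to a retry topic
    until MAX_ATTEMPTS, then to the dead-letter queue (DLQ)."""
    attempts = record.get("attempts", 0) + 1
    enriched = {**record, "attempts": attempts, "error": str(error)}
    topic = "orders.retry" if attempts < MAX_ATTEMPTS else "orders.dlq"
    send(topic, enriched)
    return topic

sent = []
assert route_failed({"id": 1}, ValueError("boom"),
                    lambda t, v: sent.append((t, v))) == "orders.retry"
assert route_failed({"id": 1, "attempts": 2}, ValueError("boom"),
                    lambda t, v: sent.append((t, v))) == "orders.dlq"
```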

Tip

If retries are core, SQS may be simpler than Kafka.


Performance Optimizations

  • batching → fewer network calls
  • compression → less bandwidth
  • partition key → biggest performance lever

Summary

Partitioning beats hardware for Kafka performance.


Retention Policies

Messages deleted by:

  • retention.ms (default 7 days)
  • retention.bytes
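Retention is tuned per topic, e.g. with the standard kafka-configs tool (topic name and values are illustrative; 604800000 ms = 7 days, 1073741824 bytes = 1 GiB):

```shell
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name clicks \
  --add-config retention.ms=604800000,retention.bytes=1073741824
```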

Important

Messages are not deleted after consumption.


Key takeaways

  • “Kafka is always available, sometimes consistent”
  • “Ordering is guaranteed per partition”
  • “Offsets track progress, not acknowledgements”
  • “Partitioning is the core scaling decision”

Final Takeaway

Summary

Kafka is a distributed, append-only log optimized for ordered, durable, high-throughput event processing. Focus on partitioning, hot partitions, durability trade-offs, and consumer behavior.