Consistent Hashing

Summary

Consistent hashing distributes keys across servers with minimal data movement when nodes join or leave.
Think of it as placing both servers and keys on a circular track, then walking clockwise from a key to find the next server.


The problem: modulo hashing breaks under change

Why modulo hashing fails

  • Mapping: hash(key) % num_nodes.
  • Adding/removing a node changes num_nodes → all mappings shift.
  • Result: massive data reshuffling, load spikes, downtime.

Warning

Modulo hashing causes O(n) key remapping (nearly all keys move) for any change in cluster size.

Analogy: Like reassigning all customers to new counters every time you open or close a counter.
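A quick sketch of the failure mode above: hashing 10,000 made-up keys with `hash(key) % n`, then growing the cluster from 4 to 5 nodes, and counting how many keys land on a different node. (Key names and node counts are illustrative.)

```python
# Fraction of keys that move when a node is added under modulo hashing.
# Uses hashlib for a hash that is stable across runs.
import hashlib

def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

keys = [f"user:{i}" for i in range(10_000)]

before = {k: stable_hash(k) % 4 for k in keys}   # 4 nodes
after  = {k: stable_hash(k) % 5 for k in keys}   # a 5th node joins

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys remapped")
```

In general only keys where `h % n == h % (n + 1)` stay put, so roughly n/(n + 1) of all keys move for a single node change.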


Core idea: consistent hashing

The hash ring

  • Imagine a circular space (e.g., 0 to 2^32 − 1).

  • Hash servers → place points on the circle.

  • Hash keys → place on same circle.

  • To find target server: move clockwise until you hit the first server.

    (mermaid diagram, indented)

      graph LR
          A[key hash] --> B((next server clockwise))
    

Why this works

  • Adding a node only affects keys between its predecessor and itself.
  • Removing a node only moves keys that mapped to that node.

Important

Only ~1/n of the keys get remapped on node join/leave.
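The clockwise lookup can be sketched with a sorted list of server positions and a binary search. This is a minimal ring without virtual nodes; the server names and hash choice (truncated MD5) are illustrative.

```python
# Minimal hash ring: servers and keys share one 32-bit hash space; a key
# belongs to the first server at or clockwise after its position.
import bisect
import hashlib

def h(s: str) -> int:
    """Map a string to a point on the 32-bit ring."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        i = bisect.bisect_right(self.points, (h(key), ""))
        # Wrapping past 2^32 - 1 lands on the first server on the ring.
        return self.points[i % len(self.points)][1]

ring = Ring(["db1", "db2", "db3"])
print(ring.lookup("user:42"))
```

The lookup is O(log n) per key; the wrap-around in `lookup` is what makes the space a ring rather than a line.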


Node addition & removal

Adding a node

Only keys in its arc (prev → new) move.

Removing a node

Only keys in removed node’s arc move to the next clockwise server.

Analogy: Think of a Ferris wheel: adding a new cabin doesn't force everyone to change seats; only the riders in the sector near the new cabin adjust.
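The arc-movement claim can be checked directly: build the same ring with and without a fourth server and verify that every key that moved landed on the new server. (A small self-contained sketch; names and key patterns are made up.)

```python
# With a hash ring, adding a server moves only the keys in its arc:
# every moved key ends up on the new server, nothing else shuffles.
import bisect
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def owner(key: str, servers) -> str:
    points = sorted((h(s), s) for s in servers)
    i = bisect.bisect_right(points, (h(key), ""))
    return points[i % len(points)][1]

keys = [f"user:{i}" for i in range(10_000)]
before = {k: owner(k, ["db1", "db2", "db3"]) for k in keys}
after  = {k: owner(k, ["db1", "db2", "db3", "db4"]) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
assert all(after[k] == "db4" for k in moved)  # only the new arc moved
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Contrast this with the modulo demo earlier: here the moved fraction is roughly the size of the new server's arc, not nearly everything.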


Virtual nodes (v-nodes)

Why we need them

Without v-nodes:

  • Load depends on how “lucky” a server’s single hash position is.
  • Removing one node can overload its immediate clockwise neighbor.

How they help

  • Each server is hashed multiple times (DB1-v1, DB1-v2, …).
  • These points spread around the ring.
  • Removing a server spreads load across many remaining nodes.

Tip

More virtual nodes → smoother load distribution (typical: 100–200 per server).

(simple layout diagram)

    graph LR
        DB1v1 --- DB2v1 --- DB1v2 --- DB3v1 --- DB1v1

Where consistent hashing shows up in the real world

    System                                  Use
    Cassandra                               distributes data by token ranges on a ring
    DynamoDB                                uses a variant for partition placement
    CDNs                                    pick the edge server for content storage
    Distributed caches (e.g., memcached)    assign keys to cache nodes

Practical considerations

When to dive deep

  • Designing distributed databases, distributed caches, message brokers, or custom clusters.
  • Topics to cover:
    • Why modulo hashing fails
    • Hash ring & arc movements
    • Virtual nodes for balancing
    • Handling node failures & additions
    • Avoiding hot spots
    • Rebalancing strategies

When to keep it light

  • When using DynamoDB, Cassandra, etc.
    → simply mention: “It uses consistent hashing under the hood.”

Summary

Tip: Only dive deep when you’re building infra yourself. Otherwise, acknowledge and move on.


Quick mental model

Consistent hashing = “circle + clockwise lookup”.
Virtual nodes = “many small slices instead of one big slice”.