๐ Advanced Hashing
Ready for system design interviews? Learn Consistent Hashing (used in DynamoDB!), Bloom Filters (used in Chrome!), and Rolling Hash (Rabin-Karp). These advanced techniques power real distributed systems!
๐ฏ System Design Essentials!
These appear in senior/staff interviews:
- โ Consistent Hashing - Load balancing, distributed caching
- โ Bloom Filters - Space-efficient membership testing
- โ Rolling Hash - Efficient pattern matching
1. Consistent Hashing
Distributed Systems Magic!
Problem: How to distribute keys across servers with minimal redistribution when servers are added/removed?
Solution: Map both keys and servers to a hash ring!
Traditional Hashing:
server = hash(key) % N
Problem: Adding/removing server redistributes ALL keys!
Consistent Hashing:
- Map servers to ring: hash(server) โ position
- Map keys to ring: hash(key) โ position
- Key goes to next server clockwise
Result: Only K/N keys redistributed (K = total keys, N = servers)
Used in: Memcached, Cassandra, DynamoDB, CDNs
See Consistent Hashing
2. Bloom Filters
Probabilistic Set Membership
Key Property: Can have false positives, but NO false negatives!
class BloomFilter:
def __init__(self, size, num_hashes):
self.bit_array = [0] * size
self.num_hashes = num_hashes
def add(self, item):
for i in range(self.num_hashes):
idx = hash(item, i) % size
self.bit_array[idx] = 1
def contains(self, item):
for i in range(self.num_hashes):
idx = hash(item, i) % size
if self.bit_array[idx] == 0:
return False # Definitely NOT in set
return True # MIGHT be in set
Trade-off: Space efficiency vs false positive rate
Used in: Chrome (malicious URLs), Medium (articles you've read), Bitcoin
3. Rolling Hash (Rabin-Karp)
Efficient Pattern Matching
Idea: Compute hash incrementally as window slides
def rabin_karp(text, pattern):
# Hash pattern
pattern_hash = hash(pattern)
# Hash first window
window_hash = hash(text[0:m])
for i in range(len(text) - m + 1):
if window_hash == pattern_hash:
if text[i:i+m] == pattern:
return i # Found!
# Roll hash: remove first, add next
window_hash = roll(window_hash, text[i], text[i+m])
Time: O(n + m) average (vs O(nm) brute force)
Used in: Plagiarism detection, DNA matching, rsync
The Complete Code
Click Run to execute your code
Real-World Applications
Where These Are Used
Consistent Hashing:
- Amazon DynamoDB - data partitioning
- Memcached - distributed caching
- Cassandra - node assignment
- CDNs - server selection
Bloom Filters:
- Google Chrome - malicious URL detection
- Medium - recommend unread articles
- Bitcoin - transaction verification
- Databases - avoid disk lookups
Rolling Hash:
- Plagiarism detection systems
- DNA sequence matching
- File deduplication
- rsync - file synchronization
Interview Tips
๐ก System Design Interview
- โ Consistent Hashing: Mention for distributed caching, load balancing
- โ Bloom Filters: Use when space is critical, false positives OK
- โ Rolling Hash: Pattern matching, substring search
- โ Know trade-offs: Space vs accuracy, consistency vs availability
Summary
Advanced Hashing Mastery
- Consistent Hashing: Distributed systems, minimal redistribution
- Bloom Filters: Space-efficient, probabilistic membership
- Rolling Hash: Efficient pattern matching O(n + m)
- Real-world: Used in DynamoDB, Chrome, rsync!
- Essential for system design interviews!
๐ Module 7: Hashing - 100% COMPLETE!
You've mastered all of hashing:
- โ Hash Tables (collision resolution)
- โ Hash Maps & Sets (interview patterns)
- โ Advanced Hashing (system design)
You're ready for ANY hashing question! ๐
Enjoying these tutorials?