Web Analytics

๐Ÿš€ Advanced Hashing

Advanced ~30 min read โญโญ System Design

Ready for system design interviews? Learn Consistent Hashing (used in DynamoDB!), Bloom Filters (used in Chrome!), and Rolling Hash (Rabin-Karp). These advanced techniques power real distributed systems!

๐ŸŽฏ System Design Essentials!

These appear in senior/staff interviews:

  • โœ… Consistent Hashing - Load balancing, distributed caching
  • โœ… Bloom Filters - Space-efficient membership testing
  • โœ… Rolling Hash - Efficient pattern matching

1. Consistent Hashing

Distributed Systems Magic!

Problem: How to distribute keys across servers with minimal redistribution when servers are added/removed?

Solution: Map both keys and servers to a hash ring!

Traditional Hashing:
server = hash(key) % N
Problem: Adding/removing server redistributes ALL keys!

Consistent Hashing:
- Map servers to ring: hash(server) โ†’ position
- Map keys to ring: hash(key) โ†’ position
- Key goes to next server clockwise
Result: Only K/N keys redistributed (K = total keys, N = servers)

Used in: Memcached, Cassandra, DynamoDB, CDNs

See Consistent Hashing

2. Bloom Filters

Probabilistic Set Membership

Key Property: Can have false positives, but NO false negatives!

class BloomFilter:
    def __init__(self, size, num_hashes):
        self.bit_array = [0] * size
        self.num_hashes = num_hashes
    
    def add(self, item):
        for i in range(self.num_hashes):
            idx = hash(item, i) % size
            self.bit_array[idx] = 1
    
    def contains(self, item):
        for i in range(self.num_hashes):
            idx = hash(item, i) % size
            if self.bit_array[idx] == 0:
                return False  # Definitely NOT in set
        return True  # MIGHT be in set

Trade-off: Space efficiency vs false positive rate

Used in: Chrome (malicious URLs), Medium (articles you've read), Bitcoin

3. Rolling Hash (Rabin-Karp)

Efficient Pattern Matching

Idea: Compute hash incrementally as window slides

def rabin_karp(text, pattern):
    # Hash pattern
    pattern_hash = hash(pattern)
    
    # Hash first window
    window_hash = hash(text[0:m])
    
    for i in range(len(text) - m + 1):
        if window_hash == pattern_hash:
            if text[i:i+m] == pattern:
                return i  # Found!
        
        # Roll hash: remove first, add next
        window_hash = roll(window_hash, text[i], text[i+m])

Time: O(n + m) average (vs O(nm) brute force)

Used in: Plagiarism detection, DNA matching, rsync

The Complete Code

Output
Click Run to execute your code

Real-World Applications

Where These Are Used

Consistent Hashing:
  • Amazon DynamoDB - data partitioning
  • Memcached - distributed caching
  • Cassandra - node assignment
  • CDNs - server selection
Bloom Filters:
  • Google Chrome - malicious URL detection
  • Medium - recommend unread articles
  • Bitcoin - transaction verification
  • Databases - avoid disk lookups
Rolling Hash:
  • Plagiarism detection systems
  • DNA sequence matching
  • File deduplication
  • rsync - file synchronization

Interview Tips

๐Ÿ’ก System Design Interview

  • โœ… Consistent Hashing: Mention for distributed caching, load balancing
  • โœ… Bloom Filters: Use when space is critical, false positives OK
  • โœ… Rolling Hash: Pattern matching, substring search
  • โœ… Know trade-offs: Space vs accuracy, consistency vs availability

Summary

Advanced Hashing Mastery

  1. Consistent Hashing: Distributed systems, minimal redistribution
  2. Bloom Filters: Space-efficient, probabilistic membership
  3. Rolling Hash: Efficient pattern matching O(n + m)
  4. Real-world: Used in DynamoDB, Chrome, rsync!
  5. Essential for system design interviews!

๐ŸŽ‰ Module 7: Hashing - 100% COMPLETE!

You've mastered all of hashing:

  • โœ… Hash Tables (collision resolution)
  • โœ… Hash Maps & Sets (interview patterns)
  • โœ… Advanced Hashing (system design)

You're ready for ANY hashing question! ๐Ÿš€