Coupang Engineering: Replacing Database Sequences with DynamoDB-Based Distributed Counter Service
Coupang's engineering team has shared a detailed account of building a distributed sequence service using DynamoDB and dual-layer caching to replace over 10,000 database sequences across 100+ services during a migration from relational databases to NoSQL.
The Challenge
- Over 100 teams depended on native database sequences for primary keys
- Some teams required sequences for sorting, others for downstream system compatibility
- Nearly 10,000 different counters across the organization
- DynamoDB doesn't support native sequences
- UUIDs would break existing sorting logic
- Snowflake IDs would introduce unwanted operational complexity
Requirements
- Unique and monotonically increasing values
- High availability (Tier-0 service)
- Low latency (sub-millisecond on hot path)
- Zero network calls on hot path (local cache generation)
- Backward compatible (start values, custom steps, ascending/descending)
- Zero-downtime migration capability
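To make the backward-compatibility requirement concrete, here is a minimal sketch of how a start value, custom step, and direction could map a reserved block of indices to sequence values entirely in local memory. The names `SequenceSpec`, `value_at`, and `block_values` are hypothetical and do not come from Coupang's service.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SequenceSpec:
    """Hypothetical per-sequence settings mirroring a database sequence."""
    start: int = 1            # first value the sequence hands out
    step: int = 1             # distance between consecutive values (positive)
    descending: bool = False  # count down instead of up

    def value_at(self, index: int) -> int:
        """Map the index-th allocation (0-based) to a sequence value."""
        delta = index * self.step
        return self.start - delta if self.descending else self.start + delta


def block_values(spec: SequenceSpec, first_index: int, size: int) -> Iterator[int]:
    """Yield the values for a reserved block of indices; no network call involved."""
    for index in range(first_index, first_index + size):
        yield spec.value_at(index)
```

For example, `list(block_values(SequenceSpec(start=1000, step=5), 0, 3))` gives `[1000, 1005, 1010]`, and a descending spec with `start=100, step=2` gives `[94, 92, 90]` for indices 3 through 5.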
Architecture: Two Cache Layers in Front of DynamoDB
Client Local Cache -> Server Cache Layer -> DynamoDB (source of truth)
- Client cache: Generates sequences locally without any network roundtrip
- Server cache: Pre-fetches 500-1000 values per batch
- DynamoDB: Atomic conditional updates, written in pseudo-SQL as SET val = val + blockSize WHERE val = expectedValue (in DynamoDB terms, an update expression guarded by a condition expression)
- Result: Less than 0.1% of requests reach DynamoDB
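The layering above can be sketched as a toy model. This is an illustration under stated assumptions, not Coupang's implementation: `InMemoryStore` is a stand-in for the DynamoDB table (it is not a real DynamoDB client), and the class names, the client block size of 100, and the rule that an undersized leftover is dropped are all hypothetical.

```python
import threading
from typing import Tuple


class InMemoryStore:
    """Stand-in for the DynamoDB item; holds the next unreserved value."""

    def __init__(self) -> None:
        self._val = 1
        self._lock = threading.Lock()  # emulates the atomicity of a single-item update
        self.writes = 0                # how many successful updates reached the store

    def read(self) -> int:
        with self._lock:
            return self._val

    def conditional_add(self, expected: int, block_size: int) -> bool:
        """Apply val = val + block_size only if val still equals expected."""
        with self._lock:
            if self._val != expected:
                return False
            self._val += block_size
            self.writes += 1
            return True


class ServerCache:
    """Middle layer: reserves large blocks from the store, serves sub-blocks."""

    def __init__(self, store: InMemoryStore, block_size: int = 1000) -> None:
        self._store, self._block_size = store, block_size
        self._next = self._end = 0     # empty half-open range [next, end)
        self._lock = threading.Lock()

    def take(self, count: int) -> Tuple[int, int]:
        """Return a half-open range [lo, hi) of `count` reserved values."""
        if count > self._block_size:
            raise ValueError("count exceeds server block size")
        with self._lock:
            if self._end - self._next < count:
                self._refill()         # any leftover is dropped, leaving a gap
            lo = self._next
            self._next += count
            return lo, lo + count

    def _refill(self) -> None:
        while True:                    # optimistic retry until the update wins
            expected = self._store.read()
            if self._store.conditional_add(expected, self._block_size):
                self._next, self._end = expected, expected + self._block_size
                return


class ClientCache:
    """Hot path: hands out values from local memory, refilling from the server."""

    def __init__(self, server: ServerCache, block_size: int = 100) -> None:
        self._server, self._block_size = server, block_size
        self._next = self._end = 0
        self._lock = threading.Lock()

    def next_value(self) -> int:
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._server.take(self._block_size)
            value = self._next
            self._next += 1
            return value
```

In this toy setup, drawing 5,000 values through one `ClientCache` yields 1 through 5000 in order while `store.writes` ends at 5, since each store update covers a block of 1,000 values.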
Key Design Decisions
- Accept gaps: Losing unused sequence values when a server crashes is acceptable
- Batch fetching: One DynamoDB write supports hundreds of cache-level requests
- No distributed locks: Conditional updates provide conflict-free uniqueness
- Simplicity over sophistication: Avoided consensus protocols, vector clocks, distributed locks
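A small deterministic sketch can show why conditional updates keep blocks unique without locks, and why a crash only produces a gap. It assumes a plain dictionary as a stand-in for the table; `conditional_update`, `reserve_block`, and the `stale_read` parameter (used here only to simulate a server that read the counter before losing a race) are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Table = Dict[str, int]  # sequence name -> next unreserved value


def conditional_update(table: Table, name: str, expected: int, block_size: int) -> bool:
    """Emulate a conditional write: advance the counter only if it still equals expected."""
    if table[name] != expected:
        return False
    table[name] = expected + block_size
    return True


def reserve_block(
    table: Table, name: str, block_size: int, stale_read: Optional[int] = None
) -> Tuple[range, int]:
    """Reserve a block by optimistic retry; return the block and the attempt count."""
    attempts = 0
    expected = table[name] if stale_read is None else stale_read
    while True:
        attempts += 1
        if conditional_update(table, name, expected, block_size):
            return range(expected, expected + block_size), attempts
        expected = table[name]  # lost the race: re-read and try again


table: Table = {"order_id": 1}
stale = table["order_id"]                    # server B reads the counter first
block_a, tries_a = reserve_block(table, "order_id", 500)             # server A wins
block_b, tries_b = reserve_block(table, "order_id", 500, stale)      # B retries once
# If server A crashes after using 10 values, the remaining 490 are never issued.
# The next reservation simply starts after every block already handed out.
block_c, _ = reserve_block(table, "order_id", 500)
```

Here server A receives 1 through 500 on its first attempt, server B's stale write is rejected and its retry receives 501 through 1000, and the post-crash reservation starts at 1001. No value is issued twice; the only cost of the crash is a hole in the sequence.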
Results
- Order team migrated 12 services in 3 weeks with zero downtime
- Less than 50 lines of code changed per service
- Sub-millisecond latency for cached sequence generation
- Complete backward compatibility with original database sequences
Design Philosophy
"Complex systems fail in complex ways. Every layer of coordination adds latency, more failure modes, and heavier operational burden." -- Coupang Engineering
This case study is a valuable reference for any organization migrating from relational databases to NoSQL and dealing with the sequence/ID generation challenge at scale.