Deduplication: the process of eliminating duplicate copies of the same data.
Applications
- Storage deduplication: When file/block copies are detected, have the copies point to the original to increase storage efficiency
- Data Model deduplication: Commonly seen in databases. Can be done using uniqueness constraints and “merge logic”.
- Request/Event deduplication: To prevent the same operation from being applied twice, e.g. the same fee being charged twice.
- Stream/message deduplication: In an at-least-once delivery system, have logic in place to ensure a redelivered message isn't actioned more than once.
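A minimal in-memory sketch of the stream/message case: the consumer records each message ID it has seen, with a TTL so the map doesn't grow forever. (The class and parameter names here are illustrative; production systems typically keep the seen-set in a shared store such as Redis rather than process memory.)

```python
import time


class TtlDeduplicator:
    """Tracks recently seen message IDs; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._seen = {}  # message_id -> monotonic time first seen

    def is_duplicate(self, message_id):
        now = time.monotonic()
        # Evict expired entries so the map stays bounded.
        self._seen = {m: t for m, t in self._seen.items() if now - t < self.ttl}
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False


dedup = TtlDeduplicator(ttl_seconds=60)

def handle(message_id, payload):
    if dedup.is_duplicate(message_id):
        return  # redelivery: skip the side effect
    # ... perform the actual processing here ...
```

The TTL is a trade-off: it must be longer than the broker's redelivery window, but short enough that the seen-set stays small.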
Why it's used
- Reduced bandwidth and storage costs
- Improved data consistency and correctness
- Avoids double-processing side effects:
    - Fees charged twice
    - Double emails sent
    - Inventory decremented twice
Mechanisms
- Caching with TTL
- Uniqueness constraints
- Idempotency keys
- Unique identifiers/audit tables
- Canonicalisation
- Hashing content to find identical data
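The last mechanism can be sketched as follows: hash each file's contents and group paths by digest, so any group with more than one path is a set of byte-identical duplicates. (Function names are illustrative; real storage deduplicators usually work at the block level and keep the hash index persistent.)

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map content hash -> list of paths; groups larger than 1 are duplicates."""
    by_hash = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash.setdefault(file_digest(path), []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

A storage deduplicator would then replace each duplicate with a reference (e.g. a hard link) to the first copy, which is the "point to the original" step from the Applications section.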