Deduplication: the process of eliminating copies of the same data.

Applications

  • Storage deduplication: When file/block copies are detected, have the copies point to the original to increase storage efficiency
  • Data Model deduplication: Commonly seen in databases. Can be done using uniqueness constraints and “merge logic”.
  • Request/Event deduplication: Prevents the same operation from being performed more than once, e.g. the same fee being charged twice.
  • Stream/message deduplication: In an at-least-once delivery system, have logic in place to ensure a repeated message isn’t actioned more than once.
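The stream/message case above can be sketched with a consumer that remembers which message IDs it has already processed. This is a minimal illustration, not a specific library's API: the `id` field and the in-memory `set` are assumptions (a real system would use a TTL cache or a persistent store).

```python
def make_handler(process):
    seen = set()  # assumption: in production this would be a TTL cache or durable store

    def handle(message):
        # assumption: each message carries a unique "id" field
        if message["id"] in seen:
            return False  # duplicate delivery: already actioned, skip
        seen.add(message["id"])
        process(message)
        return True

    return handle

# At-least-once delivery may hand us the same message twice:
charges = []
handle = make_handler(lambda m: charges.append(m["body"]))
handle({"id": "m1", "body": "charge $5"})
handle({"id": "m1", "body": "charge $5"})  # redelivery, ignored
```

Because the second delivery is filtered out, the side effect (the charge) happens exactly once.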

Why it’s used

  • Reduced bandwidth and storage costs
  • Improved data consistency and correctness
  • Avoids double-processing side effects
    • Fees charged twice
    • Double emails sent
    • Inventory decremented twice

Mechanisms

  • Caching with TTL
  • Uniqueness constraints
  • Idempotency keys
  • Unique Identifiers/Audit tables
  • Canonicalisation
  • Hashing content to find identical data
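The last mechanism, content hashing, can be sketched as follows: hash each blob's bytes and keep only one stored copy per digest, with every input reduced to a reference. A sketch under the assumption that SHA-256 of the raw bytes is a sufficient identity check (collisions are ignored here):

```python
import hashlib

def dedupe(blobs):
    store = {}  # digest -> single canonical copy of the content
    refs = []   # each input becomes a reference (digest) into the store
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # keep only the first copy seen
        refs.append(digest)
    return store, refs

store, refs = dedupe([b"hello", b"world", b"hello"])
# Three inputs, but only two unique blobs are actually stored;
# the first and third refs point at the same stored copy.
```

This is the same idea storage deduplication uses at the file/block level: duplicates become pointers to one original.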