In the digital age, organizations handle massive volumes of personal data. Using that data responsibly and securely is both a legal and an ethical requirement. ISO/IEC 20889, a privacy-focused standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), provides a common terminology and a classification of techniques for data de-identification, the process of reducing how identifiable individuals are in a data set.
This article provides a focused review of ISO/IEC 20889, particularly sections covering terminology (pages 8–10) and classification of de-identification techniques (pages 25–33), and explains how this standard supports efforts to achieve anonymity and data protection.
What Is ISO/IEC 20889?
ISO/IEC 20889:2018, titled “Privacy enhancing data de-identification terminology and classification of techniques”, defines the core vocabulary and categorization of privacy-enhancing methods used to reduce the identifiability of personal data. It does not mandate specific technologies; instead, it provides a standardized framework for evaluating and comparing de-identification techniques.
It is especially relevant in fields like data analytics, healthcare, marketing, and AI—anywhere anonymized or pseudonymized data is used.
Key Terminology from ISO/IEC 20889 (Pages 8–10)
Understanding de-identification starts with clear definitions. ISO/IEC 20889 defines key terms to support a consistent global vocabulary in privacy-related discussions:
✅ Personally Identifiable Information (PII)
Data that can be used to identify an individual directly or indirectly (e.g., name, ID number, location data).
✅ De-Identification
A process that removes or modifies PII to make it more difficult to link to an individual. This includes anonymization, pseudonymization, and masking.
✅ Anonymization
A form of de-identification in which the risk of re-identification is extremely low or negligible. True anonymization is difficult and often debated in practice.
✅ Pseudonymization
A form of de-identification in which identifying values are replaced with fictitious identifiers (pseudonyms). Re-identification remains possible for anyone who can access the key or mapping that links pseudonyms to identities.
✅ Data Masking
Conceals PII by obfuscating values without structurally altering the data set (e.g., replacing characters with asterisks).
These foundational definitions ensure clarity when evaluating or implementing privacy measures across different industries and regions.
Classification of De-Identification Techniques (Pages 25–33)
ISO/IEC 20889 describes a range of de-identification techniques and characterizes how each one works and how well it resists re-identification. The summary below groups them into five broad categories:
1. Perturbative Techniques
These methods alter data values in a way that retains overall utility but masks individual entries.
- Noise addition (e.g., random values added to data)
- Data swapping (e.g., switching values between records)
- Microaggregation (e.g., averaging values in clusters)
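To make the three perturbative methods above more concrete, here is a minimal Python sketch. The function names, the salary values, and the noise scale are illustrative assumptions, not anything prescribed by the standard, and a real deployment would need to calibrate noise and group sizes against a measured re-identification risk.

```python
import random
from statistics import mean

def add_noise(values, scale, rng):
    """Noise addition: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0, scale) for v in values]

def swap_values(values, rng):
    """Data swapping: shuffle one attribute's values across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def microaggregate(values, group_size):
    """Microaggregation: replace each value with the mean of its group.

    Values are sorted so that similar values land in the same group.
    The last group may be smaller than group_size in this simple sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
    return result

rng = random.Random(42)  # fixed seed only to make the demo repeatable
salaries = [41000, 52000, 43000, 98000, 51000, 95000]  # made-up values

noisy = add_noise(salaries, scale=1000, rng=rng)
swapped = swap_values(salaries, rng)
aggregated = microaggregate(salaries, group_size=2)
print(aggregated)  # [42000, 51500, 42000, 96500, 51500, 96500]
```

Note that swapping and microaggregation both preserve the column total here, which is one reason these methods are attractive when aggregate statistics matter more than individual rows.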
2. Non-Perturbative Techniques
These approaches reduce the level of detail in the data by suppressing or generalizing values rather than distorting them, so the values that remain are still truthful.
- Suppression (e.g., removing sensitive values altogether)
- Generalization (e.g., replacing specific data like “age 27” with “20–30”)
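As a rough illustration of suppression and generalization, the sketch below drops a direct identifier and replaces an exact age with a ten-year band. The record layout and helper names are hypothetical, and the bands are written as non-overlapping ranges such as "20-29" so that every age maps to exactly one band.

```python
def suppress(record, fields):
    """Suppression: return a copy of the record without the listed fields."""
    return {k: v for k, v in record.items() if k not in fields}

def generalize_age(age, width=10):
    """Generalization: map an exact age to a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Made-up example record
record = {"name": "Alice Example", "age": 27, "city": "Lyon", "diagnosis": "asthma"}

released = suppress(record, {"name"})
released["age"] = generalize_age(released["age"])
print(released)  # {'age': '20-29', 'city': 'Lyon', 'diagnosis': 'asthma'}
```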
3. Synthetic Data Generation
Synthetic data is artificially created to mimic real data patterns while containing no actual PII.
- Useful in machine learning and testing environments
- Carries lower re-identification risks if well-designed
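The idea can be sketched with a deliberately simple generator: fit summary statistics on the real records, then sample new records from those statistics. Everything here (field names, the Gaussian assumption for age, sampling each column independently) is an assumption made for illustration. Independent sampling discards correlations between columns, and a generator like this carries no formal privacy guarantee on its own.

```python
import random
from collections import Counter
from statistics import mean, stdev

def fit(records):
    """Summarize the real data: age mean/sd and city frequencies."""
    ages = [r["age"] for r in records]
    cities = Counter(r["city"] for r in records)
    return {
        "age_mean": mean(ages),
        "age_sd": stdev(ages),
        "cities": list(cities.keys()),
        "weights": list(cities.values()),
    }

def sample(model, n, rng):
    """Draw n synthetic records from the fitted summary statistics."""
    synthetic = []
    for _ in range(n):
        age = max(0, round(rng.gauss(model["age_mean"], model["age_sd"])))
        city = rng.choices(model["cities"], weights=model["weights"], k=1)[0]
        synthetic.append({"age": age, "city": city})
    return synthetic

# Made-up "real" records
real = [
    {"age": 27, "city": "Lyon"},
    {"age": 35, "city": "Paris"},
    {"age": 41, "city": "Paris"},
    {"age": 52, "city": "Lyon"},
]

model = fit(real)
synthetic = sample(model, n=100, rng=random.Random(0))
```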
4. Tokenization
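Sensitive elements are replaced with unique tokens that have no mathematical relationship to the original values, so a token cannot be reversed on its own. The original value can be recovered only through a protected mapping (often called a token vault), if one is kept. Common in financial transactions and healthcare.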
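A minimal sketch of vault-based tokenization follows. The `TokenVault` class is hypothetical and keeps its mapping in memory purely for illustration; a production system would store the mapping in a hardened, access-controlled service. Tokens are generated randomly, so nothing about the original value can be derived from the token itself, and only the vault holder can look the original up.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token vault (illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        """Return the existing token for value, or mint a new random one."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = secrets.token_hex(8)
        while token in self._value_by_token:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        """Look up the original value; only possible with vault access."""
        return self._value_by_token[token]

vault = TokenVault()
card = "4111111111111111"  # well-known test card number, not real data
token = vault.tokenize(card)
```

If the vault entry is deleted, the token becomes permanently unlinkable, which is one way to read the "non-reversible" property in practice.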
5. Masking Techniques
Applies a transformation that hides specific details (e.g., showing only the last four digits of a credit card number). Simple, but often inadequate on its own for privacy protection.
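For instance, card-number masking could look like the sketch below. The function name and the choice to keep the last four digits visible are assumptions for illustration. Spaces and other separators are left untouched so the data keeps its original shape.

```python
def mask_card_number(card_number, visible=4, mask_char="*"):
    """Mask every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(c.isdigit() for c in card_number)
    to_mask = max(0, total_digits - visible)
    masked = []
    for c in card_number:
        if c.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(c)
    return "".join(masked)

print(mask_card_number("4111 1111 1111 1234"))  # **** **** **** 1234
```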
Application of ISO/IEC 20889 in Practice
Organizations applying ISO/IEC 20889 can:
- Assess de-identification risks using a structured framework
- Choose appropriate methods for specific use cases (e.g., healthcare vs. e-commerce)
- Support alignment with privacy regulations like the GDPR, which requires data minimization and data protection by design
- Develop privacy-preserving data analytics workflows
For example, a health data research organization might use pseudonymization and generalization to retain analytical value while protecting patient identity.
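The health research example above might be sketched as follows. The key, field names, and record are all hypothetical. The pseudonym is a keyed hash (HMAC-SHA256), so the same patient always gets the same pseudonym and records can still be linked for analysis, while the birth date is generalized to a year. Anyone holding the key could recompute pseudonyms for known patient IDs, which is why the result counts as pseudonymized rather than anonymized.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live in a key management system,
# separate from the research data set.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymize(patient_id, key=SECRET_KEY):
    """Derive a stable pseudonym from the patient ID with HMAC-SHA256."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability

def generalize_birth_date(iso_date):
    """Generalize 'YYYY-MM-DD' to the year only."""
    return iso_date[:4]

def prepare_for_research(record):
    """Keep analytical fields, drop direct identifiers."""
    return {
        "pseudonym": pseudonymize(record["patient_id"]),
        "birth_year": generalize_birth_date(record["birth_date"]),
        "diagnosis": record["diagnosis"],
    }

# Made-up patient record
patient = {
    "patient_id": "P-000123",
    "name": "Alice Example",
    "birth_date": "1987-06-14",
    "diagnosis": "asthma",
}
research_row = prepare_for_research(patient)
```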
Limitations and Considerations
While ISO/IEC 20889 is not prescriptive, it guides the development of privacy-aware systems by:
- Encouraging risk-based decision-making for de-identification
- Clarifying that true anonymity is rarely achievable, and risks must be continuously assessed
- Recognizing the evolving threat landscape, including machine learning-based re-identification attacks
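One simple way to put a number on re-identification risk is a k-anonymity-style check: count how many records share each combination of quasi-identifiers and look at the smallest group. The sketch below is an illustration of that idea, not a procedure taken from the standard, and the records and field names are made up. A smallest group of size 1 means at least one person is unique on those attributes.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifiers.

    Returns 0 for an empty data set.
    """
    if not records:
        return 0
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

# Made-up, already generalized records
records = [
    {"age_band": "20-29", "city": "Lyon", "diagnosis": "asthma"},
    {"age_band": "20-29", "city": "Lyon", "diagnosis": "flu"},
    {"age_band": "30-39", "city": "Lyon", "diagnosis": "asthma"},
]

k = smallest_group_size(records, ["age_band", "city"])
print(k)  # 1: the single record in band 30-39 is unique
```

Re-running a check like this whenever the data or the release context changes is one practical reading of "continuously assessed".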
Final Thoughts
ISO/IEC 20889 is a vital resource for professionals responsible for data privacy, security architecture, and analytics governance. By standardizing terminology and techniques, it enables consistent, transparent, and legally defensible data de-identification practices.
Whether you’re building a privacy program, managing a data lake, or sharing sensitive datasets, adopting ISO/IEC 20889 as part of your strategy supports responsible innovation and global compliance.