Understanding differential privacy begins with a solid grasp of its core terms and definitions. Chapter 2 (pp. 11–27) of the foundational text “The Algorithmic Foundations of Differential Privacy” by Cynthia Dwork and Aaron Roth lays the groundwork for formalizing and analyzing differentially private algorithms.
In this article, we summarize the basic terms and constructs introduced in that chapter to help students, engineers, and privacy professionals develop a strong conceptual foundation in differential privacy.
1. Adjacent Databases
At the heart of differential privacy is the concept of adjacency:
Two datasets $D$ and $D'$ are considered adjacent if they differ in only one individual's data.
This definition reflects the idea that an individual’s presence or absence should not significantly influence the result of any analysis. It’s this idea that underpins the privacy guarantee of differential privacy.
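To make adjacency concrete, here is a minimal Python sketch using one common convention, in which adjacent datasets differ by adding or removing a single record. The `adjacent` helper and the example records are purely illustrative:

```python
# Two hypothetical datasets that differ in exactly one individual's record:
D1 = [("alice", 34), ("bob", 29), ("carol", 41)]
D2 = [("alice", 34), ("bob", 29)]  # carol's record removed

def adjacent(a, b):
    # Simplified adjacency check (add/remove-one-record convention);
    # ignores duplicate records for the sake of illustration.
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    return len(longer) == len(shorter) + 1 and all(r in longer for r in shorter)

assert adjacent(D1, D2)
```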
2. Randomized Algorithms
Differential privacy relies on randomized algorithms to inject controlled randomness into outputs. This randomness ensures that outputs from adjacent datasets are statistically similar, obscuring any individual’s data.
- A randomized algorithm maps each input to a probability distribution over outputs, so running it twice on the same input can yield different results.
- In DP, these algorithms are designed so that the distribution over outputs doesn’t vary much between adjacent datasets.
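A classic example of such an algorithm is randomized response, which predates differential privacy but satisfies its definition. The sketch below implements the textbook coin-flip protocol:

```python
import random

def randomized_response(true_answer: bool) -> bool:
    # First coin flip: with probability 1/2, report the true answer.
    if random.random() < 0.5:
        return true_answer
    # Otherwise report the outcome of an independent second coin flip.
    return random.random() < 0.5
```

Running it repeatedly on the same input yields different outputs, yet the output distribution reveals only a bounded amount about the true answer.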
3. ε-Differential Privacy
This is the formal definition of the differential privacy guarantee:
A randomized algorithm $\mathcal{A}$ is ε-differentially private if, for all adjacent datasets $D$ and $D'$ and all sets of outputs $S \subseteq \text{Range}(\mathcal{A})$:

$$\Pr[\mathcal{A}(D) \in S] \leq e^{\varepsilon} \cdot \Pr[\mathcal{A}(D') \in S]$$
Where:
- $\varepsilon$ (epsilon) is the privacy loss parameter.
- Smaller $\varepsilon$ means stronger privacy.
- The factor $e^{\varepsilon}$ bounds how much the probability of any outcome can differ between adjacent datasets.
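For randomized response with fair coins, a truthful "yes" is reported with probability 3/4 and a false "yes" with probability 1/4, so the definition holds with $\varepsilon = \ln 3$. A quick numerical check of the inequality:

```python
import math

# Randomized response reports the truth with probability 3/4 and
# the opposite answer with probability 1/4, so for the output "yes":
p_true_yes = 0.75   # Pr[report yes | true answer is yes]
p_false_yes = 0.25  # Pr[report yes | true answer is no]

eps = math.log(3)  # the mechanism is (ln 3)-differentially private

# The DP inequality holds in both directions:
assert p_true_yes <= math.exp(eps) * p_false_yes + 1e-12
assert p_false_yes <= math.exp(eps) * p_true_yes + 1e-12
```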
4. Privacy Loss and Composition
Privacy Loss:
This term refers to the potential information leakage from repeated queries or data analysis. It is quantified using the epsilon parameter.
Composition:
Differential privacy supports composition theorems, which allow combining multiple differentially private algorithms:
- Basic Composition: Total privacy loss is additive.
- Advanced Composition: Provides tighter bounds by allowing a small failure probability $\delta'$, so the total privacy loss over $k$ queries grows roughly with $\sqrt{k}$ rather than linearly.
This is essential for real-world systems that make multiple queries or computations on the same dataset.
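As a sketch, the two accounting rules can be written as small helpers. The advanced-composition formula below is the standard bound from Dwork and Roth for $k$ adaptive runs of an $\varepsilon$-DP mechanism with failure probability $\delta'$; treat the exact constants as something to verify against the text:

```python
import math

def basic_composition(epsilons):
    # Basic composition: total privacy loss is the sum of per-query losses.
    return sum(epsilons)

def advanced_composition(eps, k, delta_prime):
    # Advanced composition bound for k runs of an eps-DP mechanism:
    # eps_total = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1),
    # holding with the extra failure probability delta_prime.
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))
```

For 100 queries at $\varepsilon = 0.1$ each, basic composition charges a total budget of 10, while the advanced bound (with $\delta' = 10^{-5}$) charges roughly 5.9.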
5. Sensitivity of a Function
The sensitivity of a function measures how much its output can change when a single individual’s data is changed.
Formally, for a function $f: \mathcal{D} \rightarrow \mathbb{R}^k$, the $\ell_1$-sensitivity is:

$$\Delta f = \max_{\text{adjacent } D, D'} \|f(D) - f(D')\|_1$$
Functions with lower sensitivity require less noise to achieve the same privacy guarantee. This is a key factor when designing efficient differentially private algorithms.
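For example, a counting query ("how many records satisfy a predicate?") has $\ell_1$-sensitivity 1, since one individual's record can change the count by at most one. An illustrative check in Python (the datasets are made up):

```python
def count_above(db, threshold):
    # A counting query: how many records exceed the threshold.
    return sum(1 for x in db if x > threshold)

# Adjacent databases differing in one record: the count changes by at most 1,
# so the l1-sensitivity of a counting query is Delta_f = 1.
D1 = [12, 45, 7, 33]
D2 = [12, 45, 7]  # one individual's record removed
assert abs(count_above(D1, 10) - count_above(D2, 10)) <= 1
```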
6. Laplace Mechanism
The Laplace Mechanism is a fundamental method for achieving differential privacy. It adds Laplace-distributed noise to a query result based on the function’s sensitivity and the desired privacy level.
For a function $f$ with sensitivity $\Delta f$, the Laplace mechanism releases:

$$\mathcal{A}(D) = f(D) + \text{Lap}(\Delta f / \varepsilon)$$

where $\text{Lap}(b)$ denotes noise drawn from the Laplace distribution with scale $b$, added independently to each coordinate of $f(D)$.
This mechanism balances utility (accuracy of results) and privacy (level of protection) using mathematically provable guarantees.
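A minimal sketch of the mechanism for a scalar query, sampling Laplace noise by inverse-transform from a uniform draw (for simplicity this ignores the measure-zero edge case where the draw hits its endpoint):

```python
import math
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    # Noise scale b = sensitivity / epsilon, per the Laplace mechanism.
    b = sensitivity / epsilon
    # Inverse-transform sampling: for u uniform on (-1/2, 1/2),
    # -b * sgn(u) * ln(1 - 2|u|) is Laplace-distributed with scale b.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    noise = -b * sign * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise
```

Averaged over many runs, the released values center on the true answer, with spread controlled by $\Delta f / \varepsilon$.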
7. Utility Trade-off
Differential privacy always involves a trade-off between accuracy (utility) and privacy. More noise ensures better privacy but reduces the usefulness of the output, especially for small datasets or high-sensitivity queries.
Designing algorithms involves carefully adjusting:
- The type of queries
- The sensitivity of functions
- The level of added noise
- The number of allowable queries
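The trade-off is easy to quantify for the Laplace mechanism: the mean absolute deviation of $\text{Lap}(b)$ noise is exactly its scale $b = \Delta f / \varepsilon$, so tightening privacy by a factor of two doubles the expected error:

```python
def expected_abs_error(sensitivity: float, epsilon: float) -> float:
    # Mean absolute deviation of Lap(b) noise is exactly b = sensitivity / epsilon.
    return sensitivity / epsilon

# Halving epsilon (stronger privacy) doubles the expected error:
assert expected_abs_error(1.0, 0.25) == 2 * expected_abs_error(1.0, 0.5)
```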
Conclusion
The foundational definitions in Chapter 2 of Dwork and Roth’s work provide a precise language and mathematical structure for reasoning about privacy. Concepts like adjacent datasets, sensitivity, randomized algorithms, and privacy loss are essential for anyone developing or evaluating differentially private systems.