Basic Terms and Concepts in Differential Privacy: A Summary of Chapter 2 from Dwork and Roth

Understanding differential privacy begins with a solid grasp of its core terms and definitions. Chapter 2 (pp. 11–27) of the foundational text “The Algorithmic Foundations of Differential Privacy” by Cynthia Dwork and Aaron Roth lays the groundwork for formalizing and analyzing differentially private algorithms.

In this article, we summarize the basic terms and constructs introduced in that chapter to help students, engineers, and privacy professionals develop a strong conceptual foundation in differential privacy.


1. Adjacent Databases

At the heart of differential privacy is the concept of adjacency:

Two datasets D and D′ are considered adjacent if they differ in only one individual’s data.

This definition reflects the idea that an individual’s presence or absence should not significantly influence the result of any analysis. It’s this idea that underpins the privacy guarantee of differential privacy.
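To make this concrete, here is a minimal sketch (not from the book) of adjacency under the add/remove-one-record convention; the book also considers the variant where one record is replaced. The datasets and helper name are illustrative:

```python
from collections import Counter

def is_adjacent(a, b):
    """True if one dataset equals the other with a single record added or removed."""
    small, large = sorted((a, b), key=len)
    if len(large) - len(small) != 1:
        return False
    # The larger multiset must contain the smaller one, plus exactly one extra record.
    return not (Counter(small) - Counter(large))

D = [34, 29, 51, 42]        # each entry is one individual's record (an age, say)
D_prime = [34, 29, 51]      # the same dataset with one individual removed

print(is_adjacent(D, D_prime))   # True: they differ in exactly one record
print(is_adjacent(D, D))         # False: identical datasets are not adjacent here
```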


2. Randomized Algorithms

Differential privacy relies on randomized algorithms to inject controlled randomness into outputs. This randomness ensures that outputs from adjacent datasets are statistically similar, obscuring any individual’s data.

  • A randomized algorithm is one whose output on a given input is drawn from a probability distribution, so repeated runs on the same input can produce different results.
  • In DP, these algorithms are designed so that the output distribution changes very little between adjacent datasets.
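A classic randomized algorithm with exactly this property is randomized response: flip a coin; answer truthfully on heads, answer by a second coin flip on tails. A minimal Python sketch (the simulation values are illustrative):

```python
import random

def randomized_response(truth: bool) -> bool:
    """Answer truthfully on the first coin flip; otherwise answer at random.
    P[report yes | yes] = 3/4 and P[report yes | no] = 1/4, so the output
    distributions on adjacent inputs differ by at most a factor of 3."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

random.seed(0)
reports = [randomized_response(True) for _ in range(10_000)]
print(sum(reports) / len(reports))   # close to 0.75
```

Because 3/4 = 3 · (1/4), the worst-case ratio between output probabilities is 3 = e^{ln 3}, so randomized response satisfies ε-DP with ε = ln 3 ≈ 1.1 for a single yes/no attribute.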

3. ε-Differential Privacy

This is the formal definition of the differential privacy guarantee:

A randomized algorithm A is ε-differentially private if, for all adjacent datasets D and D′, and for all sets of outputs S ⊆ Range(A):

Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]

Where:

  • ε (epsilon) is the privacy loss parameter.
  • Smaller ε means stronger privacy.
  • The factor e^ε bounds how much the probability of any outcome can differ between adjacent datasets.
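The inequality can be checked numerically for a concrete mechanism. The sketch below (illustrative values, not from the book) assumes a sensitivity-1 query answered with Laplace noise of scale 1/ε, and compares the output densities of the same mechanism run on two adjacent datasets:

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution centred at mu with scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

eps = 0.5
scale = 1.0 / eps              # sensitivity 1 (a counting query), so scale = 1/eps
f_D, f_D_prime = 10.0, 11.0    # true answers on adjacent datasets differ by 1

max_ratio = max(
    laplace_pdf(x, f_D, scale) / laplace_pdf(x, f_D_prime, scale)
    for x in [-5.0, 0.0, 10.3, 10.7, 25.0]
)
# The density ratio at every output stays within a factor e^eps of 1.
print(f"max density ratio {max_ratio:.4f} vs e^eps = {math.exp(eps):.4f}")
```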

4. Privacy Loss and Composition

Privacy Loss:

This term refers to the potential information leakage from repeated queries or data analysis. It is quantified using the epsilon parameter.

Composition:

Differential privacy supports composition theorems, which allow combining multiple differentially private algorithms:

  • Basic Composition: The total privacy loss of k queries is at most the sum of their individual ε values.
  • Advanced Composition: Provides tighter bounds (growing roughly as √k rather than k) at the cost of a small additional failure probability δ.

This is essential for real-world systems that make multiple queries or computations on the same dataset.
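Under basic composition, per-query budgets simply add; the advanced composition theorem (stated later in the book, beyond Chapter 2) gives ε′ = √(2k ln(1/δ′))·ε + kε(e^ε − 1) for k adaptive ε-DP mechanisms. A sketch with illustrative budgets:

```python
import math

def basic_composition(eps_list):
    """Basic composition: total privacy loss is the sum of per-query budgets."""
    return sum(eps_list)

def advanced_composition(eps, k, delta_prime):
    """Advanced composition bound for k adaptive eps-DP mechanisms:
    eps' = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1)."""
    return math.sqrt(2 * k * math.log(1 / delta_prime)) * eps \
        + k * eps * (math.exp(eps) - 1)

k, eps = 100, 0.1
print(basic_composition([eps] * k))        # ~10.0 total under basic composition
print(advanced_composition(eps, k, 1e-5))  # noticeably smaller for large k
```

For many small queries the advanced bound is far tighter, which is why real systems track both ε and δ.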


5. Sensitivity of a Function

The sensitivity of a function measures how much its output can change when a single individual’s data is changed.

Formally, for a function f : D → ℝ^k, the ℓ1-sensitivity is:

Δf = max over adjacent D, D′ of ‖f(D) − f(D′)‖₁

Functions with lower sensitivity require less noise to achieve the same privacy guarantee. This is a key factor when designing efficient differentially private algorithms.
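For intuition, a counting query ("how many records satisfy a predicate?") has ℓ1-sensitivity 1, while a sum of values clamped to [0, C] has sensitivity C. A sketch (the function names and the clamp bound are illustrative):

```python
def count_over_40(db):
    """Counting query: adding or removing one record changes the count
    by at most 1, so the l1-sensitivity is 1."""
    return sum(1 for age in db if age >= 40)

def clamped_sum(db, clamp=100):
    """Sum of values clamped to [0, clamp]: one record can move the sum
    by at most `clamp`, so the l1-sensitivity is `clamp`."""
    return sum(min(max(age, 0), clamp) for age in db)

D = [34, 51, 42]
D_prime = [34, 51]        # adjacent: one record removed
print(abs(count_over_40(D) - count_over_40(D_prime)))   # at most 1
print(abs(clamped_sum(D) - clamped_sum(D_prime)))       # at most 100
```

The clamped sum needs 100× more noise than the count at the same ε, which is why bounding each individual's contribution is a standard first step.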


6. Laplace Mechanism

The Laplace Mechanism is a fundamental method for achieving differential privacy. It adds Laplace-distributed noise to a query result based on the function’s sensitivity and the desired privacy level.

For a function f with sensitivity Δf, the Laplace mechanism releases:

A(D) = f(D) + Lap(Δf / ε)

where Lap(b) denotes a draw from the Laplace distribution with scale b.

This mechanism balances utility (accuracy of results) and privacy (level of protection) using mathematically provable guarantees.
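A minimal implementation sketch using NumPy (the seed and query values are illustrative):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps, rng):
    """Release true_answer plus Laplace noise of scale sensitivity/eps."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / eps)

rng = np.random.default_rng(0)
true_count = 42                 # e.g. a counting query, so sensitivity = 1
noisy = laplace_mechanism(true_count, sensitivity=1.0, eps=0.5, rng=rng)
print(noisy)                    # the true count perturbed by noise of scale 2
```

Note that the noise scale depends only on Δf and ε, not on the dataset size, so the relative error shrinks as the true answer grows.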


7. Utility Trade-off

Differential privacy always involves a trade-off between accuracy (utility) and privacy. More noise ensures better privacy but reduces the usefulness of the output, especially for small datasets or high-sensitivity queries.

Designing algorithms involves carefully adjusting:

  • The type of queries
  • The sensitivity of functions
  • The level of added noise
  • The number of allowable queries
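The trade-off is easy to see empirically: for the Laplace mechanism on a sensitivity-1 query, the expected absolute error is exactly 1/ε. A small simulation sketch (seed and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
true_count = 1000
errors = {}
for eps in (0.1, 1.0, 10.0):
    # Laplace noise with scale 1/eps, as in the Laplace mechanism above.
    noisy = true_count + rng.laplace(scale=1.0 / eps, size=10_000)
    errors[eps] = float(np.mean(np.abs(noisy - true_count)))
    print(f"eps={eps}: mean absolute error ~ {errors[eps]:.2f}")  # roughly 1/eps
```

Tightening ε from 1.0 to 0.1 costs a tenfold increase in expected error, which matters most for small counts.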

Conclusion

The foundational definitions in Chapter 2 of Dwork and Roth’s work provide a precise language and mathematical structure for reasoning about privacy. Concepts like adjacent datasets, sensitivity, randomized algorithms, and privacy loss are essential for anyone developing or evaluating differentially private systems.
