Privacy-Preserving FL: Beyond 'Data Never Leaves the Device'
"Your data never leaves your device"—the classic federated learning pitch. While technically true (raw data stays local), this statement masks a subtle reality: model updates can leak private information.
Gradient updates, aggregated statistics, and even model predictions can reveal sensitive training data through reconstruction attacks, membership inference, or model inversion. True privacy in federated learning requires rigorous mathematical guarantees, not just architectural promises.
This post explores the privacy landscape in FL and how Octomil implements provable privacy protections.
The Privacy Illusion
Why Data-Local ≠ Privacy
Consider a federated keyboard prediction model:
1. Alice types her credit card number: `4532-1234-5678-9012`
2. Her device computes gradients that improve the model's prediction of `4532...`
3. Those gradients are sent to the server
Attack vector: A malicious server can:
- Gradient inversion: Reconstruct Alice's typed text from her gradients [1]
- Membership inference: Determine if Alice participated in training [2]
- Property inference: Learn aggregate statistics (e.g., "10% of users type medical terms")
Federated learning alone does not prevent these attacks.
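To make the leak concrete, here is a toy gradient-inversion sketch in plain NumPy (not Octomil's API). For a single softmax layer with cross-entropy loss, the per-example weight gradient factorizes as an outer product `delta ⊗ x`, so an honest-but-curious server can read the input back off the gradient exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "keyboard" model: one softmax layer over a 16-dim input, 5 classes.
W = rng.normal(size=(5, 16))
b = np.zeros(5)

x = rng.normal(size=16)   # Alice's private input
y = 3                     # her private label

# Forward pass and cross-entropy gradient for this single example.
logits = W @ x + b
p = np.exp(logits - logits.max())
p /= p.sum()
delta = p.copy()
delta[y] -= 1.0                  # dL/dlogits
grad_W = np.outer(delta, x)      # dL/dW = delta ⊗ x
grad_b = delta                   # dL/db = delta

# The attacker sees only (grad_W, grad_b). Because grad_W[j] = grad_b[j] * x,
# any row with nonzero grad_b reveals x exactly.
j = np.argmax(np.abs(grad_b))
x_reconstructed = grad_W[j] / grad_b[j]

assert np.allclose(x_reconstructed, x)  # perfect reconstruction
```

Deep networks make inversion harder, but published attacks recover images and text from real gradients; the toy case just shows the mechanism.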
Differential Privacy: The Gold Standard
What is Differential Privacy?
Differential Privacy (DP) provides a mathematical guarantee: Including or excluding any single user's data changes the output (model) distribution by at most a small, controlled amount.
Formally, a mechanism M is (ε, δ)-DP if for all datasets D and D′ differing in one record, and all output sets S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

Privacy budget:
- ε (epsilon): Privacy loss (smaller = more private, typical: ε < 10)
- δ (delta): Failure probability (tiny, e.g., δ = 10^(-6))
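To see how ε and δ set the noise level in practice, here is a NumPy-only sketch of the classic Gaussian mechanism (the helper names `gaussian_sigma` and `privatize` are my own, not Octomil's API):

```python
import numpy as np

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float) -> float:
    """Noise scale for the classic Gaussian mechanism.

    Gives (epsilon, delta)-DP for epsilon <= 1; tighter calibrations
    (e.g. the analytic Gaussian mechanism) exist for larger epsilon.
    """
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def privatize(value: np.ndarray, epsilon: float, delta: float,
              sensitivity: float, rng: np.random.Generator) -> np.ndarray:
    """Release `value` with Gaussian noise calibrated to its sensitivity."""
    sigma = gaussian_sigma(epsilon, delta, sensitivity)
    return value + rng.normal(scale=sigma, size=value.shape)

rng = np.random.default_rng(0)
true_mean = np.array([0.42])
noisy = privatize(true_mean, epsilon=1.0, delta=1e-6,
                  sensitivity=0.01, rng=rng)
# sigma ≈ 0.053 here; smaller epsilon or delta would mean more noise
```

Note how the noise scales with 1/ε: halving the privacy budget doubles the noise.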
DP in Federated Learning: The Challenge
Naively applying DP to FL is expensive:
- Clip gradients to bound sensitivity: `clip_norm = C`
- Add Gaussian noise: `noise ~ N(0, σ²C²I)`
- Result: convergence slows dramatically
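The clip-then-noise recipe can be sketched in a few lines of NumPy (a toy single-step illustration, not a full DP-SGD implementation with privacy accounting):

```python
import numpy as np

def dp_sgd_update(per_example_grads: np.ndarray, clip_norm: float,
                  noise_multiplier: float,
                  rng: np.random.Generator) -> np.ndarray:
    """One DP-SGD aggregation step: clip each example's gradient to
    L2 norm <= clip_norm, sum, then add Gaussian noise scaled to the
    clipping bound (the sensitivity of the sum is exactly clip_norm)."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(scale=noise_multiplier * clip_norm,
                       size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10)) * 5.0  # 32 examples, some large gradients
update = dp_sgd_update(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

The convergence cost comes from both operations: clipping biases large gradients, and the noise floor stays constant even as the true gradients shrink near a solution.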
The communication-privacy tension: Communication-efficient FL (gradient compression, local training) often conflicts with DP requirements.
State-of-the-Art: Making DP Practical
Recent research has made DP-FL significantly more practical:
1. Adaptive Clipping (Smith et al.)
Standard DP-SGD requires a fixed gradient clipping threshold, which is dataset-agnostic and often too conservative.
Differentially private adaptive optimization [3] learns optimal clipping thresholds:
```python
# Octomil's adaptive DP
import octomil

client = octomil.OctomilClient(
    project_id="private-health-prediction",
    privacy="differential",
    epsilon=8.0,
    delta=1e-6,
    adaptive_clipping=True  # Learns optimal clip threshold
)

client.train(
    model=my_model,
    rounds=50
)
```
Benefits:
- Better convergence: Adaptive clipping adjusts to gradient magnitudes
- Same privacy: Still satisfies (ε, δ)-DP guarantees
- Octomil implementation: Uses techniques from Li et al. (Smith group) [3]
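As a rough illustration of the idea (a generic quantile-tracking rule in NumPy, not Octomil's internal implementation): the threshold is nudged geometrically each round until a target fraction of gradients survives unclipped. In a real DP deployment the clipped fraction itself would be computed with noise.

```python
import numpy as np

def adapt_clip(clip: float, grad_norms: np.ndarray,
               target_quantile: float = 0.5, lr: float = 0.2) -> float:
    """Geometric update of the clipping threshold toward a target
    quantile of the per-round gradient-norm distribution."""
    frac = np.mean(grad_norms <= clip)  # fraction left unclipped
    return clip * np.exp(-lr * (frac - target_quantile))

rng = np.random.default_rng(0)
clip = 10.0  # far too conservative at first
for _ in range(200):
    norms = np.abs(rng.normal(loc=1.0, scale=0.1, size=256))
    clip = adapt_clip(clip, norms)
# clip has converged near the median gradient norm (~1.0)
```

Because the threshold tracks actual gradient magnitudes, far less signal is destroyed by clipping than with a fixed, worst-case bound.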
2. Practical Private FL Beyond Restrictive Assumptions
Most DP-FL theory assumes unrealistic conditions (e.g., bounded domains, strong convexity). Richtárik's group provides the first provable guarantees for practical private FL [4]:
Key contributions:
- Unbounded domains: No need to assume data is bounded
- Non-convex objectives: Works for deep learning
- Heterogeneous data: No IID assumption
The Fed-α-NormEC algorithm combines:
- Normalized gradient clipping: Scale-invariant, adapts to model size
- Error compensation: Maintains convergence despite noise
- Composition-aware: Carefully tracks privacy budget across rounds
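The error-compensation ingredient can be sketched generically (toy NumPy, not the actual Fed-α-NormEC algorithm): whatever the clipping operator removes in one round is carried forward and re-injected the next, so the clipping bias does not accumulate.

```python
import numpy as np

def compensated_clip(grad, error, clip_norm):
    """Error compensation around a clipping operator: the part of the
    gradient that clipping throws away is remembered and added back
    to the next round's gradient before clipping again."""
    corrected = grad + error
    norm = np.linalg.norm(corrected)
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    transmitted = corrected * scale
    new_error = corrected - transmitted  # what clipping removed
    return transmitted, new_error

rng = np.random.default_rng(0)
error = np.zeros(4)
total_true, total_sent = np.zeros(4), np.zeros(4)
for _ in range(500):
    g = rng.normal(size=4) * 3.0
    total_true += g
    sent, error = compensated_clip(g, error, clip_norm=1.0)
    total_sent += sent

# Invariant: the transmitted sum tracks the true sum up to the
# residual error buffer, so no bias accumulates over rounds.
assert np.allclose(total_sent + error, total_true)
```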
```python
# Octomil's production DP implementation
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    delta=1e-6,
    algorithm="fed-alpha-normec"  # Based on Richtárik et al.
)
```
3. Communication-Efficient DP-FL
Can we have both privacy and communication efficiency?
Error feedback with DP (Richtárik et al.) [5] shows: yes, with careful design.
Clip21-SGD2M combines:
- Gradient clipping for DP
- Error feedback for compression
- Double momentum for convergence
```python
# Communication-efficient DP in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=8.0,
    compression="ef21",            # Error feedback compression
    quantization_bits=8,           # 8-bit quantization
    privacy_accounting="advanced"  # Tight composition
)

# Achieves:
# - (8, 1e-6)-DP guarantee
# - 8× communication reduction from quantization
# - Same convergence as non-private baseline
```
Secure Aggregation: Hiding in the Crowd
Differential privacy adds noise to protect individuals. Secure aggregation takes a different approach: cryptographically hide individual updates so the server only sees the aggregate.
How Secure Aggregation Works
1. Devices generate pairwise secret keys using key agreement protocols
2. Each device encrypts its update such that only the sum can be decrypted
3. The server aggregates encrypted updates without seeing individual contributions
4. The aggregate is revealed only if enough devices participate
Privacy guarantee: Server learns nothing about individual updates (computational security).
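A minimal sketch of the pairwise-masking idea behind secure aggregation (toy NumPy, with shared PRG seeds standing in for the key agreement; real protocols add threshold secret sharing so dropped-out devices' masks can still be removed):

```python
import numpy as np

def masked_update(i, updates, seeds, dim):
    """Device i adds one pairwise mask per peer: +PRG(seed_ij) toward
    higher-indexed peers, -PRG(seed_ij) toward lower-indexed ones, so
    every mask appears once with each sign and cancels in the sum."""
    masked = updates[i].copy()
    for j in range(len(updates)):
        if j == i:
            continue
        prg = np.random.default_rng(int(seeds[min(i, j), max(i, j)]))
        mask = prg.normal(size=dim)
        masked += mask if i < j else -mask
    return masked

n, dim = 5, 8
rng = np.random.default_rng(0)
updates = [rng.normal(size=dim) for _ in range(n)]
seeds = rng.integers(0, 2**31, size=(n, n))  # pairwise-agreed PRG seeds

server_view = [masked_update(i, updates, seeds, dim) for i in range(n)]
aggregate = np.sum(server_view, axis=0)

# Each masked update looks random, but the masks cancel pairwise:
# the server recovers only the sum of the updates.
assert np.allclose(aggregate, np.sum(updates, axis=0))
```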
Practical Challenges
Secure aggregation has overhead:
- Computation: Encryption/decryption on devices
- Communication: Cryptographic metadata
- Reliability: Requires threshold of devices to complete (dropout problem)
Octomil implements lightweight secure aggregation optimized for mobile:
```python
# Secure aggregation in Octomil
client = octomil.OctomilClient(
    project_id="secure-keyboard",
    secure_aggregation=True,
    dropout_resilience=0.3  # Tolerate 30% device dropout
)

# Automatically handles:
# - Key exchange via authenticated channels
# - Threshold secret sharing for dropout resilience
# - Efficient homomorphic encryption schemes
```
LLM Unlearning: A New Privacy Frontier
Virginia Smith's recent work explores machine unlearning for LLMs [6][7]: the ability to remove specific training data's influence from a trained model.
Why Unlearning Matters in FL
GDPR "Right to be Forgotten": Users can request their data be deleted. In FL, this means:
- Remove their contribution from the global model
- Without retraining from scratch (expensive)
Unlearning Techniques
Model merging approach (Smith et al.) [8]:
- Train model with user's data → M_with
- Train model without user's data → M_without
- Unlearned model ≈ M_current - (M_with - M_without)
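The arithmetic is easy to see with models represented as flat parameter vectors (a toy NumPy sketch; in practice M_with and M_without come from separate fine-tuning runs, so the subtraction only approximately removes the user's influence):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "models" as parameter vectors (stand-ins for full weight sets).
m_without = rng.normal(size=6)            # trained without the user
user_contribution = rng.normal(size=6) * 0.1
m_with = m_without + user_contribution    # trained with the user
m_current = m_with                        # currently deployed model

# Task-arithmetic-style unlearning: subtract the user's "task vector".
m_unlearned = m_current - (m_with - m_without)

# In this idealized linear setting the user's contribution is removed
# exactly; real networks only approximate this.
assert np.allclose(m_unlearned, m_without)
```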
Octomil's unlearning API:
```python
# Request unlearning for specific devices
client.request_unlearning(
    device_ids=["device-123", "device-456"],
    method="model-merging",
    verify=True  # Verify unlearning via membership inference
)

# Octomil handles:
# - Identifying affected model versions
# - Computing unlearned model via merging
# - Validating unlearning via MI attacks
# - Deploying updated model
```
Caveat: Smith's work shows current unlearning benchmarks are weak [9]: true unlearning is hard!
Private Multi-Task Learning
For applications with multiple related tasks (e.g., health monitoring across different conditions), private multi-task FL [10] provides task-specific privacy:
Goal: Learn shared representations across tasks while ensuring task-specific privacy.
```python
# Multi-task private FL in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    tasks=["diabetes", "hypertension", "cholesterol"],
    task_privacy=True  # Separate privacy budgets per task
)

client.train_multitask(
    shared_layers=model.encoder,
    task_heads={"diabetes": head1, "hypertension": head2, "cholesterol": head3}
)
```
Membership Inference Attacks
Even with DP, adversaries may attempt membership inference: Determine if a specific user participated in training.
Smith's group has studied MI attacks extensively [11], finding:
- Thresholds matter: weak DP (ε > 10) leaves models vulnerable
- Unseen classes: MI attacks work even on data outside the training distribution
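The simplest MI attack of this kind thresholds the model's loss: members of the training set tend to have lower loss than fresh examples. A toy NumPy sketch, with hypothetical loss distributions standing in for a real (overfit) model:

```python
import numpy as np

def mi_attack_success(member_losses, nonmember_losses, threshold):
    """Loss-threshold membership inference: predict 'member' when the
    model's loss on the example is below the threshold. Returns the
    balanced attack accuracy (0.5 = random guessing)."""
    tpr = np.mean(member_losses < threshold)       # members flagged
    tnr = np.mean(nonmember_losses >= threshold)   # non-members cleared
    return 0.5 * (tpr + tnr)

rng = np.random.default_rng(0)
# An overfit model: systematically lower loss on training members
# than on held-out data (both distributions here are illustrative).
member_losses = rng.gamma(shape=2.0, scale=0.1, size=1000)
nonmember_losses = rng.gamma(shape=2.0, scale=0.4, size=1000)

acc = mi_attack_success(member_losses, nonmember_losses, threshold=0.35)
assert acc > 0.55  # well above random guessing: a privacy leak
```

Stronger attacks calibrate the threshold per example with shadow models, but the loss gap is the underlying signal in all of them.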
Octomil includes built-in MI robustness testing:
```python
# Test model against MI attacks
mi_results = client.evaluate_privacy(
    attack="membership-inference",
    test_devices=held_out_devices
)

print(f"MI attack success rate: {mi_results.attack_success_rate}")
# Random guessing scores 0.5; success above ~0.55 suggests
# the privacy protection may be insufficient
```
Octomil's Privacy Framework
Octomil provides a unified privacy framework:
```python
import octomil

# Initialize with privacy guarantees
client = octomil.OctomilClient(
    project_id="private-fl-project",
    # Privacy mechanism
    privacy="differential",  # or "secure-aggregation", "both"
    # DP parameters
    epsilon=8.0,
    delta=1e-6,
    # Advanced DP features
    adaptive_clipping=True,
    privacy_accounting="rdp",  # Rényi DP for tight composition
    # Secure aggregation
    secure_aggregation=True,
    dropout_resilience=0.25
)

# Train with automatic privacy
client.train(
    model=my_model,
    rounds=100,
    privacy_budget_mode="fixed"  # or "adaptive"
)

# Monitor privacy budget
budget = client.get_privacy_budget()
print(f"ε spent: {budget.epsilon_spent} / {budget.epsilon_total}")
```
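To see why the accounting method matters, compare naive composition with the Dwork–Roth advanced composition bound (a back-of-the-envelope NumPy calculation; RDP accountants are tighter still):

```python
import numpy as np

def basic_composition(eps_round: float, rounds: int) -> float:
    """Naive composition: privacy losses simply add up."""
    return eps_round * rounds

def advanced_composition(eps_round: float, rounds: int,
                         delta_slack: float) -> float:
    """Dwork-Roth advanced composition: total epsilon over `rounds`
    adaptive (eps_round, .)-DP mechanisms, at the cost of an extra
    delta_slack added to the total delta."""
    return (eps_round * np.sqrt(2 * rounds * np.log(1 / delta_slack))
            + rounds * eps_round * (np.exp(eps_round) - 1))

eps, T = 0.1, 100
naive = basic_composition(eps, T)                        # 10.0
tight = advanced_composition(eps, T, delta_slack=1e-6)   # ~6.3
assert tight < naive
```

For many small-ε rounds the advanced bound grows like √T rather than T, which is why the choice of accountant directly determines how many training rounds fit in a fixed budget.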
Privacy-Utility Tradeoffs
Real-World Numbers
From Octomil production deployments:
| Task | Privacy Level | Accuracy Drop | Method |
|---|---|---|---|
| Keyboard prediction | (8, 10^-6)-DP | 2% | Adaptive clipping |
| Medical imaging | (6, 10^-7)-DP | 5% | Fed-α-NormEC |
| Financial fraud | Secure agg only | 0% | No DP needed |
| Genomics | (4, 10^-8)-DP | 10% | Strong privacy required |
Key insight: DP overhead decreases with cohort size (the same noise is averaged over more users' signal).
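A quick back-of-the-envelope check of that insight (toy NumPy, fixed noise scale): the DP noise is added once to the summed update, so its impact on the mean update shrinks as 1/n with the number of participating users.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0        # DP noise std added once to the *summed* update
true_update = 0.5  # the signal each user contributes

errors = {}
for n_users in (10, 100, 10_000):
    # Same noise, same privacy, but the mean divides it by n_users.
    noisy_mean = (n_users * true_update + rng.normal(scale=sigma)) / n_users
    errors[n_users] = abs(noisy_mean - true_update)
    print(f"{n_users:>6} users -> absolute error {errors[n_users]:.5f}")

# The noise term on the mean has std sigma / n_users: 1000x more users
# means 1000x less DP error at the same privacy level.
```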
Getting Started with Private FL
```bash
pip install octomil

# Initialize with DP
octomil init private-project \
  --privacy differential \
  --epsilon 8.0 \
  --delta 1e-6

# Train with automatic privacy
octomil train \
  --model my_model.py \
  --privacy-audit  # Automatic privacy testing
```
See our Privacy Guide for detailed configurations.