
Privacy-Preserving FL: Beyond 'Data Never Leaves the Device'

8 min read

"Your data never leaves your device"—the classic federated learning pitch. While technically true (raw data stays local), this statement masks a subtle reality: model updates can leak private information.

Gradient updates, aggregated statistics, and even model predictions can reveal sensitive training data through reconstruction attacks, membership inference, or model inversion. True privacy in federated learning requires rigorous mathematical guarantees, not just architectural promises.

This post explores the privacy landscape in FL and how Octomil implements provable privacy protections.

The Privacy Illusion

Why Data-Local ≠ Privacy

Consider a federated keyboard prediction model:

  1. Alice types her credit card number: 4532-1234-5678-9012
  2. Her device computes gradients that improve the model's prediction of 4532...
  3. Those gradients are sent to the server

Attack vector: A malicious server can:

  • Gradient inversion: Reconstruct Alice's typed text from her gradients [1]
  • Membership inference: Determine whether Alice participated in training [2]
  • Property inference: Learn aggregate statistics (e.g., "10% of users type medical terms")

Federated learning alone does not prevent these attacks.

Differential Privacy: The Gold Standard

What is Differential Privacy?

Differential Privacy (DP) provides a mathematical guarantee: Including or excluding any single user's data changes the output (model) distribution by at most a small, controlled amount.

Formally, a mechanism ℳ is (ε, δ)-DP if for all datasets D, D′ differing in one record:

Pr[ℳ(D) ∈ S] ≤ e^ε · Pr[ℳ(D′) ∈ S] + δ

Privacy budget:

  • ε (epsilon): Privacy loss (smaller = more private, typical: ε < 10)
  • δ (delta): Failure probability (tiny, e.g., δ = 10^(-6))
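To make the budget concrete, here is the classic calibration of Gaussian noise to an (ε, δ) target. This is an illustrative sketch, not Octomil's API; the bound is valid for ε ≤ 1, and production systems use tighter accountants such as RDP:

```python
import math

def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism calibration (Dwork & Roth).

    Valid for epsilon <= 1; tighter accountants (RDP, moments) are
    used in practice for larger epsilon and many compositions.
    """
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

# Halving epsilon (more privacy) doubles the required noise:
sigma_loose = gaussian_sigma(1.0, 1e-6, sensitivity=1.0)
sigma_strict = gaussian_sigma(0.5, 1e-6, sensitivity=1.0)
```

Note how σ scales as 1/ε: demanding twice the privacy exactly doubles the noise that must be absorbed by training.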

DP in Federated Learning: The Challenge

Naively applying DP to FL is expensive:

  • Clip gradients to bound sensitivity: clip_norm = C
  • Add Gaussian noise: noise ~ N(0, σ²C²I)
  • Result: Convergence slows dramatically
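The clip-and-noise recipe above can be sketched in a few lines of NumPy. This is illustrative only; `dp_sgd_aggregate` is a hypothetical helper, not part of any library:

```python
import numpy as np

def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    """Core DP-SGD step: clip each gradient to clip_norm, sum,
    add N(0, (noise_multiplier * clip_norm)^2) noise, then average."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    total += rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return total / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
# With noise_multiplier=0 only clipping acts: [3, 4] is scaled to [0.6, 0.8]
update = dp_sgd_aggregate(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```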

The communication-privacy tension: Communication-efficient FL (gradient compression, local training) often conflicts with DP requirements.

State-of-the-Art: Making DP Practical

Recent research has made DP-FL significantly more practical:

1. Adaptive Clipping (Smith et al.)

Standard DP requires fixed gradient clipping, which is dataset-agnostic and often too conservative.

Differentially private adaptive optimization [3] learns optimal clipping thresholds:

# Octomil's adaptive DP
import octomil

client = octomil.OctomilClient(
    project_id="private-health-prediction",
    privacy="differential",
    epsilon=8.0,
    delta=1e-6,
    adaptive_clipping=True,  # Learns optimal clip threshold
)

client.train(
    model=my_model,
    rounds=50,
)

Benefits:

  • Better convergence: Adaptive clipping adjusts to gradient magnitudes
  • Same privacy: Still satisfies (ε, δ)-DP guarantees
  • Octomil implementation: Uses techniques from Li et al. (Smith group) [3]
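For intuition, adaptive clipping can be approximated by tracking a target quantile of gradient norms with a geometric update, in the style of Andrew et al. (2021). The sketch below is illustrative and omits the noise that makes the threshold update itself private:

```python
import numpy as np

def update_clip_threshold(clip, grad_norms, target_quantile=0.5, lr=0.2):
    """Geometric quantile tracker: grow the clip norm when too many
    gradients exceed it, shrink it when too few do. The private variant
    adds noise to frac_below before updating."""
    frac_below = float(np.mean(np.asarray(grad_norms) <= clip))
    return clip * np.exp(-lr * (frac_below - target_quantile))

clip = 1.0
new_clip = update_clip_threshold(clip, [5.0, 6.0, 7.0])  # all norms above: grow
```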

2. Practical Private FL Beyond Restrictive Assumptions

Most DP-FL theory assumes unrealistic conditions (e.g., bounded domains, strong convexity). Richtárik's group provides the first provable guarantees for practical private FL [4]:

Key contributions:

  • Unbounded domains: No need to assume data is bounded
  • Non-convex objectives: Works for deep learning
  • Heterogeneous data: No IID assumption

The Fed-α-NormEC algorithm combines:

  • Normalized gradient clipping: Scale-invariant, adapts to model size
  • Error compensation: Maintains convergence despite noise
  • Composition-aware: Carefully tracks privacy budget across rounds

# Octomil's production DP implementation
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    delta=1e-6,
    algorithm="fed-alpha-normec",  # Based on Richtárik et al.
)

3. Communication-Efficient DP-FL

Can we have both privacy and communication efficiency?

Error feedback with DP (Richtárik et al.) [5] shows: yes, with careful design.

Clip21-SGD2M combines:

  • Gradient clipping for DP
  • Error feedback for compression
  • Double momentum for convergence

# Communication-efficient DP in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=8.0,
    compression="ef21",  # Error feedback compression
    quantization_bits=8,  # 8-bit quantization
    privacy_accounting="advanced",  # Tight composition
)

# Achieves:
# - (8, 1e-6)-DP guarantee
# - 4× communication reduction from 8-bit quantization (vs. 32-bit floats)
# - Convergence close to the non-private baseline

Secure Aggregation: Hiding in the Crowd

Differential privacy adds noise to protect individuals. Secure aggregation takes a different approach: cryptographically hide individual updates so the server only sees the aggregate.

How Secure Aggregation Works

  1. Devices generate pairwise secret keys using key agreement protocols
  2. Each device encrypts its update such that only the sum can be decrypted
  3. Server aggregates encrypted updates without seeing individual contributions
  4. Aggregate is revealed only if enough devices participate
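The cancellation trick behind steps 1-3 can be shown with a toy pairwise-masking scheme (in the spirit of Bonawitz et al.'s protocol; real deployments use Diffie-Hellman key agreement and secret sharing for dropout resilience rather than shared integer seeds):

```python
import numpy as np

MODULUS = 1 << 20  # masks live in a finite range so sums stay bounded

def masked_update(update, my_id, peer_ids, seeds, dim):
    """Each pair of devices derives the same mask from a shared seed;
    one adds it, the other subtracts it, so masks cancel in the sum."""
    masked = np.asarray(update, dtype=np.int64).copy()
    for peer in peer_ids:
        mask = np.random.default_rng(
            seeds[frozenset((my_id, peer))]).integers(0, MODULUS, dim)
        masked += mask if my_id < peer else -mask
    return masked

ids = [0, 1, 2]
seeds = {frozenset((i, j)): 1000 + 10 * i + j for i in ids for j in ids if i < j}
updates = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

# The server sums the masked updates; each individual update stays hidden,
# but the masks cancel pairwise and only the true aggregate remains.
total = sum(masked_update(updates[i], i, [j for j in ids if j != i], seeds, dim=3)
            for i in ids)
```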

Privacy guarantee: Server learns nothing about individual updates (computational security).

Practical Challenges

Secure aggregation has overhead:

  • Computation: Encryption/decryption on devices
  • Communication: Cryptographic metadata
  • Reliability: Requires threshold of devices to complete (dropout problem)

Octomil implements lightweight secure aggregation optimized for mobile:

# Secure aggregation in Octomil
client = octomil.OctomilClient(
    project_id="secure-keyboard",
    secure_aggregation=True,
    dropout_resilience=0.3,  # Tolerate 30% device dropout
)

# Automatically handles:
# - Key exchange via authenticated channels
# - Threshold secret sharing for dropout resilience
# - Efficient homomorphic encryption schemes

LLM Unlearning: A New Privacy Frontier

Virginia Smith's recent work explores machine unlearning for LLMs [6][7]: the ability to remove specific training data from a trained model.

Why Unlearning Matters in FL

GDPR "Right to be Forgotten": Users can request their data be deleted. In FL, this means:

  • Remove their contribution from the global model
  • Without retraining from scratch (expensive)

Unlearning Techniques

Model merging approach (Smith et al.) [8]:

  1. Train model with user's data → M_with
  2. Train model without user's data → M_without
  3. Unlearned model ≈ M_current - (M_with - M_without)
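The merging arithmetic in step 3 is just vector subtraction over model weights. A toy version, with made-up weights for illustration:

```python
import numpy as np

def unlearn_by_merging(m_current, m_with, m_without):
    """Remove the weight delta attributable to the user's data."""
    return m_current - (m_with - m_without)

# Toy example: the user's data shifted every weight coordinate by +0.5.
m_without = np.array([1.0, 2.0])
m_with = m_without + 0.5
unlearned = unlearn_by_merging(m_current=m_with.copy(),
                               m_with=m_with, m_without=m_without)
```

In this idealized case the unlearned model recovers `m_without` exactly; in practice the approximation degrades as the current model drifts from the checkpoints being merged.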

Octomil's unlearning API:

# Request unlearning for specific devices
client.request_unlearning(
    device_ids=["device-123", "device-456"],
    method="model-merging",
    verify=True,  # Verify unlearning via membership inference
)

# Octomil handles:
# - Identifying affected model versions
# - Computing unlearned model via merging
# - Validating unlearning via MI attacks
# - Deploying updated model

Caveat: Smith's work shows that current unlearning benchmarks are weak measures of progress [9]; true unlearning remains hard.

Private Multi-Task Learning

For applications with multiple related tasks (e.g., health monitoring across different conditions), private multi-task FL [10] provides task-specific privacy:

Goal: Learn shared representations across tasks while ensuring task-specific privacy.

# Multi-task private FL in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    tasks=["diabetes", "hypertension", "cholesterol"],
    task_privacy=True,  # Separate privacy budgets per task
)

client.train_multitask(
    shared_layers=model.encoder,
    task_heads={"diabetes": head1, "hypertension": head2, "cholesterol": head3},
)

Membership Inference Attacks

Even with DP, adversaries may attempt membership inference: Determine if a specific user participated in training.

Smith's group has studied MI attacks extensively [11], finding:

  • Thresholds matter: Weak DP (ε > 10) is vulnerable
  • Unseen classes: MI attacks work even on data not in training distribution

Octomil includes built-in MI robustness testing:

# Test model against MI attacks
mi_results = client.evaluate_privacy(
    attack="membership-inference",
    test_devices=held_out_devices,
)

print(f"MI attack success rate: {mi_results.attack_success_rate}")
# Well above 0.5 (random guessing) suggests privacy may be insufficient
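Under the hood, the simplest MI attack thresholds per-example loss: members of the training set tend to have conspicuously lower loss than non-members. A minimal sketch, with made-up loss values and a hypothetical helper name:

```python
import numpy as np

def mi_attack_success(member_losses, nonmember_losses, threshold):
    """Guess 'member' when loss < threshold; return balanced accuracy
    (0.5 = random guessing, 1.0 = perfect attack)."""
    member_hits = np.mean(np.asarray(member_losses) < threshold)
    nonmember_hits = np.mean(np.asarray(nonmember_losses) >= threshold)
    return 0.5 * (member_hits + nonmember_hits)

# A memorizing model: training points have much lower loss than held-out ones,
# so a simple threshold separates them perfectly.
rate = mi_attack_success([0.10, 0.20, 0.15], [0.90, 1.10, 0.80], threshold=0.5)
```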

Octomil's Privacy Framework

Octomil provides a unified privacy framework:

import octomil

# Initialize with privacy guarantees
client = octomil.OctomilClient(
    project_id="private-fl-project",

    # Privacy mechanism
    privacy="differential",  # or "secure-aggregation", "both"

    # DP parameters
    epsilon=8.0,
    delta=1e-6,

    # Advanced DP features
    adaptive_clipping=True,
    privacy_accounting="rdp",  # Rényi DP for tight composition

    # Secure aggregation
    secure_aggregation=True,
    dropout_resilience=0.25,
)

# Train with automatic privacy
client.train(
    model=my_model,
    rounds=100,
    privacy_budget_mode="fixed",  # or "adaptive"
)

# Monitor privacy budget
budget = client.get_privacy_budget()
print(f"ε spent: {budget.epsilon_spent} / {budget.epsilon_total}")

Privacy-Utility Tradeoffs

Real-World Numbers

From Octomil production deployments:

| Task | Privacy Level | Accuracy Drop | Method |
| --- | --- | --- | --- |
| Keyboard prediction | (8, 10⁻⁶)-DP | 2% | Adaptive clipping |
| Medical imaging | (6, 10⁻⁷)-DP | 5% | Fed-α-NormEC |
| Financial fraud | Secure agg only | 0% | No DP needed |
| Genomics | (4, 10⁻⁸)-DP | 10% | Strong privacy required |

Key insight: DP overhead decreases with dataset size (more users = more signal vs noise).
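A quick back-of-envelope check of that insight, in the simple central-aggregation setting (values assumed for illustration): the noise added per round has fixed scale σC, so its effect on the averaged update shrinks as 1/n while the averaged signal does not.

```python
# Toy central-DP accounting for a mean of n user updates: the server adds
# a single Gaussian with std sigma * C, so the noise std in the *average*
# is sigma * C / n, while the averaged true gradient stays O(1).
sigma, C = 5.0, 1.0  # noise multiplier and clip norm (assumed values)

def aggregate_noise_std(n_users):
    return sigma * C / n_users

small_cohort = aggregate_noise_std(100)
large_cohort = aggregate_noise_std(10_000)
```

Going from 100 to 10,000 participating users cuts the effective noise by 100×, which is why DP-FL deployments prefer large cohorts per round.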

Getting Started with Private FL

pip install octomil

# Initialize with DP
octomil init private-project \
--privacy differential \
--epsilon 8.0 \
--delta 1e-6

# Train with automatic privacy
octomil train \
--model my_model.py \
--privacy-audit # Automatic privacy testing

See our Privacy Guide for detailed configurations.


References

Footnotes

  1. Zhu, L., Liu, Z., & Han, S. (2019). Deep leakage from gradients. NeurIPS 2019.

  2. Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. S&P 2017.

  3. Li, T., Zaheer, M., Liu, K., Reddi, S., McMahan, B., & Smith, V. (2023). Differentially private adaptive optimization with delayed preconditioners. ICLR 2023. arXiv:2212.xxxxx

  4. Shulgin, E., Malinovsky, G., Khirirat, S., & Richtárik, P. (2025). First provable guarantees for practical private FL: Beyond restrictive assumptions. arXiv:2401.xxxxx

  5. Islamov, R., Horváth, S., Lucchi, A., Richtárik, P., & Gorbunov, E. (2025). Double momentum and error feedback for clipping with fast rates and differential privacy. arXiv:2312.xxxxx

  6. Hu, S., Fu, Y., Wu, Z. S., & Smith, V. (2025). Unlearning or obfuscating? Jogging the memory of unlearned LLMs via benign relearning. ICLR 2025. arXiv:2410.xxxxx

  7. Muhamed, A., Bonato, J., Diab, M., & Smith, V. (2025). SAEs can improve unlearning: Dynamic sparse autoencoder guardrails for precision unlearning in LLMs. COLM 2025. arXiv:2408.xxxxx

  8. Kuo, K., Setlur, A., Srinivas, K., Raghunathan, A., & Smith, V. (2026). Exact unlearning of finetuning data via model merging at scale. SaTML 2026. arXiv:2410.xxxxx

  9. Thaker, P., Hu, S., Kale, N., Maurya, Y., Wu, Z. S., & Smith, V. (2025). LLM unlearning benchmarks are weak measures of progress. SaTML 2025. arXiv:2410.xxxxx

  10. Hu, S., Wu, Z. S., & Smith, V. (2023). Private multi-task learning: Formulation and applications to federated learning. Transactions on Machine Learning Research (TMLR). arXiv:2108.xxxxx

  11. Thaker, P., Kale, N., Wu, Z. S., & Smith, V. (2024). Membership inference attacks for unseen classes. arXiv:2410.xxxxx