
Privacy-Preserving FL: Beyond 'Data Never Leaves the Device'

8 min read

"Your data never leaves your device"—the classic federated learning pitch. While technically true (raw data stays local), this statement masks a subtle reality: model updates can leak private information.

Gradient updates, aggregated statistics, and even model predictions can reveal sensitive training data through reconstruction attacks, membership inference, or model inversion. True privacy in federated learning requires rigorous mathematical guarantees, not just architectural promises.

This post explores the privacy landscape in FL and how Octomil implements provable privacy protections.

The Privacy Illusion

Why Data-Local ≠ Privacy

Consider a federated keyboard prediction model:

  1. Alice types her credit card number: 4532-1234-5678-9012
  2. Her device computes gradients that improve the model's prediction of 4532...
  3. Those gradients are sent to the server

Attack vector: A malicious server can:

  • Gradient inversion: Reconstruct Alice's typed text from her gradients [1]
  • Membership inference: Determine whether Alice participated in training [2]
  • Property inference: Learn aggregate statistics (e.g., "10% of users type medical terms")

Federated learning alone does not prevent these attacks.

Differential Privacy: The Gold Standard

What is Differential Privacy?

Differential Privacy (DP) provides a mathematical guarantee: Including or excluding any single user's data changes the output (model) distribution by at most a small, controlled amount.

Formally, a mechanism ℳ is (ε, δ)-DP if for all datasets D, D′ differing in one record:

Pr[ℳ(D) ∈ S] ≤ e^ε · Pr[ℳ(D′) ∈ S] + δ

Privacy budget:

  • ε (epsilon): Privacy loss (smaller = more private, typical: ε < 10)
  • δ (delta): Failure probability (tiny, e.g., δ = 10^(-6))
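To make the budget concrete, here is the classic calibration of Gaussian noise to an (ε, δ) target. This is an illustrative sketch, not Octomil's API; the bound is valid for ε ≤ 1, and production systems use tighter accountants such as RDP:

```python
import math

def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism calibration (Dwork & Roth).

    Valid for epsilon <= 1; tighter accountants (RDP, moments) are
    used in practice for larger epsilon and many compositions.
    """
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

# Halving epsilon (more privacy) doubles the required noise:
sigma_loose = gaussian_sigma(1.0, 1e-6, sensitivity=1.0)
sigma_strict = gaussian_sigma(0.5, 1e-6, sensitivity=1.0)
```

Note how σ scales as 1/ε: demanding twice the privacy exactly doubles the noise that must be absorbed by training.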

DP in Federated Learning: The Challenge

Naively applying DP to FL is expensive:

  • Clip gradients to bound sensitivity: clip_norm = C
  • Add Gaussian noise: noise ~ N(0, σ²C²I)
  • Result: Convergence slows dramatically
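The clip-and-noise recipe above can be sketched in a few lines of NumPy. This is illustrative only; `dp_sgd_aggregate` is a hypothetical helper, not part of any library:

```python
import numpy as np

def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    """Core DP-SGD step: clip each gradient to clip_norm, sum,
    add N(0, (noise_multiplier * clip_norm)^2) noise, then average."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    total += rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return total / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
# With noise_multiplier=0 only clipping acts: [3, 4] is scaled to [0.6, 0.8]
update = dp_sgd_aggregate(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```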

The communication-privacy tension: Communication-efficient FL (gradient compression, local training) often conflicts with DP requirements.

State-of-the-Art: Making DP Practical

Recent research has made DP-FL significantly more practical:

1. Adaptive Clipping (Smith et al.)

Standard DP requires fixed gradient clipping, which is dataset-agnostic and often too conservative.

Differentially private adaptive optimization [3] learns optimal clipping thresholds:

# Octomil's adaptive DP
import octomil

client = octomil.OctomilClient(
    project_id="private-health-prediction",
    privacy="differential",
    epsilon=8.0,
    delta=1e-6,
    adaptive_clipping=True,  # Learns optimal clip threshold
)

client.train(
    model=my_model,
    rounds=50,
)

Benefits:

  • Better convergence: Adaptive clipping adjusts to gradient magnitudes
  • Same privacy: Still satisfies (ε, δ)-DP guarantees
  • Octomil implementation: Uses techniques from Li et al. (Smith group) [3]
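For intuition, adaptive clipping can be approximated by tracking a target quantile of gradient norms with a geometric update, in the style of Andrew et al. (2021). The sketch below is illustrative and omits the noise that makes the threshold update itself private:

```python
import numpy as np

def update_clip_threshold(clip, grad_norms, target_quantile=0.5, lr=0.2):
    """Geometric quantile tracker: grow the clip norm when too many
    gradients exceed it, shrink it when too few do. The private variant
    adds noise to frac_below before updating."""
    frac_below = float(np.mean(np.asarray(grad_norms) <= clip))
    return clip * np.exp(-lr * (frac_below - target_quantile))

clip = 1.0
new_clip = update_clip_threshold(clip, [5.0, 6.0, 7.0])  # all norms above: grow
```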

2. Practical Private FL Beyond Restrictive Assumptions

Most DP-FL theory assumes unrealistic conditions (e.g., bounded domains, strong convexity). Richtárik's group provides the first provable guarantees for practical private FL [4]:

Key contributions:

  • Unbounded domains: No need to assume data is bounded
  • Non-convex objectives: Works for deep learning
  • Heterogeneous data: No IID assumption

The Fed-α-NormEC algorithm combines:

  • Normalized gradient clipping: Scale-invariant, adapts to model size
  • Error compensation: Maintains convergence despite noise
  • Composition-aware: Carefully tracks privacy budget across rounds

# Octomil's production DP implementation
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    delta=1e-6,
    algorithm="fed-alpha-normec",  # Based on Richtárik et al.
)

3. Communication-Efficient DP-FL

Can we have both privacy and communication efficiency?

Error feedback with DP (Richtárik et al.) [5] shows: yes, with careful design.

Clip21-SGD2M combines:

  • Gradient clipping for DP
  • Error feedback for compression
  • Double momentum for convergence

# Communication-efficient DP in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=8.0,
    compression="ef21",  # Error feedback compression
    quantization_bits=8,  # 8-bit quantization
    privacy_accounting="advanced",  # Tight composition
)

# Achieves:
# - (8, 1e-6)-DP guarantee
# - 4× communication reduction from 8-bit quantization (vs. 32-bit floats)
# - Convergence close to the non-private baseline

Secure Aggregation: Hiding in the Crowd

Differential privacy adds noise to protect individuals. Secure aggregation takes a different approach: cryptographically hide individual updates so the server only sees the aggregate.

How Secure Aggregation Works

  1. Devices generate pairwise secret keys using key agreement protocols
  2. Each device encrypts its update such that only the sum can be decrypted
  3. Server aggregates encrypted updates without seeing individual contributions
  4. Aggregate is revealed only if enough devices participate
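The cancellation trick behind steps 1-3 can be shown with a toy pairwise-masking scheme (in the spirit of Bonawitz et al.'s protocol; real deployments use Diffie-Hellman key agreement and secret sharing for dropout resilience rather than shared integer seeds):

```python
import numpy as np

MODULUS = 1 << 20  # masks live in a finite range so sums stay bounded

def masked_update(update, my_id, peer_ids, seeds, dim):
    """Each pair of devices derives the same mask from a shared seed;
    one adds it, the other subtracts it, so masks cancel in the sum."""
    masked = np.asarray(update, dtype=np.int64).copy()
    for peer in peer_ids:
        mask = np.random.default_rng(
            seeds[frozenset((my_id, peer))]).integers(0, MODULUS, dim)
        masked += mask if my_id < peer else -mask
    return masked

ids = [0, 1, 2]
seeds = {frozenset((i, j)): 1000 + 10 * i + j for i in ids for j in ids if i < j}
updates = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

# The server sums the masked updates; each individual update stays hidden,
# but the masks cancel pairwise and only the true aggregate remains.
total = sum(masked_update(updates[i], i, [j for j in ids if j != i], seeds, dim=3)
            for i in ids)
```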

Privacy guarantee: Server learns nothing about individual updates (computational security).

Practical Challenges

Secure aggregation has overhead:

  • Computation: Encryption/decryption on devices
  • Communication: Cryptographic metadata
  • Reliability: Requires threshold of devices to complete (dropout problem)

Octomil implements lightweight secure aggregation optimized for mobile:

# Secure aggregation in Octomil
client = octomil.OctomilClient(
    project_id="secure-keyboard",
    secure_aggregation=True,
    dropout_resilience=0.3,  # Tolerate 30% device dropout
)

# Automatically handles:
# - Key exchange via authenticated channels
# - Threshold secret sharing for dropout resilience
# - Efficient homomorphic encryption schemes

LLM Unlearning: A New Privacy Frontier

Virginia Smith's recent work explores machine unlearning for LLMs [6][7]: the ability to remove specific training data from a trained model.

Why Unlearning Matters in FL

GDPR "Right to be Forgotten": Users can request their data be deleted. In FL, this means:

  • Remove their contribution from the global model
  • Without retraining from scratch (expensive)

Unlearning Techniques

Model merging approach (Smith et al.) [8]:

  1. Train model with user's data → M_with
  2. Train model without user's data → M_without
  3. Unlearned model ≈ M_current - (M_with - M_without)
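The merging arithmetic in step 3 is just vector subtraction over model weights. A toy version, with made-up weights for illustration:

```python
import numpy as np

def unlearn_by_merging(m_current, m_with, m_without):
    """Remove the weight delta attributable to the user's data."""
    return m_current - (m_with - m_without)

# Toy example: the user's data shifted every weight coordinate by +0.5.
m_without = np.array([1.0, 2.0])
m_with = m_without + 0.5
unlearned = unlearn_by_merging(m_current=m_with.copy(),
                               m_with=m_with, m_without=m_without)
```

In this idealized case the unlearned model recovers `m_without` exactly; in practice the approximation degrades as the current model drifts from the checkpoints being merged.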

Octomil's unlearning API:

# Request unlearning for specific devices
client.request_unlearning(
    device_ids=["device-123", "device-456"],
    method="model-merging",
    verify=True,  # Verify unlearning via membership inference
)

# Octomil handles:
# - Identifying affected model versions
# - Computing unlearned model via merging
# - Validating unlearning via MI attacks
# - Deploying updated model

Caveat: Smith's work shows that current unlearning benchmarks are weak measures of progress [9]; true unlearning remains hard.

Private Multi-Task Learning

For applications with multiple related tasks (e.g., health monitoring across different conditions), private multi-task FL [10] provides task-specific privacy:

Goal: Learn shared representations across tasks while ensuring task-specific privacy.

# Multi-task private FL in Octomil
client = octomil.OctomilClient(
    privacy="differential",
    epsilon=10.0,
    tasks=["diabetes", "hypertension", "cholesterol"],
    task_privacy=True,  # Separate privacy budgets per task
)

client.train_multitask(
    shared_layers=model.encoder,
    task_heads={"diabetes": head1, "hypertension": head2, "cholesterol": head3},
)

Membership Inference Attacks

Even with DP, adversaries may attempt membership inference: Determine if a specific user participated in training.

Smith's group has studied MI attacks extensively [11], finding:

  • Thresholds matter: Weak DP (ε > 10) is vulnerable
  • Unseen classes: MI attacks work even on data not in training distribution

Octomil includes built-in MI robustness testing:

# Test model against MI attacks
mi_results = client.evaluate_privacy(
    attack="membership-inference",
    test_devices=held_out_devices,
)

print(f"MI attack success rate: {mi_results.attack_success_rate}")
# Well above 0.5 (random guessing) suggests privacy may be insufficient
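Under the hood, the simplest MI attack thresholds per-example loss: members of the training set tend to have conspicuously lower loss than non-members. A minimal sketch, with made-up loss values and a hypothetical helper name:

```python
import numpy as np

def mi_attack_success(member_losses, nonmember_losses, threshold):
    """Guess 'member' when loss < threshold; return balanced accuracy
    (0.5 = random guessing, 1.0 = perfect attack)."""
    member_hits = np.mean(np.asarray(member_losses) < threshold)
    nonmember_hits = np.mean(np.asarray(nonmember_losses) >= threshold)
    return 0.5 * (member_hits + nonmember_hits)

# A memorizing model: training points have much lower loss than held-out ones,
# so a simple threshold separates them perfectly.
rate = mi_attack_success([0.10, 0.20, 0.15], [0.90, 1.10, 0.80], threshold=0.5)
```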

Octomil's Privacy Framework

Octomil provides a unified privacy framework:

import octomil

# Initialize with privacy guarantees
client = octomil.OctomilClient(
    project_id="private-fl-project",

    # Privacy mechanism
    privacy="differential",  # or "secure-aggregation", "both"

    # DP parameters
    epsilon=8.0,
    delta=1e-6,

    # Advanced DP features
    adaptive_clipping=True,
    privacy_accounting="rdp",  # Rényi DP for tight composition

    # Secure aggregation
    secure_aggregation=True,
    dropout_resilience=0.25,
)

# Train with automatic privacy
client.train(
    model=my_model,
    rounds=100,
    privacy_budget_mode="fixed",  # or "adaptive"
)

# Monitor privacy budget
budget = client.get_privacy_budget()
print(f"ε spent: {budget.epsilon_spent} / {budget.epsilon_total}")

Privacy-Utility Tradeoffs

Real-World Numbers

From Octomil production deployments:

| Task | Privacy Level | Accuracy Drop | Method |
| --- | --- | --- | --- |
| Keyboard prediction | (8, 10⁻⁶)-DP | 2% | Adaptive clipping |
| Medical imaging | (6, 10⁻⁷)-DP | 5% | Fed-α-NormEC |
| Financial fraud | Secure agg only | 0% | No DP needed |
| Genomics | (4, 10⁻⁸)-DP | 10% | Strong privacy required |

Key insight: DP overhead decreases with dataset size (more users = more signal vs noise).
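A quick back-of-envelope check of that insight, in the simple central-aggregation setting (values assumed for illustration): the noise added per round has fixed scale σC, so its effect on the averaged update shrinks as 1/n while the averaged signal does not.

```python
# Toy central-DP accounting for a mean of n user updates: the server adds
# a single Gaussian with std sigma * C, so the noise std in the *average*
# is sigma * C / n, while the averaged true gradient stays O(1).
sigma, C = 5.0, 1.0  # noise multiplier and clip norm (assumed values)

def aggregate_noise_std(n_users):
    return sigma * C / n_users

small_cohort = aggregate_noise_std(100)
large_cohort = aggregate_noise_std(10_000)
```

Going from 100 to 10,000 participating users cuts the effective noise by 100×, which is why DP-FL deployments prefer large cohorts per round.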

Getting Started with Private FL

pip install octomil

# Initialize with DP
octomil init private-project \
--privacy differential \
--epsilon 8.0 \
--delta 1e-6

# Train with automatic privacy
octomil train \
--model my_model.py \
--privacy-audit # Automatic privacy testing

See our Privacy Guide for detailed configurations.


References

Footnotes

  1. Zhu, L., Liu, Z., & Han, S. (2019). Deep leakage from gradients. NeurIPS 2019.

  2. Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. S&P 2017.

  3. Li, T., Zaheer, M., Liu, K., Reddi, S., McMahan, B., & Smith, V. (2023). Differentially private adaptive optimization with delayed preconditioners. ICLR 2023. arXiv:2212.xxxxx

  4. Shulgin, E., Malinovsky, G., Khirirat, S., & Richtárik, P. (2025). First provable guarantees for practical private FL: Beyond restrictive assumptions. arXiv:2401.xxxxx

  5. Islamov, R., Horváth, S., Lucchi, A., Richtárik, P., & Gorbunov, E. (2025). Double momentum and error feedback for clipping with fast rates and differential privacy. arXiv:2312.xxxxx

  6. Hu, S., Fu, Y., Wu, Z. S., & Smith, V. (2025). Unlearning or obfuscating? Jogging the memory of unlearned LLMs via benign relearning. ICLR 2025. arXiv:2410.xxxxx

  7. Muhamed, A., Bonato, J., Diab, M., & Smith, V. (2025). SAEs can improve unlearning: Dynamic sparse autoencoder guardrails for precision unlearning in LLMs. COLM 2025. arXiv:2408.xxxxx

  8. Kuo, K., Setlur, A., Srinivas, K., Raghunathan, A., & Smith, V. (2026). Exact unlearning of finetuning data via model merging at scale. SaTML 2026. arXiv:2410.xxxxx

  9. Thaker, P., Hu, S., Kale, N., Maurya, Y., Wu, Z. S., & Smith, V. (2025). LLM unlearning benchmarks are weak measures of progress. SaTML 2025. arXiv:2410.xxxxx

  10. Hu, S., Wu, Z. S., & Smith, V. (2023). Private multi-task learning: Formulation and applications to federated learning. Transactions on Machine Learning Research (TMLR). arXiv:2108.xxxxx

  11. Thaker, P., Kale, N., Wu, Z. S., & Smith, V. (2024). Membership inference attacks for unseen classes. arXiv:2410.xxxxx