Privacy
Octomil privacy controls are designed for production use where sensitive user data must remain on-device.
Core privacy mechanisms
- Secure aggregation of client updates.
- Differential privacy noise controls.
- Operational controls for auditability and compliance.
Why privacy matters in federated learning
Federated learning already improves on centralized ML by keeping raw data on-device. But model updates themselves can leak information. Gradient inversion attacks can reconstruct training samples from raw gradients, and membership inference attacks can determine whether a specific record was in a client's training set. To defend against these threats, you need two additional layers: differential privacy (DP) and secure aggregation (SecAgg).
Octomil ships both as first-class features, not research add-ons. They integrate directly into the training round lifecycle documented in Training Rounds and are enforced through the privacy pipeline described below.
Differential privacy in Octomil
Differential privacy provides a mathematical guarantee that the output of an aggregation round does not depend too heavily on any single client's contribution. Octomil implements client-level DP, where noise is calibrated per-client update before aggregation.
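Concretely, per-client calibration follows the standard clip-then-noise Gaussian mechanism. The sketch below is an illustration, not Octomil's internal code (the function name privatize_update is invented here); it shows how max_grad_norm and noise_multiplier interact:

```python
import numpy as np

def privatize_update(update, max_grad_norm, noise_multiplier, rng=None):
    """Clip a client update to an L2 bound, then add calibrated Gaussian noise.

    The noise standard deviation is noise_multiplier * max_grad_norm, so the
    two parameters jointly determine the per-round privacy cost.
    """
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    # Scale the update down so its L2 norm is at most max_grad_norm.
    clipped = update * min(1.0, max_grad_norm / (norm + 1e-12))
    sigma = noise_multiplier * max_grad_norm
    return clipped + rng.normal(0.0, sigma, size=update.shape)

noisy = privatize_update(np.array([3.0, 4.0]), max_grad_norm=1.0, noise_multiplier=1.1)
```

With noise_multiplier set to zero the function reduces to pure clipping, which is a useful sanity check when tuning max_grad_norm.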
Key parameters
| Parameter | Description | Guidance |
|---|---|---|
| epsilon | Privacy budget. Lower = stronger privacy, noisier model. | Tune based on your threat model and accuracy requirements |
| delta | Probability of privacy guarantee failure. | Should be very small relative to your dataset size |
| noise_multiplier | Standard deviation of Gaussian noise relative to clipping norm. | Higher values increase privacy at the cost of accuracy |
| max_grad_norm | L2 clipping bound applied to each client update. | Tune based on your model's gradient distribution |
Choosing epsilon
There is no universal "correct" epsilon. The right value depends on your threat model and acceptable accuracy loss. Lower epsilon provides stronger privacy but noisier models; higher epsilon provides weaker formal guarantees but better model accuracy.
Start with a conservative value and tighten as you measure accuracy impact on your specific model and data distribution. For highly sensitive data (health records, financial transactions), use a stricter epsilon. For less sensitive applications, a more relaxed value may be appropriate.
Configuration
Privacy settings are configured per-federation through the dashboard (Settings > Privacy) or via the REST API:
- cURL
- Python
- JavaScript
```bash
curl -X PUT https://api.octomil.com/api/v1/federations/health-risk-model/privacy \
  -H "Authorization: Bearer edg_..." \
  -H "Content-Type: application/json" \
  -d '{
    "differential_privacy": {
      "enabled": true,
      "epsilon": "<your-epsilon>",
      "delta": "<your-delta>",
      "noise_multiplier": "<your-noise-multiplier>",
      "max_grad_norm": "<your-clip-norm>"
    },
    "secure_aggregation": {
      "enabled": true,
      "min_participating_clients": 10,
      "reconstruction_threshold": "<configurable>"
    }
  }'
```
```python
import requests

response = requests.put(
    "https://api.octomil.com/api/v1/federations/health-risk-model/privacy",
    headers={"Authorization": "Bearer edg_..."},
    json={
        "differential_privacy": {
            "enabled": True,
            "epsilon": ...,  # Set based on your privacy requirements
            "delta": ...,  # Set relative to your dataset size
            "noise_multiplier": ...,  # Tune for your accuracy-privacy trade-off
            "max_grad_norm": ...,  # Tune based on your model's gradient distribution
        },
        "secure_aggregation": {
            "enabled": True,
            "min_participating_clients": 10,
            "reconstruction_threshold": ...,  # Configurable
        },
    },
)
print(response.json())
```
```javascript
const response = await fetch("https://api.octomil.com/api/v1/federations/health-risk-model/privacy", {
  method: "PUT",
  headers: { "Authorization": "Bearer edg_...", "Content-Type": "application/json" },
  body: JSON.stringify({
    differential_privacy: {
      enabled: true,
      epsilon: null, // set based on your privacy requirements
      delta: null, // set relative to your dataset size
      noise_multiplier: null, // tune for your accuracy-privacy trade-off
      max_grad_norm: null, // tune based on your model's gradient distribution
    },
    secure_aggregation: {
      enabled: true,
      min_participating_clients: 10,
      reconstruction_threshold: null, // configurable
    },
  }),
});
const data = await response.json();
console.log(data);
```
Once configured, privacy is enforced automatically during training:
```python
from octomil import Federation

federation = Federation(api_key="edg_...", name="health-risk-model")
result = federation.train(
    model="health-risk",
    algorithm="fedavg",
    rounds=10,
    min_updates=50,
)
# Privacy filters (gradient clipping, noise injection) are applied
# server-side during aggregation — no client-side changes needed.
```
The noise_multiplier and max_grad_norm together determine the per-round epsilon spend. Octomil tracks cumulative epsilon across rounds and will pause training if the total budget is exhausted, preventing silent privacy degradation.
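The cumulative tracking can be pictured with a toy accountant using basic linear composition. This is an invented illustration, not the SDK API; production accountants (e.g. RDP / moments accountants) give much tighter bounds:

```python
class PrivacyAccountant:
    """Toy accountant: epsilon spends add up linearly across rounds."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, round_epsilon):
        """Record one round's spend, or refuse if it would exceed the budget."""
        if self.spent + round_epsilon > self.total_epsilon:
            raise RuntimeError("privacy budget exhausted; pausing training")
        self.spent += round_epsilon
        return self.total_epsilon - self.spent

accountant = PrivacyAccountant(total_epsilon=4.0)
for _ in range(8):
    remaining = accountant.charge(0.4)  # per-round epsilon spend
```

After eight rounds of 0.4, the accountant has spent 3.2 of a 4.0 budget; a further charge that overshoots the budget raises instead of silently degrading privacy.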
Privacy budget tracking
Octomil maintains a running privacy accountant per project. You can query remaining budget through the API or the Monitoring Dashboard:
- cURL
- Python
- JavaScript
```bash
curl https://api.octomil.com/api/v1/federations/health-risk-model/privacy/budget \
  -H "Authorization: Bearer edg_..."
```
```python
import requests

response = requests.get(
    "https://api.octomil.com/api/v1/federations/health-risk-model/privacy/budget",
    headers={"Authorization": "Bearer edg_..."},
)
print(response.json())
```
```javascript
const response = await fetch("https://api.octomil.com/api/v1/federations/health-risk-model/privacy/budget", {
  headers: { "Authorization": "Bearer edg_..." },
});
const data = await response.json();
console.log(data);
```
```json
{
  "epsilon_spent": 3.2,
  "epsilon_remaining": 0.8,
  "rounds_completed": 8,
  "budget_exhausted": false
}
```
When the budget is exhausted, the server rejects new training rounds for the project. You must either allocate additional budget or start a new project version.
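The budget response can also gate new rounds client-side before the server rejects them. A sketch, where budget_allows_training is an invented helper and the field names come from the example response above:

```python
def budget_allows_training(budget, min_remaining=0.0):
    """Gate new rounds on a /privacy/budget response body."""
    return (not budget["budget_exhausted"]) and budget["epsilon_remaining"] > min_remaining

# Example response body from the budget endpoint:
budget = {
    "epsilon_spent": 3.2,
    "epsilon_remaining": 0.8,
    "rounds_completed": 8,
    "budget_exhausted": False,
}
if budget_allows_training(budget, min_remaining=0.2):
    print("enough budget for another round")
```

Setting min_remaining above zero leaves headroom so a round that spends more epsilon than expected does not exhaust the budget mid-training.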
Secure aggregation
Secure aggregation (SecAgg) ensures the server only sees the sum of client updates, never individual contributions. This is critical when the server itself is not fully trusted, or when regulatory requirements demand that no single entity can access individual model updates.
How it works in Octomil
- At round start, participating clients exchange cryptographic masks with each other (via the server as relay).
- Each client adds its mask to its model update before uploading.
- The server sums all masked updates. Masks cancel out, revealing only the aggregate.
- If a client drops out mid-round, surviving clients can reconstruct the dropout's mask using secret sharing.
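The mask-cancellation idea in the steps above can be demonstrated numerically with simple pairwise additive masks. This is a toy model: real SecAgg derives masks from key agreement and protects dropouts with secret sharing.

```python
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 4, 3
updates = rng.normal(size=(n_clients, dim))

# Each pair (i, j) with i < j shares a random mask; i adds it, j subtracts it.
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for i in range(n_clients):
    m = updates[i].copy()
    for j in range(n_clients):
        if i < j:
            m += pair_masks[(i, j)]
        elif j < i:
            m -= pair_masks[(j, i)]
    masked.append(m)

# Individually, masked updates look like noise; summed, every mask cancels.
assert np.allclose(np.sum(masked, axis=0), updates.sum(axis=0))
```

The server only ever sees the masked vectors, yet their sum equals the true aggregate exactly.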
Configuration considerations
- min_participating_clients: SecAgg requires a minimum number of clients to complete the round. Set this based on your expected dropout rate. If fewer clients than this threshold survive, the round is aborted.
- reconstruction_threshold: Fraction of original participants needed to reconstruct dropped clients' masks. Lower values tolerate more dropout but weaken security.
- SecAgg adds latency to round setup (key exchange) and completion (unmasking). For small device fleets (< 20 devices), the overhead may outweigh the benefit.
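Under the assumption that reconstruction_threshold is interpreted as a fraction of the round's initial participants (an assumption for this sketch, not a documented guarantee), the abort condition works out to:

```python
import math

def round_can_complete(initial, survivors, min_participating, reconstruction_threshold):
    """SecAgg round-completion check.

    ASSUMPTION: reconstruction_threshold is the fraction of initial
    participants whose secret shares are needed to rebuild a dropout's mask.
    """
    shares_needed = math.ceil(reconstruction_threshold * initial)
    return survivors >= max(min_participating, shares_needed)

# 100 initial clients, threshold 0.5: the round survives 30% dropout
# but aborts at 60% dropout.
round_can_complete(100, 70, 10, 0.5)
round_can_complete(100, 40, 10, 0.5)
```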
When to use SecAgg vs DP alone
| Scenario | Recommendation |
|---|---|
| You trust your own server infrastructure | DP alone is sufficient |
| Regulatory requirement for no individual update visibility | SecAgg + DP |
| Small fleet (< 20 devices) | DP alone (SecAgg overhead too high) |
| High dropout rate (> 40% per round) | DP alone (SecAgg rounds will frequently abort) |
| Multi-party FL across organizations | SecAgg + DP strongly recommended |
Compliance integration
GDPR checklist for federated learning
Federated learning simplifies GDPR compliance because raw data never leaves the device, but you still need to address:
- Data minimization: Confirm model updates do not encode unnecessary personal information. Enable DP to formalize this.
- Purpose limitation: Document the training objective and ensure client devices only train for the stated purpose.
- Right to erasure: Implement client unlearning or model retraining workflows. Octomil tracks per-client contribution metadata to support selective round exclusion.
- Data Protection Impact Assessment (DPIA): Federated learning deployments processing sensitive categories (health, biometrics) require a DPIA. Octomil's audit log provides evidence for the assessment.
- Cross-border transfers: If devices span EU and non-EU jurisdictions, ensure model aggregation servers are in compliant regions. Octomil supports region-pinned deployments.
- Audit trail: Enable Logs & Audit to record round participation, privacy parameters, and aggregation events.
HIPAA considerations
For healthcare applications:
- Enable SecAgg + DP with epsilon <= 4.0.
- Ensure the Octomil server runs in a HIPAA-eligible environment (BAA required with your cloud provider).
- Raw health data must never leave the device. Verify your on-device training pipeline does not upload intermediate artifacts.
- Log all access to model artifacts and round metadata. Octomil audit logs are immutable and exportable.
- Client device identifiers must be pseudonymized. Octomil device tokens are opaque by default. See Device Token Lifecycle.
Privacy pipeline in Octomil server
The server applies privacy filters in sequence during aggregation. These are configured per-project and enforced automatically:
- Gradient clipping -- L2 norm bound on each client update.
- Noise injection -- Gaussian noise calibrated to the clipping norm and target epsilon.
- Quantization (optional) -- Reduces update precision, which provides a secondary privacy benefit by discarding low-order bits.
This pipeline runs automatically during each training round when differential privacy is enabled.
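The optional quantization filter can be sketched as uniform fixed-point rounding. This is an illustration only; Octomil's actual quantizer is not specified here.

```python
import numpy as np

def quantize(update, bits=8, clip_range=1.0):
    """Uniformly quantize values in [-clip_range, clip_range] to 2**bits levels.

    Discarding the low-order bits shrinks the payload and removes
    fine-grained information from each update.
    """
    levels = 2 ** bits - 1
    clipped = np.clip(update, -clip_range, clip_range)
    q = np.round((clipped + clip_range) / (2 * clip_range) * levels)
    return q / levels * (2 * clip_range) - clip_range

u = np.array([0.31, -0.72, 0.05])
q8 = quantize(u, bits=8)  # max error is half a quantization step
```

Because quantization runs after noise injection, the discarded low-order bits are already dominated by DP noise, so the accuracy cost of this step is usually small.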
Trade-offs
Privacy vs accuracy
Every privacy mechanism reduces model accuracy. The question is how much degradation is acceptable.
- Gradient clipping biases updates toward zero. If max_grad_norm is too aggressive, training slows or stalls.
- DP noise adds variance to aggregated updates. More clients per round reduces the noise-to-signal ratio (the noise divides across more contributions).
- SecAgg does not affect accuracy but increases round latency depending on fleet size.
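The noise-averaging effect is easy to verify numerically, assuming a fixed amount of server-side noise added once to the mean of n client updates:

```python
import numpy as np

def mean_noise_magnitude(n_clients, sigma, dim=1000, trials=200, seed=0):
    """Average L2 norm of the DP-noise term in the mean of n client updates,
    assuming one noise draw of std sigma is added to the aggregate."""
    rng = np.random.default_rng(seed)
    noise_in_mean = rng.normal(0.0, sigma, size=(trials, dim)) / n_clients
    return np.linalg.norm(noise_in_mean, axis=1).mean()

small_fleet = mean_noise_magnitude(n_clients=10, sigma=0.1)
large_fleet = mean_noise_magnitude(n_clients=100, sigma=0.1)
# Ten times the clients -> one tenth the noise in the averaged update.
```

This is why DP guarantees measured on a small test fleet do not transfer directly to production: the accuracy cost of the same epsilon shrinks as participation grows.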
Practical recommendation: run a baseline experiment without privacy, then enable DP with a conservative epsilon and measure accuracy loss. Tighten epsilon incrementally until accuracy degradation exceeds your threshold.
Privacy vs convergence speed
DP noise slows convergence. Compensate by:
- Increasing the number of local epochs per round (clients train more before uploading).
- Increasing the number of clients per round (noise averages out).
- Using FedProx or FedAdam strategies that are more robust to noisy updates than vanilla FedAvg.
Best practices
- Start with DP disabled during model development. Get the model architecture and hyperparameters right first, then add privacy constraints.
- Use the privacy budget as a hard stop, not a suggestion. If you exhaust budget and retrain, you have effectively doubled your epsilon.
- Monitor per-round epsilon spend in the dashboard. Sudden spikes indicate configuration drift or unexpected client behavior.
- Combine SecAgg with DP for defense in depth. SecAgg protects against a curious server; DP protects against attacks on the aggregate itself.
- Test with realistic client counts. DP guarantees depend on the number of participating clients. Testing with 5 clients and deploying to 500 will behave differently.
- Document your privacy parameters in your DPIA. Octomil exports privacy configuration snapshots per project version for audit purposes.
Implementation path
- Start with the privacy configuration above.
- Roll out safely with Model Rollouts.
- Operationalize with Python SDK rollouts.
Further reading
- What is Federated Learning?
- Logs & Audit
- Security Architecture
- Advanced FL Strategies -- Byzantine-robust aggregation