Advanced FL Strategies

This page covers Byzantine-robust aggregation (defending against malicious or faulty clients) and specialized training objectives (AUC, fairness, tail-risk) for federated learning.

Byzantine Robustness

Byzantine-robust FL protects training against adversarial, faulty, or poisoned client updates. With standard FedAvg, a single malicious client can dominate the aggregated update by sending gradients with 100x the magnitude of honest updates.
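A toy calculation makes the vulnerability concrete (illustrative NumPy, not Octomil code): one sign-flipped, scaled update drags the plain average to point against the honest consensus.

```python
# Illustration: how a single scaled update dominates naive FedAvg averaging.
import numpy as np

honest = [np.array([0.10, -0.20]),
          np.array([0.12, -0.18]),
          np.array([0.09, -0.21])]
# Sign-flipped and scaled 100x relative to a typical honest update.
malicious = np.array([0.10, -0.20]) * -100.0

fedavg = np.mean(honest + [malicious], axis=0)
honest_mean = np.mean(honest, axis=0)
# The aggregate now points roughly opposite to the honest mean.
```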

Threat model

Byzantine clients deviate from the expected protocol due to malicious intent (crafted updates to degrade the model), data poisoning (corrupted local data), model poisoning (directly manipulated gradients or backdoor injection), or hardware/software faults (garbage updates from memory errors or numerical overflow).

Common attacks include gradient scaling, sign flipping, backdoor injection, label flipping, free-riding, and Sybil attacks.

Robust aggregation strategies

Octomil provides four Byzantine-robust aggregation strategies that replace FedAvg's naive averaging.

Krum selects the single client update closest to its neighbors in parameter space. Tolerates a minority of Byzantine clients. Works best with 10-100 clients.
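The Krum score can be sketched in a few lines (an illustrative implementation of the published algorithm, not Octomil's internal code): each update is scored by its summed squared distance to its n - f - 2 nearest neighbors, and the lowest-scoring update wins.

```python
# Minimal Krum sketch: pick the update closest to its nearest neighbors.
import numpy as np

def krum_select(updates, num_byzantine):
    """Return the index of the update with the lowest Krum score."""
    n = len(updates)
    # Pairwise squared L2 distances between all client updates.
    d = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    # Score each client by the sum of distances to its n - f - 2 nearest
    # neighbors (index 0 of the sorted row is the zero self-distance).
    k = n - num_byzantine - 2
    scores = [np.sum(np.sort(d[i])[1:k + 1]) for i in range(n)]
    return int(np.argmin(scores))

updates = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
           np.array([0.9, 1.1]), np.array([50.0, -50.0])]  # last one is hostile
chosen = krum_select(updates, num_byzantine=1)  # one of the clustered updates
```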

Multi-Krum extends Krum by selecting the top k closest updates and averaging them. Set k = n - f, where n is the number of participating clients and f is num_byzantine, to average all clients except the suspected Byzantine ones.

FedMedian computes the coordinate-wise median of all client updates. No need to specify num_byzantine in advance. Robust to a significant fraction of corrupt values.

FedTrimmedAvg sorts updates along each coordinate, removes the top and bottom beta fraction, and averages the rest. A natural middle ground between FedAvg and FedMedian.
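Both coordinate-wise strategies are easy to sketch with NumPy (illustrative only; beta here plays the role of trim_ratio in the config below):

```python
# Coordinate-wise median and trimmed mean over a stack of client updates.
import numpy as np

updates = np.array([
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.21],
    [9.00,  9.00],   # corrupted update
])

# FedMedian: per-coordinate median ignores the outlier entirely.
fedmedian = np.median(updates, axis=0)

def trimmed_mean(x, beta):
    """Drop the top and bottom beta fraction along each coordinate, then average."""
    n = x.shape[0]
    t = int(np.floor(beta * n))
    s = np.sort(x, axis=0)
    return s[t:n - t].mean(axis=0)

fedtrimmed = trimmed_mean(updates, beta=0.25)  # drops 1 value at each end per coordinate
```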

Configuring robust aggregation

from octomil import Federation

federation = Federation(api_key="edg_...", name="my-robust-model")

result = federation.train(
    model="my-robust-model",
    algorithm="krum",  # or "multi_krum", "fedmedian", "fedtrimmedavg"
    rounds=500,
    min_updates=20,
)

Configure strategy parameters via the REST API:

# Krum
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "krum", "num_byzantine": 3, "learning_rate": 0.01, "local_epochs": 5}'

# Multi-Krum
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "multi_krum", "num_byzantine": 3, "num_selected": 15, "learning_rate": 0.01}'

# FedMedian
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "fedmedian", "learning_rate": 0.01, "local_epochs": 5}'

# FedTrimmedAvg
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "fedtrimmedavg", "trim_ratio": 0.1, "learning_rate": 0.01, "local_epochs": 5}'

Combining defenses

Robust aggregation alone is not sufficient for sophisticated attacks. Layer multiple defenses:

Update clipping -- Bound the L2 norm of each client's update before aggregation to prevent gradient scaling attacks:

{"algorithm": "krum", "gradient_clip_norm": 10.0, "num_byzantine": 3}
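The clipping rule itself is simple; a sketch of what a gradient_clip_norm of 10.0 does to each incoming update (illustrative, not the server's actual code):

```python
# L2-norm clipping applied to each client update before aggregation.
import numpy as np

def clip_update(update, max_norm=10.0):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(update)
    if norm > max_norm:
        return update * (max_norm / norm)
    return update

scaled_attack = np.ones(1000) * 100.0   # a gradient-scaling attack
clipped = clip_update(scaled_attack)    # now bounded at norm 10
```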

Anomaly detection -- Monitor per-client update statistics: cosine similarity between each client's update and the aggregate, update norm spikes, and consistent exclusion by Krum or trimming.
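One possible cosine-similarity signal, sketched (illustrative; assumes updates have already been clipped so no single client dominates the aggregate):

```python
# Flag clients whose update points away from the aggregate direction.
import numpy as np

def cosine_to_aggregate(updates):
    """Cosine similarity of each client's update against the mean update."""
    agg = np.mean(updates, axis=0)
    return [float(np.dot(u, agg) / (np.linalg.norm(u) * np.linalg.norm(agg)))
            for u in updates]

updates = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
           np.array([1.1, -0.1]), np.array([-1.0, 0.0])]  # last is sign-flipped
sims = cosine_to_aggregate(updates)
# Persistently negative or near-zero similarity is a red flag worth tracking.
```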

Trust scoring -- Assign trust scores based on historical behavior. Clients consistently selected by Krum receive higher trust; consistently trimmed clients receive lower trust. Use Device Groups to segment by trust tier.

Choosing a strategy

| Scenario | Strategy | Rationale |
| --- | --- | --- |
| Unknown threat, moderate client count | Krum | Strongest guarantee, minimal tuning |
| Robustness with less information loss | Multi-Krum | Averages top-k, better convergence |
| High client count, unknown attacker count | FedMedian | No num_byzantine needed, scales well |
| Known attacker fraction, best convergence | FedTrimmedAvg | Retains most information, tunable |
| Low-stakes, mostly trusted fleet | FedAvg + clipping | Clipping alone handles faults |

Performance impact

| Strategy | Server Compute | Communication | Convergence |
| --- | --- | --- | --- |
| FedAvg | Baseline | Baseline | Fastest (no attackers) |
| Krum | Higher | None | Slower (1 update) |
| Multi-Krum | Higher | None | Moderate |
| FedMedian | Moderate | None | Moderate |
| FedTrimmedAvg | Moderate | None | Near FedAvg |

Specialized Objectives

Some federated applications need objectives beyond cross-entropy -- AUC optimization, fairness-aware training, or tail-risk minimization.

Why standard losses fail

Cross-entropy optimizes for average-case accuracy, which fails when classes are imbalanced (99.5% benign vs 0.5% fraud) or error costs are asymmetric. In FL, class imbalance is compounded: each client may see an even more extreme skew, and some clients have zero minority-class examples.

AUC optimization

AUC measures ranking ability independent of classification threshold. Octomil supports pairwise surrogate loss and compositional AUC optimization:

curl -X PUT https://api.octomil.com/api/v1/federations/auc-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{
"algorithm": "fedavg",
"objective": "auc_surrogate",
"objective_config": {"margin": 1.0, "pos_weight": 10.0}
}'
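The pairwise surrogate idea can be sketched as a hinge loss over all (positive, negative) score pairs; the margin below mirrors the objective_config above, though the exact loss Octomil optimizes may differ.

```python
# Pairwise hinge surrogate for AUC: zero loss once every positive example
# outranks every negative example by at least `margin`.
import numpy as np

def pairwise_auc_hinge(scores, labels, margin=1.0):
    """Mean hinge loss over all (positive, negative) score pairs."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]          # every pairwise score gap
    return float(np.mean(np.maximum(0.0, margin - diffs)))

scores = np.array([2.5, 1.8, 0.3, -0.4])
labels = np.array([1, 1, 0, 0])
perfect = pairwise_auc_hinge(scores, labels)     # all gaps exceed the margin
```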

Fairness-aware objectives

Minimax fairness minimizes the worst-case loss across clients:

{"objective": "minimax", "objective_config": {"lambda": 0.5, "ema_decay": 0.9}}
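One common way to realize minimax weighting, sketched below as an assumption (the server's actual rule is not specified here): track an exponential moving average of each client's loss and shift aggregation weight toward the worst-off clients via a softmax, with lambda controlling how aggressively weight concentrates.

```python
# Hypothetical minimax weighting: upweight clients with high EMA loss.
import numpy as np

def minimax_weights(ema_losses, lam=0.5):
    """Softmax over per-client EMA losses; higher loss -> higher weight."""
    w = np.exp(lam * np.asarray(ema_losses))
    return w / w.sum()

def update_ema(ema_losses, new_losses, ema_decay=0.9):
    """Smooth per-client losses so weights do not swing on one noisy round."""
    return [ema_decay * e + (1 - ema_decay) * l
            for e, l in zip(ema_losses, new_losses)]

weights = minimax_weights([0.2, 0.4, 1.5])  # the struggling client gets the most weight
```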

Per-group accuracy constraints define minimum accuracy targets per device group:

{
"algorithm": "fedavg",
"fairness_constraints": [
{"device_group": "region-eu", "min_accuracy": 0.85},
{"device_group": "region-apac", "min_accuracy": 0.85}
]
}

Class-weighted aggregation

Clients with more minority-class examples receive higher aggregation weight:

{
"client_weighting": "class_balanced",
"weighting_config": {
"target_distribution": {"class_0": 0.5, "class_1": 0.5},
"smoothing": 0.1
}
}
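The mechanics might look like the following sketch (an assumption for illustration, not Octomil's documented formula): weight each client by how close its local label distribution is to the target, with smoothing keeping weights bounded.

```python
# Hypothetical class-balanced weighting: distance to the target distribution
# drives the weight; `smoothing` prevents any client from dominating.
def client_weight(local_dist, target_dist, smoothing=0.1):
    # Total-variation-style distance between local and target class shares.
    dist = 0.5 * sum(abs(local_dist[c] - target_dist[c]) for c in target_dist)
    return 1.0 / (dist + smoothing)

target = {"class_0": 0.5, "class_1": 0.5}
w_balanced = client_weight({"class_0": 0.5, "class_1": 0.5}, target)
w_skewed = client_weight({"class_0": 0.99, "class_1": 0.01}, target)
# The balanced client earns a much larger share of the aggregate.
```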

Tail-risk objectives

For safety-critical applications (medical imaging, autonomous systems, financial models), minimize worst-case loss instead of average loss:

  • CVaR (Conditional Value at Risk): optimizes the average loss on the worst alpha fraction of examples
  • DRO (Distributionally Robust Optimization): finds a model robust to distribution shift

{"objective": "cvar", "objective_config": {"alpha": 0.1, "dual_step_size": 0.01}}
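The CVaR objective is easy to state numerically: the mean loss over the worst alpha fraction of examples (a sketch of the definition; the dual_step_size parameter belongs to the server-side dual formulation, which is not shown here).

```python
# CVaR at level alpha: average over the worst alpha fraction of example losses.
import numpy as np

def cvar_loss(losses, alpha=0.1):
    losses = np.sort(np.asarray(losses))[::-1]   # worst losses first
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(np.mean(losses[:k]))

losses = [0.1] * 90 + [5.0] * 10   # a heavy tail that the average hides
avg = float(np.mean(losses))       # looks acceptable
tail = cvar_loss(losses, alpha=0.1)  # exposes the worst-case behavior
```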

Choosing the right objective

| Application | Objective | Rationale |
| --- | --- | --- |
| Balanced classification | Cross-entropy (default) | Standard, converges reliably |
| Rare-event detection | AUC surrogate + class weighting | Threshold-independent, handles imbalance |
| Fairness-critical | Minimax + per-group constraints | No subpopulation underserved |
| Safety-critical | CVaR or DRO | Minimizes worst-case failures |
| Multi-domain deployment | Ditto personalization | Per-client adaptation with shared knowledge |

Best Practices

  1. Set num_byzantine conservatively. Overestimating is safer than underestimating. If you expect 2 bad clients, set to 4-5.
  2. Use FedTrimmedAvg as a production default for robust aggregation. Start with trim_ratio=0.1.
  3. Stage strategy changes via rollouts. Use Model Rollouts to canary robust aggregation on a subset of devices.
  4. Start with class weighting before switching objectives. Only move to AUC surrogate if class weighting is insufficient.
  5. Track per-class and per-group metrics separately. Overall accuracy hides fairness problems.
  6. Combine specialized objectives with robust aggregation. Imbalanced datasets are more susceptible to poisoning.
  7. Require minimum client counts. Ensure min_devices_per_round is at least 3x your num_byzantine estimate.