Advanced FL Strategies

This page covers Byzantine-robust aggregation (defending against malicious or faulty clients) and specialized training objectives (AUC, fairness, tail-risk) for federated learning.

Byzantine Robustness

Byzantine-robust FL protects training against adversarial, faulty, or poisoned client updates. With standard FedAvg, a single malicious client can dominate the aggregated update by sending gradients with 100x the magnitude of honest updates.
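A toy calculation makes the vulnerability concrete (illustrative NumPy, not Octomil code): one sign-flipped, scaled update drags the plain average to point against the honest consensus.

```python
# Illustration: how a single scaled update dominates naive FedAvg averaging.
import numpy as np

honest = [np.array([0.10, -0.20]),
          np.array([0.12, -0.18]),
          np.array([0.09, -0.21])]
# Sign-flipped and scaled 100x relative to a typical honest update.
malicious = np.array([0.10, -0.20]) * -100.0

fedavg = np.mean(honest + [malicious], axis=0)
honest_mean = np.mean(honest, axis=0)
# The aggregate now points roughly opposite to the honest mean.
```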

Threat model

Byzantine clients deviate from the expected protocol due to malicious intent (crafted updates to degrade the model), data poisoning (corrupted local data), model poisoning (directly manipulated gradients or backdoor injection), or hardware/software faults (garbage updates from memory errors or numerical overflow).

Common attacks include gradient scaling, sign flipping, backdoor injection, label flipping, free-riding, and Sybil attacks.

Robust aggregation strategies

Octomil provides four Byzantine-robust aggregation strategies that replace FedAvg's naive averaging.

Krum selects the single client update closest to its neighbors in parameter space. Tolerates a minority of Byzantine clients. Works best with 10-100 clients.
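The Krum score can be sketched in a few lines (an illustrative implementation of the published algorithm, not Octomil's internal code): each update is scored by its summed squared distance to its n - f - 2 nearest neighbors, and the lowest-scoring update wins.

```python
# Minimal Krum sketch: pick the update closest to its nearest neighbors.
import numpy as np

def krum_select(updates, num_byzantine):
    """Return the index of the update with the lowest Krum score."""
    n = len(updates)
    # Pairwise squared L2 distances between all client updates.
    d = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    # Score each client by the sum of distances to its n - f - 2 nearest
    # neighbors (index 0 of the sorted row is the zero self-distance).
    k = n - num_byzantine - 2
    scores = [np.sum(np.sort(d[i])[1:k + 1]) for i in range(n)]
    return int(np.argmin(scores))

updates = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
           np.array([0.9, 1.1]), np.array([50.0, -50.0])]  # last one is hostile
chosen = krum_select(updates, num_byzantine=1)  # one of the clustered updates
```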

Multi-Krum extends Krum by selecting the top k closest updates and averaging them. Set k = n - f, where n is the number of participating clients and f is num_byzantine, to average all clients except the suspected Byzantine ones.

FedMedian computes the coordinate-wise median of all client updates. No need to specify num_byzantine in advance. Robust to a significant fraction of corrupt values.

FedTrimmedAvg sorts updates along each coordinate, removes the top and bottom beta fraction, and averages the rest. A natural middle ground between FedAvg and FedMedian.
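Both coordinate-wise strategies are easy to sketch with NumPy (illustrative only; beta here plays the role of trim_ratio in the config below):

```python
# Coordinate-wise median and trimmed mean over a stack of client updates.
import numpy as np

updates = np.array([
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.21],
    [9.00,  9.00],   # corrupted update
])

# FedMedian: per-coordinate median ignores the outlier entirely.
fedmedian = np.median(updates, axis=0)

def trimmed_mean(x, beta):
    """Drop the top and bottom beta fraction along each coordinate, then average."""
    n = x.shape[0]
    t = int(np.floor(beta * n))
    s = np.sort(x, axis=0)
    return s[t:n - t].mean(axis=0)

fedtrimmed = trimmed_mean(updates, beta=0.25)  # drops 1 value at each end per coordinate
```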

Configuring robust aggregation

from octomil import Federation

federation = Federation(api_key="edg_...", name="my-robust-model")

result = federation.train(
    model="my-robust-model",
    algorithm="krum",  # or "multi_krum", "fedmedian", "fedtrimmedavg"
    rounds=500,
    min_updates=20,
)

Configure strategy parameters via the REST API:

# Krum
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "krum", "num_byzantine": 3, "learning_rate": 0.01, "local_epochs": 5}'

# Multi-Krum
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "multi_krum", "num_byzantine": 3, "num_selected": 15, "learning_rate": 0.01}'

# FedMedian
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "fedmedian", "learning_rate": 0.01, "local_epochs": 5}'

# FedTrimmedAvg
curl -X PUT https://api.octomil.com/api/v1/federations/my-robust-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{"algorithm": "fedtrimmedavg", "trim_ratio": 0.1, "learning_rate": 0.01, "local_epochs": 5}'

Combining defenses

Robust aggregation alone is not sufficient for sophisticated attacks. Layer multiple defenses:

Update clipping -- Bound the L2 norm of each client's update before aggregation to prevent gradient scaling attacks:

{"algorithm": "krum", "gradient_clip_norm": 10.0, "num_byzantine": 3}
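The clipping rule itself is simple; a sketch of what a gradient_clip_norm of 10.0 does to each incoming update (illustrative, not the server's actual code):

```python
# L2-norm clipping applied to each client update before aggregation.
import numpy as np

def clip_update(update, max_norm=10.0):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(update)
    if norm > max_norm:
        return update * (max_norm / norm)
    return update

scaled_attack = np.ones(1000) * 100.0   # a gradient-scaling attack
clipped = clip_update(scaled_attack)    # now bounded at norm 10
```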

Anomaly detection -- Monitor per-client update statistics: cosine similarity between each client's update and the aggregate, update norm spikes, and consistent exclusion by Krum or trimming.
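One possible cosine-similarity signal, sketched (illustrative; assumes updates have already been clipped so no single client dominates the aggregate):

```python
# Flag clients whose update points away from the aggregate direction.
import numpy as np

def cosine_to_aggregate(updates):
    """Cosine similarity of each client's update against the mean update."""
    agg = np.mean(updates, axis=0)
    return [float(np.dot(u, agg) / (np.linalg.norm(u) * np.linalg.norm(agg)))
            for u in updates]

updates = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
           np.array([1.1, -0.1]), np.array([-1.0, 0.0])]  # last is sign-flipped
sims = cosine_to_aggregate(updates)
# Persistently negative or near-zero similarity is a red flag worth tracking.
```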

Trust scoring -- Assign trust scores based on historical behavior. Clients consistently selected by Krum receive higher trust; consistently trimmed clients receive lower trust. Use Device Groups to segment by trust tier.

Choosing a strategy

| Scenario | Strategy | Rationale |
| --- | --- | --- |
| Unknown threat, moderate client count | Krum | Strongest guarantee, minimal tuning |
| Robustness with less information loss | Multi-Krum | Averages top-k, better convergence |
| High client count, unknown attacker count | FedMedian | No num_byzantine needed, scales well |
| Known attacker fraction, best convergence | FedTrimmedAvg | Retains most information, tunable |
| Low-stakes, mostly trusted fleet | FedAvg + clipping | Clipping alone handles faults |

Performance impact

| Strategy | Server Compute | Communication | Convergence |
| --- | --- | --- | --- |
| FedAvg | Baseline | Baseline | Fastest (no attackers) |
| Krum | Higher | None | Slower (1 update) |
| Multi-Krum | Higher | None | Moderate |
| FedMedian | Moderate | None | Moderate |
| FedTrimmedAvg | Moderate | None | Near FedAvg |

Specialized Objectives

Some federated applications need objectives beyond cross-entropy -- AUC optimization, fairness-aware training, or tail-risk minimization.

Why standard losses fail

Cross-entropy optimizes for average-case accuracy, which fails when classes are imbalanced (99.5% benign vs 0.5% fraud) or error costs are asymmetric. In FL, class imbalance is compounded: each client may see an even more extreme skew, and some clients have zero minority-class examples.

AUC optimization

AUC measures ranking ability independent of classification threshold. Octomil supports pairwise surrogate loss and compositional AUC optimization:

curl -X PUT https://api.octomil.com/api/v1/federations/auc-model/strategy \
-H "Authorization: Bearer edg_..." \
-H "Content-Type: application/json" \
-d '{
"algorithm": "fedavg",
"objective": "auc_surrogate",
"objective_config": {"margin": 1.0, "pos_weight": 10.0}
}'
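The pairwise surrogate idea can be sketched as a hinge loss over all (positive, negative) score pairs; the margin below mirrors the objective_config above, though the exact loss Octomil optimizes may differ.

```python
# Pairwise hinge surrogate for AUC: zero loss once every positive example
# outranks every negative example by at least `margin`.
import numpy as np

def pairwise_auc_hinge(scores, labels, margin=1.0):
    """Mean hinge loss over all (positive, negative) score pairs."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]          # every pairwise score gap
    return float(np.mean(np.maximum(0.0, margin - diffs)))

scores = np.array([2.5, 1.8, 0.3, -0.4])
labels = np.array([1, 1, 0, 0])
perfect = pairwise_auc_hinge(scores, labels)     # all gaps exceed the margin
```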

Fairness-aware objectives

Minimax fairness minimizes the worst-case loss across clients:

{"objective": "minimax", "objective_config": {"lambda": 0.5, "ema_decay": 0.9}}
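One common way to realize minimax weighting, sketched below as an assumption (the server's actual rule is not specified here): track an exponential moving average of each client's loss and shift aggregation weight toward the worst-off clients via a softmax, with lambda controlling how aggressively weight concentrates.

```python
# Hypothetical minimax weighting: upweight clients with high EMA loss.
import numpy as np

def minimax_weights(ema_losses, lam=0.5):
    """Softmax over per-client EMA losses; higher loss -> higher weight."""
    w = np.exp(lam * np.asarray(ema_losses))
    return w / w.sum()

def update_ema(ema_losses, new_losses, ema_decay=0.9):
    """Smooth per-client losses so weights do not swing on one noisy round."""
    return [ema_decay * e + (1 - ema_decay) * l
            for e, l in zip(ema_losses, new_losses)]

weights = minimax_weights([0.2, 0.4, 1.5])  # the struggling client gets the most weight
```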

Per-group accuracy constraints define minimum accuracy targets per device group:

{
"algorithm": "fedavg",
"fairness_constraints": [
{"device_group": "region-eu", "min_accuracy": 0.85},
{"device_group": "region-apac", "min_accuracy": 0.85}
]
}

Class-weighted aggregation

Clients with more minority-class examples receive higher aggregation weight:

{
"client_weighting": "class_balanced",
"weighting_config": {
"target_distribution": {"class_0": 0.5, "class_1": 0.5},
"smoothing": 0.1
}
}
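The mechanics might look like the following sketch (an assumption for illustration, not Octomil's documented formula): weight each client by how close its local label distribution is to the target, with smoothing keeping weights bounded.

```python
# Hypothetical class-balanced weighting: distance to the target distribution
# drives the weight; `smoothing` prevents any client from dominating.
def client_weight(local_dist, target_dist, smoothing=0.1):
    # Total-variation-style distance between local and target class shares.
    dist = 0.5 * sum(abs(local_dist[c] - target_dist[c]) for c in target_dist)
    return 1.0 / (dist + smoothing)

target = {"class_0": 0.5, "class_1": 0.5}
w_balanced = client_weight({"class_0": 0.5, "class_1": 0.5}, target)
w_skewed = client_weight({"class_0": 0.99, "class_1": 0.01}, target)
# The balanced client earns a much larger share of the aggregate.
```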

Tail-risk objectives

For safety-critical applications (medical imaging, autonomous systems, financial models), minimize worst-case loss instead of average loss:

  • CVaR (Conditional Value at Risk): optimizes the average loss on the worst alpha fraction of examples
  • DRO (Distributionally Robust Optimization): finds a model robust to distribution shift

{"objective": "cvar", "objective_config": {"alpha": 0.1, "dual_step_size": 0.01}}
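The CVaR objective is easy to state numerically: the mean loss over the worst alpha fraction of examples (a sketch of the definition; the dual_step_size parameter belongs to the server-side dual formulation, which is not shown here).

```python
# CVaR at level alpha: average over the worst alpha fraction of example losses.
import numpy as np

def cvar_loss(losses, alpha=0.1):
    losses = np.sort(np.asarray(losses))[::-1]   # worst losses first
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(np.mean(losses[:k]))

losses = [0.1] * 90 + [5.0] * 10   # a heavy tail that the average hides
avg = float(np.mean(losses))       # looks acceptable
tail = cvar_loss(losses, alpha=0.1)  # exposes the worst-case behavior
```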

Choosing the right objective

| Application | Objective | Rationale |
| --- | --- | --- |
| Balanced classification | Cross-entropy (default) | Standard, converges reliably |
| Rare-event detection | AUC surrogate + class weighting | Threshold-independent, handles imbalance |
| Fairness-critical | Minimax + per-group constraints | No subpopulation underserved |
| Safety-critical | CVaR or DRO | Minimizes worst-case failures |
| Multi-domain deployment | Ditto personalization | Per-client adaptation with shared knowledge |

Best Practices

  1. Set num_byzantine conservatively. Overestimating is safer than underestimating. If you expect 2 bad clients, set to 4-5.
  2. Use FedTrimmedAvg as a production default for robust aggregation. Start with trim_ratio=0.1.
  3. Stage strategy changes via rollouts. Use Model Rollouts to canary robust aggregation on a subset of devices.
  4. Start with class weighting before switching objectives. Only move to AUC surrogate if class weighting is insufficient.
  5. Track per-class and per-group metrics separately. Overall accuracy hides fairness problems.
  6. Combine specialized objectives with robust aggregation. Imbalanced datasets are more susceptible to poisoning.
  7. Require minimum client counts. Ensure min_devices_per_round is at least 3x your num_byzantine estimate.