# Advanced FL Concepts
This page covers the conceptual foundations for advanced federated learning features: personalization, asynchronous training, and communication-efficient updates.
## Personalization
Standard federated learning produces a single global model. Personalization techniques let each device maintain a model tailored to its local data distribution while still benefiting from global knowledge.
### Approaches
**Fine-tuning:** After global aggregation, each device fine-tunes the global model on local data for a few extra epochs. This is simple and effective when data heterogeneity is moderate.
**Per-layer personalization:** Keep some layers global (shared feature extraction) and personalize others (task-specific heads). Configure via the `personalization_layers` parameter, as sketched below.
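A minimal configuration sketch, assuming `personalization_layers` accepts a list of layer names to keep device-local (the layer name here is hypothetical and depends on your model architecture):

```python
federation.train(
    model="recommendations",
    personalization_layers=["classifier.head"],  # hypothetical layer name; kept local on each device
    rounds=20,
)
```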
**Ditto:** Each device maintains both a global and a personal model. The personal model is regularized toward the global model with a tunable penalty term lambda: a higher lambda keeps the personal model closer to the global model, while a lower lambda allows more divergence.
```python
federation.train(
    model="recommendations",
    algorithm="ditto",
    ditto_lambda=0.5,  # Regularization strength
    rounds=20,
    min_updates=50,
)
```
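Concretely, the personal model v_k on device k minimizes the standard Ditto objective, where F_k is the device's local loss and w* is the aggregated global model:

$$\min_{v_k} \; F_k(v_k) + \frac{\lambda}{2}\,\lVert v_k - w^* \rVert^2$$

With lambda at 0 the personal model trains purely locally; very large values effectively pin it to the global model (assuming `ditto_lambda` maps directly onto the lambda above).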
## Asynchronous Training
Synchronous FL waits for all selected devices to submit updates before aggregating, which exposes it to stragglers: a single slow device can delay the entire round. Asynchronous FL instead aggregates updates as they arrive.
### When to use async
- Heterogeneous device fleets (mix of phones, tablets, IoT)
- High dropout rates (>30% of devices fail to complete rounds)
- Devices with varying connectivity (cellular vs WiFi)
### FedBuff
Octomil implements FedBuff (buffered asynchronous aggregation). The server maintains a buffer of incoming updates and aggregates once the buffer reaches a configurable size.
```python
federation.train(
    model="predictive-text",
    algorithm="fedavg",
    async_mode=True,
    buffer_size=50,      # Aggregate every 50 updates
    staleness_bound=10,  # Reject updates older than 10 rounds
    rounds=100,
)
```
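The server-side logic amounts to a buffered accumulate-then-apply loop. The sketch below is illustrative only (not Octomil's internals) and assumes each incoming update carries a `delta` array and the model version (`base_version`) it was computed against:

```python
import numpy as np

def fedbuff_server(global_model, update_stream, buffer_size=50, staleness_bound=10):
    """Illustrative buffered asynchronous aggregation loop (not Octomil internals)."""
    buffer = []
    version = 0
    for update in update_stream:            # updates arrive asynchronously, in any order
        staleness = version - update.base_version
        if staleness > staleness_bound:     # reject updates that are too stale
            continue
        buffer.append(update.delta)
        if len(buffer) >= buffer_size:      # buffer full: aggregate and advance the model
            global_model += np.mean(buffer, axis=0)
            version += 1
            buffer = []
    return global_model
```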
### Staleness handling
Stale updates (computed on old model versions) degrade convergence. Octomil applies exponential staleness decay: `weight = base_weight * 2^(-staleness / half_life)`.
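A small illustration of that decay (the `half_life` value here is an assumed tuning knob, not a documented default):

```python
def staleness_weight(base_weight: float, staleness: int, half_life: float = 5.0) -> float:
    """Exponential staleness decay: the weight halves every `half_life` rounds of staleness."""
    return base_weight * 2.0 ** (-staleness / half_life)

# An update that is 10 rounds stale with half_life=5 contributes at a quarter of its base weight.
assert abs(staleness_weight(1.0, 10) - 0.25) < 1e-9
```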
## Communication Efficiency
Model updates can be large (hundreds of megabytes for modern neural networks). Reducing communication cost is critical for mobile and bandwidth-constrained devices.
### Gradient compression
Compress updates before transmission using top-k sparsification or random sparsification:
```python
federation.train(
    model="image-classifier",
    compression="topk",
    compression_ratio=0.1,  # Send only top 10% of gradients
    rounds=50,
)
```
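Top-k sparsification keeps only the largest-magnitude entries of the update and transmits their indices and values. A minimal NumPy sketch of the idea (not Octomil's wire format):

```python
import numpy as np

def topk_sparsify(update: np.ndarray, ratio: float = 0.1):
    """Keep the top `ratio` fraction of entries by magnitude; return (indices, values)."""
    flat = update.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    return idx, flat[idx]

def topk_densify(indices: np.ndarray, values: np.ndarray, shape) -> np.ndarray:
    """Server side: rebuild a dense update that is zero except for the transmitted entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)
```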
### Sparse updates
Instead of sending the full model update, send only the parameters that changed significantly. Octomil uses a threshold-based approach: only parameters with `|delta| > threshold` are transmitted.
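A sketch of the threshold rule, assuming the per-parameter deltas are available as an array (the threshold value shown is illustrative, not a documented default):

```python
import numpy as np

def threshold_sparsify(delta: np.ndarray, threshold: float = 1e-3):
    """Transmit only the parameters whose absolute change exceeds the threshold."""
    flat = delta.ravel()
    idx = np.nonzero(np.abs(flat) > threshold)[0]
    return idx, flat[idx]  # (flat indices, values) to send
```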
### Quantization
Reduce the precision of transmitted updates from 32-bit to 8-bit or lower:
```python
federation.train(
    model="speech-model",
    update_quantization="int8",  # 4x bandwidth reduction
    rounds=50,
)
```
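The bandwidth saving comes from sending 8-bit integers plus a scale factor instead of 32-bit floats. A minimal symmetric per-tensor quantization sketch (Octomil's actual scheme may differ):

```python
import numpy as np

def quantize_int8(update: np.ndarray):
    """Quantize a float32 update to int8 plus a single scale factor."""
    max_abs = float(np.abs(update).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # guard against an all-zero update
    q = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Server side: recover an approximate float32 update."""
    return q.astype(np.float32) * scale
```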
### Bandwidth impact
| Technique | Bandwidth Reduction | Accuracy Impact |
|---|---|---|
| Top-k (10%) | 90% | 1-3% accuracy loss |
| Random sparsification | 80-95% | 2-5% accuracy loss |
| INT8 quantization | 75% | Under 1% accuracy loss |
| Combined (top-k + INT8) | 97% | 3-5% accuracy loss |
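As a rough sanity check on the combined row: sending the top 10% of parameters at 8 bits instead of 32 transmits about 0.10 × 8/32 = 2.5% of the original payload, i.e. roughly a 97% reduction before accounting for index overhead.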