Advanced FL Concepts

This page covers the conceptual foundations for advanced federated learning features: personalization, asynchronous training, and communication-efficient updates.

Personalization

Standard federated learning produces a single global model. Personalization techniques let each device maintain a model tailored to its local data distribution while still benefiting from global knowledge.

Approaches

Fine-tuning: After global aggregation, each device fine-tunes the global model on local data for a few extra epochs. Simple and effective when data heterogeneity is moderate.
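The fine-tuning step can be sketched with a toy one-parameter least-squares objective; this is purely illustrative (real deployments would run the device's actual training loop on the actual model), and all names here are hypothetical:

```python
# Toy sketch of post-aggregation fine-tuning: start from the global
# weight and run a few epochs of SGD on the device's local data.
def fine_tune(w_global, local_xy, lr=0.05, epochs=10):
    w = w_global
    for _ in range(epochs):
        for x, y in local_xy:
            grad = 2 * (w * x - y) * x  # gradient of (w*x - y)^2
            w -= lr * grad
    return w

# Local data prefers w = 2; fine-tuning moves the global w = 0 toward it.
w_local = fine_tune(0.0, [(1.0, 2.0), (2.0, 4.0)])
```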

Per-layer personalization: Keep some layers global (shared feature extraction) and personalize others (task-specific heads). Configure via the personalization_layers parameter.
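One way to picture per-layer personalization is as a name-based merge of parameter dictionaries; the merge function below is an illustrative sketch (the `personalization_layers` parameter name comes from the docs above, but the dict-based mechanics are assumptions, not Octomil internals):

```python
# Hypothetical sketch: take shared layers from the global model,
# but keep any layer listed in personalization_layers local.
def merge_global(local_params, global_params, personalization_layers):
    merged = {}
    for name, value in global_params.items():
        if any(name.startswith(p) for p in personalization_layers):
            merged[name] = local_params[name]  # keep the personalized head
        else:
            merged[name] = value               # adopt the shared layers
    return merged

local_p = {"encoder.w": 1, "head.w": 42}
global_p = {"encoder.w": 7, "head.w": 0}
m = merge_global(local_p, global_p, personalization_layers=["head"])
```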

Ditto: Each device maintains both a global and a personal model. The personal model is regularized toward the global model with a tunable penalty term lambda. Higher lambda means the personal model stays closer to the global; lower lambda allows more divergence.

federation.train(
    model="recommendations",
    algorithm="ditto",
    ditto_lambda=0.5,  # Regularization strength
    rounds=20,
    min_updates=50,
)
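The effect of `ditto_lambda` can be seen in a single personal-model update step. The sketch below (plain NumPy, assumed names; not Octomil's internals) adds the Ditto proximal penalty lambda * (w_personal - w_global) to the local gradient:

```python
# Illustrative Ditto personal-model SGD step with a proximal penalty.
import numpy as np

def ditto_personal_step(w_personal, w_global, grad, lr=0.1, lam=0.5):
    # Effective gradient: local gradient plus lam * (distance to global).
    # Higher lam pulls the personal model toward the global weights.
    return w_personal - lr * (grad + lam * (w_personal - w_global))

w_global = np.zeros(3)
w_personal = np.array([1.0, -1.0, 0.5])
grad = np.zeros(3)  # zero local gradient: only the penalty acts

# With no local gradient, each step shrinks the gap to the global model.
w_next = ditto_personal_step(w_personal, w_global, grad, lr=0.1, lam=0.5)
```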

Asynchronous Training

Synchronous FL waits for all selected devices to submit updates before aggregating, which makes stragglers a problem: a single slow device delays the entire round. Asynchronous FL instead aggregates updates as they arrive.

When to use async

  • Heterogeneous device fleets (mix of phones, tablets, IoT)
  • High dropout rates (>30% of devices fail to complete rounds)
  • Devices with varying connectivity (cellular vs WiFi)

FedBuff

Octomil implements FedBuff (federated buffered asynchronous aggregation). The server maintains a buffer of incoming updates and aggregates when the buffer reaches a configurable threshold.

federation.train(
    model="predictive-text",
    algorithm="fedbuff",
    async_mode=True,
    buffer_size=50,      # Aggregate every 50 updates
    staleness_bound=10,  # Reject updates older than 10 rounds
    rounds=100,
)
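The server-side buffering logic reduces to "accumulate updates, average once the buffer fills." The class below is a hypothetical sketch of that mechanism, not Octomil's actual implementation:

```python
# Illustrative FedBuff-style buffered aggregation.
import numpy as np

class BufferedAggregator:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []

    def submit(self, update):
        """Return the averaged update when the buffer fills, else None."""
        self.buffer.append(update)
        if len(self.buffer) >= self.buffer_size:
            agg = np.mean(self.buffer, axis=0)
            self.buffer.clear()
            return agg
        return None

agg = BufferedAggregator(buffer_size=3)
r1 = agg.submit(np.array([1.0, 0.0]))  # None: buffer not yet full
r2 = agg.submit(np.array([0.0, 1.0]))  # None
r3 = agg.submit(np.array([2.0, 2.0]))  # buffer full: mean of all three
```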

Staleness handling

Stale updates (computed on old model versions) degrade convergence. Octomil applies exponential staleness decay: weight = base_weight * 2^(-staleness / half_life).
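The decay formula above is straightforward to express directly; the function and parameter names here are illustrative (the docs do not specify how `half_life` is configured):

```python
# Exponential staleness decay: weight = base_weight * 2^(-staleness / half_life).
def staleness_weight(base_weight, staleness, half_life=5):
    return base_weight * 2 ** (-staleness / half_life)

# A fresh update keeps its full weight; an update that is `half_life`
# rounds stale counts exactly half as much.
fresh = staleness_weight(1.0, staleness=0)
stale = staleness_weight(1.0, staleness=5, half_life=5)
```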

Communication Efficiency

Model updates can be large (hundreds of MB for modern neural networks). Reducing communication cost is critical for mobile and bandwidth-constrained devices.

Gradient compression

Compress updates before transmission using top-k sparsification or random sparsification:

federation.train(
    model="image-classifier",
    compression="topk",
    compression_ratio=0.1,  # Send only top 10% of gradients
    rounds=50,
)
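Top-k sparsification itself is a few lines of NumPy. This sketch operates on a flat gradient vector and is illustrative only; Octomil's wire format and per-tensor handling may differ:

```python
# Keep the k largest-magnitude gradient entries; zero out the rest.
import numpy as np

def topk_sparsify(grad, ratio=0.1):
    k = max(1, int(len(grad) * ratio))
    keep = np.argsort(np.abs(grad))[-k:]  # indices of the largest |g|
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]
    return sparse

g = np.array([0.01, -2.0, 0.3, 0.05, 1.5, -0.02, 0.0, 0.4, -0.7, 0.1])
s = topk_sparsify(g, ratio=0.2)  # keeps the 2 largest-magnitude entries
```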

Sparse updates

Instead of sending full model updates, send only the parameters that changed significantly. Octomil uses a threshold-based approach: only parameters with delta > threshold are transmitted.
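The threshold-based approach can be pictured as transmitting only (index, delta) pairs whose magnitude exceeds the threshold. The payload format below is a hypothetical sketch, not Octomil's actual wire format:

```python
# Send only parameter deltas whose magnitude exceeds the threshold.
import numpy as np

def sparse_delta(old, new, threshold=0.01):
    delta = new - old
    idx = np.flatnonzero(np.abs(delta) > threshold)
    return list(zip(idx.tolist(), delta[idx].tolist()))

old = np.array([0.5, 0.5, 0.5, 0.5])
new = np.array([0.5, 0.52, 0.505, 0.4])
payload = sparse_delta(old, new, threshold=0.01)  # only 2 of 4 params sent
```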

Quantization

Reduce the precision of transmitted updates from 32-bit to 8-bit or lower:

federation.train(
    model="speech-model",
    update_quantization="int8",  # 4x bandwidth reduction
    rounds=50,
)
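A minimal sketch of symmetric int8 quantization, assuming a flat float32 update vector (Octomil may use a different scheme, e.g. per-channel scales):

```python
# Symmetric int8 quantization: scale by the max magnitude, round to
# the int8 range, and dequantize on the server. int8 payloads are
# roughly 4x smaller than float32, at some precision cost.
import numpy as np

def quantize_int8(update):
    scale = max(np.max(np.abs(update)) / 127.0, 1e-12)
    q = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

u = np.array([0.8, -0.4, 0.05, 0.0], dtype=np.float32)
q, scale = quantize_int8(u)
u_hat = dequantize_int8(q, scale)  # close to u, within one scale step
```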

Bandwidth impact

Technique                  Bandwidth Reduction   Accuracy Impact
Top-k (10%)                90%                   1-3% accuracy loss
Random sparsification      80-95%                2-5% accuracy loss
INT8 quantization          75%                   Under 1% accuracy loss
Combined (top-k + INT8)    97%                   3-5% accuracy loss