# Advanced FL Concepts
This page covers the conceptual foundations for advanced federated learning features: personalization, asynchronous training, and communication-efficient updates.
## Personalization
Standard federated learning produces a single global model. Personalization techniques let each device maintain a model tailored to its local data distribution while still benefiting from global knowledge.
### Approaches
**Fine-tuning:** After global aggregation, each device fine-tunes the global model on local data for a few extra epochs. This is simple and effective when data heterogeneity is moderate.
**Per-layer personalization:** Keep some layers global (shared feature extraction) and personalize others (task-specific heads). Configure via the `personalization_layers` parameter, as sketched below.
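A minimal configuration sketch, assuming `personalization_layers` accepts a list of layer names to keep device-local (the layer name here is hypothetical and depends on your model architecture):

```python
federation.train(
    model="recommendations",
    personalization_layers=["classifier.head"],  # hypothetical layer name; kept local on each device
    rounds=20,
)
```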
**Ditto:** Each device maintains both a global and a personal model. The personal model is regularized toward the global model with a tunable penalty term lambda: a higher lambda keeps the personal model closer to the global model, while a lower lambda allows more divergence.
```python
federation.train(
    model="recommendations",
    algorithm="ditto",
    ditto_lambda=0.5,  # Regularization strength
    rounds=20,
    min_updates=50,
)
```
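Concretely, the personal model v_k on device k minimizes the standard Ditto objective, where F_k is the device's local loss and w* is the aggregated global model:

$$\min_{v_k} \; F_k(v_k) + \frac{\lambda}{2}\,\lVert v_k - w^* \rVert^2$$

With lambda at 0 the personal model trains purely locally; very large values effectively pin it to the global model (assuming `ditto_lambda` maps directly onto the lambda above).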
## Asynchronous Training
Synchronous FL waits for all selected devices to submit updates before aggregating, which exposes it to stragglers: a single slow device can delay the entire round. Asynchronous FL instead aggregates updates as they arrive.
### When to use async
- Heterogeneous device fleets (mix of phones, tablets, IoT)
- High dropout rates (>30% of devices fail to complete rounds)
- Devices with varying connectivity (cellular vs WiFi)
### FedBuff
Octomil implements FedBuff (buffered asynchronous aggregation). The server maintains a buffer of incoming updates and aggregates once the buffer reaches a configurable size.
```python
federation.train(
    model="predictive-text",
    algorithm="fedavg",
    async_mode=True,
    buffer_size=50,      # Aggregate every 50 updates
    staleness_bound=10,  # Reject updates older than 10 rounds
    rounds=100,
)
```
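The server-side logic amounts to a buffered accumulate-then-apply loop. The sketch below is illustrative only (not Octomil's internals) and assumes each incoming update carries a `delta` array and the model version (`base_version`) it was computed against:

```python
import numpy as np

def fedbuff_server(global_model, update_stream, buffer_size=50, staleness_bound=10):
    """Illustrative buffered asynchronous aggregation loop (not Octomil internals)."""
    buffer = []
    version = 0
    for update in update_stream:            # updates arrive asynchronously, in any order
        staleness = version - update.base_version
        if staleness > staleness_bound:     # reject updates that are too stale
            continue
        buffer.append(update.delta)
        if len(buffer) >= buffer_size:      # buffer full: aggregate and advance the model
            global_model += np.mean(buffer, axis=0)
            version += 1
            buffer = []
    return global_model
```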
### Staleness handling
Stale updates (computed on old model versions) degrade convergence. Octomil applies exponential staleness decay: `weight = base_weight * 2^(-staleness / half_life)`.
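A small illustration of that decay (the `half_life` value here is an assumed tuning knob, not a documented default):

```python
def staleness_weight(base_weight: float, staleness: int, half_life: float = 5.0) -> float:
    """Exponential staleness decay: the weight halves every `half_life` rounds of staleness."""
    return base_weight * 2.0 ** (-staleness / half_life)

# An update that is 10 rounds stale with half_life=5 contributes at a quarter of its base weight.
assert abs(staleness_weight(1.0, 10) - 0.25) < 1e-9
```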
## Communication Efficiency
Model updates can be large (hundreds of megabytes for modern neural networks). Reducing communication cost is critical for mobile and bandwidth-constrained devices.
### Gradient compression
Compress updates before transmission using top-k sparsification or random sparsification:
```python
federation.train(
    model="image-classifier",
    compression="topk",
    compression_ratio=0.1,  # Send only top 10% of gradients
    rounds=50,
)
```
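Top-k sparsification keeps only the largest-magnitude entries of the update and transmits their indices and values. A minimal NumPy sketch of the idea (not Octomil's wire format):

```python
import numpy as np

def topk_sparsify(update: np.ndarray, ratio: float = 0.1):
    """Keep the top `ratio` fraction of entries by magnitude; return (indices, values)."""
    flat = update.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    return idx, flat[idx]

def topk_densify(indices: np.ndarray, values: np.ndarray, shape) -> np.ndarray:
    """Server side: rebuild a dense update that is zero except for the transmitted entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)
```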
### Sparse updates
Instead of sending the full model update, send only the parameters that changed significantly. Octomil uses a threshold-based approach: only parameters with `|delta| > threshold` are transmitted.
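A sketch of the threshold rule, assuming the per-parameter deltas are available as an array (the threshold value shown is illustrative, not a documented default):

```python
import numpy as np

def threshold_sparsify(delta: np.ndarray, threshold: float = 1e-3):
    """Transmit only the parameters whose absolute change exceeds the threshold."""
    flat = delta.ravel()
    idx = np.nonzero(np.abs(flat) > threshold)[0]
    return idx, flat[idx]  # (flat indices, values) to send
```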
### Quantization
Reduce the precision of transmitted updates from 32-bit to 8-bit or lower:
```python
federation.train(
    model="speech-model",
    update_quantization="int8",  # 4x bandwidth reduction
    rounds=50,
)
```
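The bandwidth saving comes from sending 8-bit integers plus a scale factor instead of 32-bit floats. A minimal symmetric per-tensor quantization sketch (Octomil's actual scheme may differ):

```python
import numpy as np

def quantize_int8(update: np.ndarray):
    """Quantize a float32 update to int8 plus a single scale factor."""
    max_abs = float(np.abs(update).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # guard against an all-zero update
    q = np.clip(np.round(update / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Server side: recover an approximate float32 update."""
    return q.astype(np.float32) * scale
```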
### Bandwidth impact
| Technique | Bandwidth Reduction | Accuracy Impact |
|---|---|---|
| Top-k (10%) | 90% | 1-3% accuracy loss |
| Random sparsification | 80-95% | 2-5% accuracy loss |
| INT8 quantization | 75% | Under 1% accuracy loss |
| Combined (top-k + INT8) | 97% | 3-5% accuracy loss |
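As a rough sanity check on the combined row: sending the top 10% of parameters at 8 bits instead of 32 transmits about 0.10 × 8/32 = 2.5% of the original payload, i.e. roughly a 97% reduction before accounting for index overhead.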