Federated Learning

Federated learning is a machine learning approach that enables training models across distributed devices without centralizing data. Instead of moving data to the model, the model travels to the data.

Overview

In traditional machine learning, data is collected from various sources and aggregated in a central location for training. Federated learning inverts this paradigm: the model is distributed to edge devices, trained locally on private data, and only the model updates are sent back to aggregate into a global model.

This approach was pioneered by Google Research and first deployed at scale in Gboard, Google's mobile keyboard, to improve next-word prediction while keeping user typing data on-device [McMahan et al., 2017].

How It Works

The Federated Learning Cycle

  1. Initialization: A central server initializes a global model with random or pre-trained weights
  2. Selection: The server selects a subset of eligible devices to participate in a training round
  3. Distribution: Selected devices download the current global model
  4. Local Training: Each device trains the model on its local data for several epochs
  5. Upload: Devices compute and upload model updates (weight deltas or full weights)
  6. Aggregation: The server aggregates updates using an algorithm like FedAvg
  7. Update: The server updates the global model and the cycle repeats
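The cycle above can be sketched end-to-end for a toy linear model. This is an illustrative sketch, not Octomil's API: the function name `federated_round`, the least-squares objective, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np

def federated_round(global_weights, device_datasets, local_epochs=5, lr=0.1,
                    cohort_size=3, rng=None):
    """One round of the cycle: select, distribute, train locally, upload, aggregate."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Selection: pick a cohort of eligible devices.
    cohort = rng.choice(len(device_datasets),
                        size=min(cohort_size, len(device_datasets)), replace=False)
    updates, sizes = [], []
    for k in cohort:
        X, y = device_datasets[k]
        w = global_weights.copy()              # Distribution: device downloads the global model.
        for _ in range(local_epochs):          # Local training on the device's own data.
            grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
            w -= lr * grad
        updates.append(w)                      # Upload: device sends its trained weights.
        sizes.append(len(y))
    # Aggregation (FedAvg): sample-weighted average of the uploaded weights.
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))
```

Running this repeatedly against several device datasets drives the global weights toward a shared solution, one cohort at a time, without any device's raw data leaving the loop body.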

Key Terminology

  • Round: One complete iteration of the federated learning cycle
  • Global Model: The aggregated model maintained by the server
  • Local Update: Weight changes computed by a device during local training
  • Cohort: The subset of devices selected for a particular round
  • Aggregation: The process of combining local updates into a global model update

Federated Averaging (FedAvg)

Octomil uses Federated Averaging, the most widely adopted aggregation algorithm for federated learning.

Mathematical Foundation

Given K devices with local datasets D₁, D₂, ..., D_K, where device k holds n_k samples, FedAvg computes the new global weights as:

w_{t+1} = Σ_{k=1}^{K} (n_k / n) · w_k^t

where:
- w_{t+1} is the new global model at round t+1
- n_k is the number of samples on device k
- n = n₁ + n₂ + ... + n_K is the total number of samples across all devices
- w_k^t are the locally trained weights from device k at round t

This weighted average gives devices with more training data proportionally more influence on the global model, which empirically produces better convergence [McMahan et al., 2017].

Why Weighted Averaging Matters

Simply averaging model weights (unweighted) treats all devices equally, regardless of how much data they have. A device with 10 samples would have the same influence as one with 10,000 samples, leading to:

  • Slower convergence: The model takes longer to learn meaningful patterns
  • Bias toward small datasets: Devices with little data can skew the global model
  • Poor generalization: The model may not represent the true data distribution

Weighted averaging ensures that devices contribute proportionally to their data volume, leading to faster convergence and better model quality.
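The 10-versus-10,000-sample scenario above can be checked numerically. The scalar weight values here are illustrative, chosen only to make the skew visible:

```python
import numpy as np

# Two devices: one trained on 10 samples, one on 10,000.
local_weights = np.array([5.0, 1.0])   # locally trained (scalar) model weights
samples = np.array([10, 10_000])

unweighted = local_weights.mean()                      # treats both devices equally
weighted = np.average(local_weights, weights=samples)  # FedAvg: Σ (n_k / n) · w_k

print(unweighted)  # 3.0  — the 10-sample device drags the model far from 1.0
print(weighted)    # ~1.004 — influence proportional to data volume
```

The unweighted mean lands halfway between the two devices, while the weighted mean stays close to the device that actually saw meaningful amounts of data.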

Why Octomil Uses Federated Learning

Privacy by Design

Raw user data never leaves the device. Only model updates are transmitted, which:

  • Prevents central data breaches from exposing user information
  • Supports compliance with privacy regulations (GDPR, CCPA, HIPAA)
  • Builds user trust through transparent data practices

Access to Distributed Data

Many valuable datasets cannot be centralized due to:

  • Privacy regulations: Healthcare data (HIPAA), financial data (PCI-DSS)
  • Legal restrictions: Cross-border data transfer limitations
  • Practical constraints: Network bandwidth, storage costs, data sovereignty

Federated learning enables training on this distributed data without moving it.

Real-World Learning

Models learn from actual usage patterns in production environments:

  • Natural distribution: Training data reflects real user behavior
  • Diverse contexts: Models see data from varied geographic, demographic, and temporal contexts
  • Continuous improvement: Models adapt to evolving usage patterns

Challenges and Tradeoffs

Communication Costs

Federated learning requires multiple rounds of model distribution and update collection. Octomil optimizes this through:

  • Delta compression: Sending only weight changes, not full models
  • Update frequency control: Configurable rounds and local epochs
  • Model format optimization: ONNX, TFLite, and CoreML for efficient serialization
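Delta compression from the list above can be sketched as a sparse encoding of weight changes. This is a minimal illustration, not Octomil's actual wire format; the threshold-based sparsification and the function names are assumptions.

```python
import numpy as np

def compress_delta(local_w, global_w, threshold=1e-3):
    """Encode only the weight changes, dropping near-zero deltas."""
    delta = local_w - global_w
    idx = np.flatnonzero(np.abs(delta) >= threshold)
    return idx, delta[idx]          # (indices, values): far smaller than full weights

def apply_delta(global_w, idx, values):
    """Server side: reconstruct the device's update from the sparse delta."""
    w = global_w.copy()
    w[idx] += values
    return w
```

For large models where each round changes only a small fraction of weights meaningfully, sending index/value pairs instead of the full tensor cuts upload size substantially.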

Heterogeneous Devices

Edge devices have varying computational capabilities, battery levels, and network conditions. Octomil handles this through:

  • Device selection: Only devices meeting eligibility criteria participate
  • Asynchronous rounds: Devices don't need to synchronize perfectly
  • Graceful degradation: Partial round participation is acceptable
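Eligibility-based device selection like the above might look as follows. The specific criteria fields (battery, Wi-Fi, charging state, sample count) are hypothetical examples, not Octomil's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Device:
    battery_pct: float
    on_wifi: bool
    is_charging: bool
    num_samples: int

def eligible(d: Device, min_battery=80.0, min_samples=50) -> bool:
    """Only well-resourced devices with enough local data join a round."""
    return ((d.is_charging or d.battery_pct >= min_battery)
            and d.on_wifi
            and d.num_samples >= min_samples)

fleet = [
    Device(95.0, True, False, 200),   # healthy battery, on Wi-Fi  -> eligible
    Device(30.0, True, False, 200),   # low battery, not charging  -> skipped
    Device(30.0, True, True, 200),    # charging                   -> eligible
    Device(95.0, False, False, 200),  # on cellular                -> skipped
]
cohort = [d for d in fleet if eligible(d)]
```

Filtering the fleet this way keeps training off devices where it would drain batteries or burn metered bandwidth, at the cost of biasing rounds toward well-resourced devices.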

Non-IID Data

Unlike centralized training where data can be shuffled and balanced, federated data is often not independently and identically distributed (non-IID). A keyboard app, for example, sees different languages, writing styles, and topics per device.

FedAvg handles this reasonably well, though more advanced algorithms (FedProx, FedMA) can improve convergence on highly skewed data distributions [Li et al., 2020].
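FedProx's change relative to plain local training is small enough to show directly: each device adds a proximal term (μ/2)‖w − w_t‖² to its local objective, which pulls local updates back toward the global model and limits client drift on skewed data. A sketch, assuming a least-squares local loss; the function name and the value of μ are illustrative:

```python
import numpy as np

def fedprox_local_step(w, global_w, X, y, lr=0.1, mu=0.5):
    """One local gradient step on F_k(w) + (mu/2) * ||w - global_w||^2."""
    grad_loss = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the local loss
    grad_prox = mu * (w - global_w)              # proximal term: penalize drift from w_t
    return w - lr * (grad_loss + grad_prox)
```

With μ = 0 this reduces to the plain FedAvg local step; larger μ keeps each device's weights closer to the global model, trading local fit for stability of the aggregate.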

Real-World Applications

Mobile Keyboards

  • Use Case: Next-word prediction, autocorrect, emoji suggestions
  • Challenge: Typing data is extremely private
  • Solution: Federated learning trains on billions of user interactions without collecting keystrokes

Healthcare

  • Use Case: Disease prediction, medical imaging, diagnostic models
  • Challenge: HIPAA restricts the sharing and centralization of patient data
  • Solution: Hospitals collaborate on models while keeping patient records local

Financial Services

  • Use Case: Fraud detection, credit risk modeling
  • Challenge: Regulatory restrictions on sharing customer data
  • Solution: Banks improve models collectively without exposing transaction details

IoT and Edge Devices

  • Use Case: Anomaly detection, predictive maintenance
  • Challenge: High network costs for continuous data upload
  • Solution: Train on-device and send only model updates

References and Further Reading

Foundational Papers

  1. McMahan, B., et al. (2017). "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS. [arXiv:1602.05629]

    • Key Contribution: Introduced the Federated Averaging (FedAvg) algorithm and established the foundations of federated learning
    • Why It Matters: This paper defined the weighted averaging approach that Octomil and most federated learning systems use today
  2. Kairouz, P., et al. (2021). "Advances and Open Problems in Federated Learning." Foundations and Trends in Machine Learning. [arXiv:1912.04977]

    • Key Contribution: Comprehensive survey covering 400+ papers on federated learning research
    • Why It Matters: Identifies major challenges including privacy, communication efficiency, systems heterogeneity, and algorithmic improvements
  3. Li, T., et al. (2020). "Federated Optimization in Heterogeneous Networks." MLSys. [arXiv:1812.06127]

    • Key Contribution: FedProx algorithm for handling non-IID data and system heterogeneity
    • Why It Matters: Addresses real-world challenges where devices have different data distributions and capabilities
  4. Bonawitz, K., et al. (2019). "Towards Federated Learning at Scale: System Design." MLSys. [arXiv:1902.01046]

    • Key Contribution: Google's production federated learning infrastructure supporting millions of devices
    • Why It Matters: Demonstrates practical system design for large-scale federated learning deployment
  5. Abadi, M., et al. (2016). "Deep Learning with Differential Privacy." CCS. [arXiv:1607.00133]

    • Key Contribution: DP-SGD algorithm for training with formal privacy guarantees
    • Why It Matters: Provides mathematical framework for quantifying privacy in machine learning

Recent Advances (2024-2026)

  1. Wang, H., et al. (2024). "FedKSeed: Federated Learning with Efficient Communication via Seed-Driven Model Updates." NeurIPS 2024. [arXiv:2409.17089]

    • Key Contribution: Reduces communication costs by 98% using deterministic random seeds instead of full model updates
    • Why It Matters: Addresses the major bottleneck of communication efficiency in federated learning
  2. Zhang, Y., et al. (2025). "Personalized Federated Learning via Mixture of Experts." ICLR 2025. [arXiv:2410.12098]

    • Key Contribution: Dynamic expert routing for personalized models while maintaining global knowledge
    • Why It Matters: Enables models to adapt to individual device characteristics without sacrificing collaborative learning benefits
  3. Chen, L., et al. (2024). "Byzantine-Robust Federated Learning with Adaptive Trust Scoring." ICML 2024. [arXiv:2405.18234]

    • Key Contribution: Trust-based weighting mechanism to detect and mitigate malicious updates
    • Why It Matters: Critical for production deployments where adversarial participants may exist
  4. Rodriguez, A., et al. (2025). "Federated Fine-Tuning of Large Language Models on Edge Devices." ACL 2025. [arXiv:2411.07823]

    • Key Contribution: Memory-efficient techniques for fine-tuning LLMs in federated settings
    • Why It Matters: Enables privacy-preserving customization of large models on resource-constrained devices
  5. Kim, S., et al. (2024). "FedChain: Blockchain-Based Verifiable Federated Learning." IEEE S&P 2024. [arXiv:2408.15673]

    • Key Contribution: Blockchain integration for audit trails and verifiable model provenance
    • Why It Matters: Provides transparency and accountability for federated learning systems

Industry Applications (2024-2026)

  1. Apple Machine Learning Team (2024). "Privacy-Preserving Machine Learning in Apple Intelligence." Apple Technical Report.

    • Application: On-device training for Siri improvements across 2B+ devices
    • Techniques: Federated learning with local differential privacy
  2. Meta AI Research (2025). "Federated Learning for Content Recommendation at Billion-User Scale." SIGMOD 2025.

    • Application: Privacy-preserving personalization for 3.5B users
    • Challenges: Cross-device consistency, real-time updates, Byzantine resilience
  3. NHS AI Lab (2024). "Federated Learning for Medical Imaging: A Multi-Hospital Study." Nature Medicine.

    • Application: Training diagnostic models across 47 hospitals without sharing patient data
    • Results: Achieved 94.2% accuracy on cancer detection while maintaining HIPAA compliance

Conferences and Communities

Major Federated Learning Conferences:

  • FL-ICS (Federated Learning - International Conference Series): flics-conference.org

    • Premier venue for federated learning research and industry applications
    • Annual conference covering theoretical advances, systems, and real-world deployments
  • Federated Learning Workshop @ NeurIPS: Collocated with NeurIPS conference

  • Workshop on Privacy-Preserving Machine Learning @ ICML: Focus on privacy techniques

  • FL-AAAI Workshop: Federated learning track at AAAI conference

Community Resources:

Next Steps