
Federated Training

Run training across devices from multiple organizations without centralizing data.

How it works

  1. App developer configures the SDK on each device (done once)
  2. Federation owner starts the training run from the CLI
  3. Each round: server selects devices, devices train locally, devices send weight updates, server aggregates
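The round loop in step 3 can be sketched end to end as a toy simulation. Here each "device" takes one local gradient step toward the mean of its own data and the server averages the results; the names (`local_step`, `run_federation`) and the scalar model are illustrative only, not the Octomil protocol:

```python
def local_step(weight, local_data, lr=0.5):
    # One gradient step of mean-squared error toward the local mean.
    grad = sum(weight - x for x in local_data) / len(local_data)
    return weight - lr * grad

def run_federation(device_datasets, rounds=20):
    weight = 0.0  # the server's global model: a single scalar
    for _ in range(rounds):
        # Each device trains locally on data that never leaves it...
        updates = [local_step(weight, data) for data in device_datasets]
        # ...and the server aggregates only the resulting updates.
        weight = sum(updates) / len(updates)
    return weight

# Two devices with different local data; the global model converges
# to the mean across all devices without pooling the raw samples.
print(round(run_federation([[1.0, 2.0], [3.0, 4.0]]), 3))  # 2.5
```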

Configure devices

Install the SDK and register each device with the Octomil server. This runs on the device itself:

from octomil import FederatedClient

client = FederatedClient(api_key="edg_...", device_identifier="hospital-001")
client.register()

Once registered, the device sits idle until the server starts a training round.

Start training

The federation owner triggers the run:

octomil train start radiology-v1 \
--strategy fedavg \
--rounds 50 \
--group production

Each round:

  1. Server selects participating devices from all active member organizations
  2. Server sends current model weights to those devices
  3. Each device trains locally on its own data
  4. Each device sends back weight updates, never raw data
  5. Server aggregates updates into an improved model
  6. Improved model is sent to devices for the next round

Send weight updates

When a training round starts, the SDK trains on local data and submits the updated weights automatically:

from octomil import FederatedClient

client = FederatedClient(api_key="edg_...", device_identifier="hospital-001")
client.register()

# Option 1: Point to local data — SDK handles training and upload
client.train(model="radiology-v1", data="/data/patients.csv", target_col="diagnosis")

# Option 2: Full control — bring your own training loop
def train_locally(base_state_dict):
    model = MyModel()
    model.load_state_dict(base_state_dict)
    train_one_epoch(model, local_dataloader)
    return model.state_dict(), len(local_data), {"loss": 0.42}

client.train_from_remote(model="radiology-v1", local_train_fn=train_locally, rounds=5)

Monitor training

octomil train status radiology-v1
Training: radiology-v1 (fed_a1b2c3d4)
Strategy: fedavg
Round: 23 / 50 Status: IN_PROGRESS

Contributions this round:
org_7f8a9b0c Acme Health 12 devices updates: 12/12
org_abc123 Metro General 8 devices updates: 6/8

Overall accuracy: 0.874 (improving)

Run inference

After training completes, each device downloads the trained model and runs predictions locally — no cloud calls:

# Python uses the CLI for local inference
# Start the inference server:
# octomil serve radiology-v1

import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
model="radiology-v1",
messages=[{"role": "user", "content": "Classify this scan."}],
)
print(response.choices[0].message.content)

Rules

  • Raw data never leaves the device
  • Only active federation members can submit updates
  • Each organization controls its own differential privacy budget: octomil team set-policy --privacy-budget <epsilon>
  • Federation-scoped aggregation: updates from member devices only
  • Per-organization contribution tracked per round
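The privacy budget epsilon bounds how much any one device's data can influence the shared model. As an illustration of the underlying mechanism — not the SDK's actual implementation — clipping an update and adding Gaussian noise calibrated to the budget looks roughly like this (`privatize_update` and its parameters are hypothetical):

```python
import math
import random

def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5):
    """Clip an update to `clip_norm`, then add Gaussian noise sized for
    (epsilon, delta)-differential privacy. Illustrative sketch only."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in update]
    # Standard Gaussian-mechanism noise multiplier: smaller epsilon
    # (a tighter budget) means more noise per update.
    sigma = clip_norm * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [v + random.gauss(0, sigma) for v in clipped]

noisy = privatize_update([3.0, 4.0], clip_norm=1.0, epsilon=1.0)
```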

Deploy the trained model

When training completes, the server saves aggregated weights as a new version:

Training complete: radiology-v1
New version: 2.0.0 (aggregated from 50 rounds, 2 organizations)

Each organization deploys independently:

octomil deploy radiology-v1 --version 2.0.0 --rollout 10%
# validate metrics
octomil deploy radiology-v1 --version 2.0.0 --rollout 100%
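The "validate metrics" step between the 10% and 100% commands can be scripted. A hypothetical promotion gate (the function name, metric choice, and threshold are illustrative, not part of the Octomil CLI):

```python
def should_promote(canary_error_rate, baseline_error_rate, max_regression=0.01):
    """Promote the canary only if its error rate regresses no more than
    `max_regression` (absolute) over the current baseline version."""
    return canary_error_rate <= baseline_error_rate + max_regression

# Canary at 2.4% errors vs. a 2.0% baseline: within the 1-point budget.
print(should_promote(0.024, 0.020))  # True
```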

Consider an A/B experiment comparing v1.0.0 against v2.0.0 before full rollout.

If auto-rollback is enabled, deployments exceeding the error threshold roll back automatically:

octomil rollback radiology-v1 --to-version 1.0.0