Federated Training
Run training across devices from multiple organizations without centralizing data.
How it works
- App developer configures the SDK on each device (done once)
- Federation owner starts the training run from the CLI
- Each round: server selects devices, devices train locally, devices send weight updates, server aggregates
Configure devices
Install the SDK and register each device with the Octomil server. This runs on the device itself:
Python:

from octomil import FederatedClient

client = FederatedClient(api_key="edg_...", device_identifier="hospital-001")
client.register()

iOS (Swift):

import Octomil

let client = OctomilClient(apiKey: "edg_...", orgId: "your-org-id")
try await client.register()

Android (Kotlin):

import ai.octomil.OctomilClient

val client = OctomilClient(apiKey = "edg_...", orgId = "your-org-id", context = this)
client.register()
Once registered, the device sits idle until the server starts a training round.
Start training
The federation owner triggers the run:
octomil train start radiology-v1 \
--strategy fedavg \
--rounds 50 \
--group production
Each round:
- Server selects participating devices from all active member organizations
- Server sends current model weights to those devices
- Each device trains locally on its own data
- Each device sends back only its weight update, never raw data
- Server aggregates updates into an improved model
- Improved model is sent to devices for the next round
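The aggregation step (FedAvg) is, at its core, a sample-count-weighted average of the device updates. The sketch below illustrates the idea only; it is not the server's actual implementation, and the plain-dict weight format is an assumption for the example:

```python
def fedavg(updates):
    """Aggregate device updates into new global weights.

    updates: list of (weights, n_samples) pairs, where weights is a
    dict mapping parameter name -> list of floats.
    """
    total = sum(n for _, n in updates)
    agg = {}
    for weights, n in updates:
        for name, values in weights.items():
            acc = agg.setdefault(name, [0.0] * len(values))
            for i, v in enumerate(values):
                # Each device's contribution is weighted by its share of samples
                acc[i] += v * (n / total)
    return agg

# Two devices: the one with more samples pulls the average toward its weights.
new_weights = fedavg([
    ({"layer1": [1.0, 2.0]}, 300),   # device A, 300 samples
    ({"layer1": [3.0, 4.0]}, 100),   # device B, 100 samples
])
# layer1 -> [1.5, 2.5]
```

Weighting by sample count keeps a device with little data from skewing the global model as much as a device with a lot.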
Send weight updates
When a training round starts, the SDK trains on local data and submits the updated weights automatically:
Python:

from octomil import FederatedClient

client = FederatedClient(api_key="edg_...", device_identifier="hospital-001")
client.register()

# Option 1: Point to local data — SDK handles training and upload
client.train(model="radiology-v1", data="/data/patients.csv", target_col="diagnosis")

# Option 2: Full control — bring your own training loop
def train_locally(base_state_dict):
    model = MyModel()
    model.load_state_dict(base_state_dict)
    train_one_epoch(model, local_dataloader)
    return model.state_dict(), len(local_data), {"loss": 0.42}

client.train_from_remote(model="radiology-v1", local_train_fn=train_locally, rounds=5)

iOS (Swift):

import Octomil

let client = OctomilClient(apiKey: "edg_...", orgId: "your-org-id")
try await client.register()

// Option 1: Simple — SDK handles training and upload
try await client.train(modelId: "radiology-v1", data: localPatientScans, samples: 1000)

// Option 2: Full control — custom training, then upload weights
try await client.participateInTrainingRound(
    modelId: "radiology-v1",
    trainingData: localPatientScans,
    sampleCount: 1000
)
The SDK serializes your model update and uploads it to the server. On Wi-Fi-only policies (the default), uploads are deferred until Wi-Fi is available.
Android (Kotlin):

import ai.octomil.OctomilClient

val client = OctomilClient(apiKey = "edg_...", orgId = "your-org-id", context = this)
client.register()

// Option 1: Simple — SDK handles training and upload
client.train(modelId = "radiology-v1", data = localPatientScans, samples = 1000)

// Option 2: Full control — custom training, then upload weights
client.participateInTrainingRound(
    modelId = "radiology-v1",
    trainingData = localPatientScans,
    sampleCount = 1000
)
For production apps, use WorkManager so training survives app backgrounding:
class FederatedTrainingWorker(
    context: Context,
    params: WorkerParameters
) : CoroutineWorker(context, params) {
    override suspend fun doWork(): Result {
        return try {
            val client = OctomilClient(
                apiKey = "edg_...",
                orgId = "your-org-id",
                context = applicationContext
            )
            client.participateInTrainingRound(
                modelId = "radiology-v1",
                trainingData = loadBackgroundData(),
                sampleCount = 1000
            )
            Result.success()
        } catch (e: Exception) {
            Result.retry()
        }
    }
}
Monitor training
octomil train status radiology-v1
Training: radiology-v1 (fed_a1b2c3d4)
Strategy: fedavg
Round: 23 / 50 Status: IN_PROGRESS
Contributions this round:
org_7f8a9b0c Acme Health 12 devices updates: 12/12
org_abc123 Metro General 8 devices updates: 6/8
Overall accuracy: 0.874 (improving)
Run inference
After training completes, each device downloads the trained model and runs predictions locally — no cloud calls:
Python:

# Python uses the CLI for local inference.
# Start the inference server:
#   octomil serve radiology-v1
import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="radiology-v1",
    messages=[{"role": "user", "content": "Classify this scan."}],
)
print(response.choices[0].message.content)
iOS (Swift):

import Octomil

let client = OctomilClient(apiKey: "edg_...", orgId: "your-org-id")
try await client.register()

// Download the trained model (cached locally for offline use)
let model = try await client.downloadModel(modelId: "radiology-v1")

// Run inference — entirely on-device
let prediction = try model.predict(input: ["features": scanFeatures])
print("Diagnosis: \(prediction["label"]!)")
The server automatically serves the correct model version based on active rollouts. Subsequent calls use the local cache until a new version is available.
Android (Kotlin):

import ai.octomil.OctomilClient

val client = OctomilClient(apiKey = "edg_...", orgId = "your-org-id", context = this)
client.register()

// Download the trained model (cached locally for offline use)
val model = client.downloadModel(modelId = "radiology-v1")

// Run inference — entirely on-device
val prediction = model.predict(mapOf("features" to scanFeatures))
println("Diagnosis: ${prediction["label"]}")
The server automatically serves the correct model version based on active rollouts. Subsequent calls use the local cache until a new version is available.
Rules
- Raw data never leaves the device
- Only active federation members can submit updates
- Each organization controls its own differential privacy budget:
  octomil team set-policy --privacy-budget <epsilon>
- Federation-scoped aggregation: updates from member devices only
- Per-organization contribution tracked per round
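How a privacy budget maps to noise depends on the mechanism in use. As a hedged illustration only (not Octomil's implementation), the standard Gaussian mechanism clips each update to a norm bound and adds noise scaled to that bound and the epsilon budget; `privatize_update` and its parameters are hypothetical names:

```python
import math
import random

def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5):
    """Clip an update vector to clip_norm, then add Gaussian noise
    calibrated for one (epsilon, delta)-DP release."""
    norm = math.sqrt(sum(v * v for v in update))
    # Scale the update down so its L2 norm is at most clip_norm
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in update]
    # Gaussian mechanism: sigma = sqrt(2 ln(1.25/delta)) * clip_norm / epsilon
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * clip_norm / epsilon
    return [v + random.gauss(0.0, sigma) for v in clipped]
```

A smaller epsilon means a larger sigma, so a stricter budget trades accuracy of each contribution for stronger privacy.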
Deploy the trained model
When training completes, the server saves aggregated weights as a new version:
Training complete: radiology-v1
New version: 2.0.0 (aggregated from 50 rounds, 2 organizations)
Each organization deploys independently:
octomil deploy radiology-v1 --version 2.0.0 --rollout 10%
# validate metrics
octomil deploy radiology-v1 --version 2.0.0 --rollout 100%
Consider an A/B experiment comparing v1.0.0 against v2.0.0 before full rollout.
If auto-rollback is enabled, deployments that exceed the error threshold roll back automatically. You can also roll back manually:
octomil rollback radiology-v1 --to-version 1.0.0
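The auto-rollback decision amounts to comparing the canary's error rate against the baseline once enough traffic has arrived. A sketch with hypothetical names and thresholds, since the actual policy lives server-side:

```python
def should_rollback(baseline_error_rate, canary_error_rate,
                    max_ratio=1.5, min_requests=100, canary_requests=0):
    """Roll back if the canary has enough traffic to judge and its
    error rate exceeds the baseline by more than max_ratio."""
    if canary_requests < min_requests:
        return False  # not enough data to decide yet
    if baseline_error_rate == 0:
        return canary_error_rate > 0
    return canary_error_rate / baseline_error_rate > max_ratio

# 10% canary at 3% errors vs a 1.5% baseline: 2x worse, so roll back
should_rollback(0.015, 0.03, canary_requests=500)  # True
```

The minimum-traffic guard avoids rolling back on the noisy error rates of the first few canary requests.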
Related
- Federations — create and manage cross-org federations
- iOS SDK — full iOS SDK reference
- Android SDK — full Android SDK reference
- Python SDK — full Python SDK reference
- Rollouts — canary and progressive deployment
- Experiments — A/B testing
- Privacy guide — differential privacy budgets