# Embeddings

Generate vector embeddings from text for semantic search, RAG pipelines, clustering, and similarity comparisons. Octomil proxies embedding requests to your LLM backend and provides SDK wrappers for every platform.
## API Endpoint

`POST /api/v1/embeddings`

```bash
curl -X POST http://localhost:8000/api/v1/embeddings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "nomic-embed-text",
    "input": ["search query", "document text to embed"]
  }'
```
Response:

```json
{
  "embeddings": [[0.012, -0.034, ...], [0.056, 0.078, ...]],
  "model": "nomic-embed-text",
  "provider": "ollama",
  "usage": { "prompt_tokens": 12, "total_tokens": 12 }
}
```
The `input` field accepts either a single string or an array of strings.
## SDK Examples

### Python

```python
import octomil

client = octomil.Client(api_key="oct_...")

# Single string
result = client.embed("nomic-embed-text", "search query")
print(result.embeddings[0][:5])  # first 5 dimensions

# Batch
result = client.embed("nomic-embed-text", ["doc 1", "doc 2", "doc 3"])
print(len(result.embeddings))  # 3
print(result.usage)  # {'prompt_tokens': ..., 'total_tokens': ...}
```
### iOS (Swift)

```swift
let client = EmbeddingClient(serverURL: serverURL, apiKey: "oct_...")

// Single string
let result = try await client.embed(modelId: "nomic-embed-text", input: "search query")
print(result.embeddings[0].prefix(5))

// Batch
let batchResult = try await client.embed(
    modelId: "nomic-embed-text",
    input: ["doc 1", "doc 2", "doc 3"]
)
print(batchResult.embeddings.count) // 3
```
### Android (Kotlin)

```kotlin
val client = EmbeddingClient(serverUrl = "http://...", apiKey = "oct_...")

// Single string
val result = client.embed("nomic-embed-text", "search query")
println(result.embeddings[0].take(5))

// Batch
val batchResult = client.embed("nomic-embed-text", listOf("doc 1", "doc 2"))
println(batchResult.embeddings.size) // 2
```
### Browser

```typescript
import { embed } from '@octomil/browser';

// Single string
const result = await embed('http://localhost:8000', 'oct_...', 'nomic-embed-text', 'search query');
console.log(result.embeddings[0].slice(0, 5));

// Batch
const batch = await embed('http://localhost:8000', 'oct_...', 'nomic-embed-text', [
  'doc 1', 'doc 2', 'doc 3'
]);
console.log(batch.embeddings.length); // 3
```
## Use Cases
- Semantic search: Embed queries and documents, find nearest neighbors
- RAG: Retrieve relevant context before generating responses
- Clustering: Group similar content by embedding distance
- Deduplication: Find near-duplicate content via cosine similarity
- Classification: Use embeddings as features for downstream classifiers
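Most of these use cases reduce to comparing the returned vectors by cosine similarity. Here is a minimal, dependency-free sketch; the helper names (`cosine_similarity`, `top_k`) are illustrative, not part of the SDK:

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(query: list[float], docs: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (doc index, similarity) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In practice you embed documents once, store their vectors, and embed only the query at search time; for large corpora, swap the linear scan in `top_k` for a vector index.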
## Models
Any embedding model supported by your backend works. Common choices:
| Model | Dimensions | Use Case |
|---|---|---|
| nomic-embed-text | 768 | General-purpose text |
| all-minilm | 384 | Lightweight, fast |
| mxbai-embed-large | 1024 | High-quality retrieval |
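Because dimensions differ per model, vectors produced by different models cannot be mixed in one index. A small guard like the sketch below can catch mismatches early; the dimension map comes from the table above, and actual dimensions depend on your backend's model builds:

```python
# Expected output dimensions per model (from the table above; backend-dependent)
EXPECTED_DIMS = {
    "nomic-embed-text": 768,
    "all-minilm": 384,
    "mxbai-embed-large": 1024,
}

def check_dims(model_id: str, embeddings: list[list[float]]) -> None:
    """Raise if a returned vector's length doesn't match the model's known dimension."""
    expected = EXPECTED_DIMS.get(model_id)
    if expected is None:
        return  # unknown model: nothing to check against
    for i, vec in enumerate(embeddings):
        if len(vec) != expected:
            raise ValueError(
                f"embedding {i} has {len(vec)} dims, expected {expected} for {model_id}"
            )
```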