Embeddings

Generate vector embeddings from text for semantic search, RAG pipelines, clustering, and similarity comparisons. Octomil proxies embedding requests to your LLM backend and provides SDK wrappers for every platform.

API Endpoint

POST /api/v1/embeddings
curl -X POST http://localhost:8000/api/v1/embeddings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "nomic-embed-text",
    "input": ["search query", "document text to embed"]
  }'

Response:

{
  "embeddings": [[0.012, -0.034, ...], [0.056, 0.078, ...]],
  "model": "nomic-embed-text",
  "provider": "ollama",
  "usage": { "prompt_tokens": 12, "total_tokens": 12 }
}

The input field accepts either a single string or an array of strings; the response always returns an array of embedding vectors.
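If you are calling the endpoint directly rather than through an SDK, it can be convenient to normalize both input shapes to the array form before building the request payload. A minimal sketch; the helper name is hypothetical and not part of the Octomil SDK:

```python
def normalize_input(text_or_texts):
    """Accept a single string or a list of strings and always return
    a list, matching the array form of the endpoint's input field."""
    if isinstance(text_or_texts, str):
        return [text_or_texts]
    return list(text_or_texts)

payload = {
    "model_id": "nomic-embed-text",
    "input": normalize_input("search query"),  # -> ["search query"]
}
```

Normalizing up front means downstream code can always expect one embedding vector per list element, regardless of how the caller passed the input.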

SDK Examples

import octomil

client = octomil.Client(api_key="oct_...")

# Single string
result = client.embed("nomic-embed-text", "search query")
print(result.embeddings[0][:5]) # first 5 dimensions

# Batch
result = client.embed("nomic-embed-text", ["doc 1", "doc 2", "doc 3"])
print(len(result.embeddings)) # 3
print(result.usage) # {'prompt_tokens': ..., 'total_tokens': ...}

Use Cases

  • Semantic search: Embed queries and documents, find nearest neighbors
  • RAG: Retrieve relevant context before generating responses
  • Clustering: Group similar content by embedding distance
  • Deduplication: Find near-duplicate content via cosine similarity
  • Classification: Use embeddings as features for downstream classifiers
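The similarity-based use cases above (semantic search, deduplication) typically compare the returned vectors with cosine similarity. A self-contained sketch using toy hard-coded vectors in place of real API output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for API results.
query = [0.1, 0.3, 0.5]
docs = {
    "doc-a": [0.1, 0.29, 0.51],   # nearly parallel to the query
    "doc-b": [-0.4, 0.1, -0.2],   # points away from the query
}

# Rank documents by similarity to the query (semantic search).
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
```

In production you would embed the query and documents via the API and use a vector index rather than a full sort, but the comparison itself is the same.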

Models

Any embedding model supported by your backend works. Common choices:

Model               Dimensions   Use Case
nomic-embed-text    768          General-purpose text
all-minilm          384          Lightweight, fast
mxbai-embed-large   1024         High-quality retrieval
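Because these models produce vectors of different lengths, embeddings from different models are not directly comparable, and mixing them in one index is a common bug. A defensive check before computing distances; this helper is illustrative and not part of the SDK:

```python
def check_same_dimension(vectors):
    """Return the shared dimension of a batch of embedding vectors,
    raising if lengths differ (e.g. 768-d nomic-embed-text mixed
    with 384-d all-minilm output)."""
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"Mixed embedding dimensions: {sorted(dims)}")
    return dims.pop()
```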