Governed AI Access Layer

LLM Gateway

One governed path to every frontier model. Use Claude, GPT, Gemini, Grok, and Sonar through a single governed access layer with routing, policy controls, browser-safe sessions, replay-safe execution, and cost visibility.

5 Providers · 43+ Models · Auto Routing · PKCE Browser Sessions

Supported Providers

Every request is routed through a unified interface. Add or swap providers without changing a single line of application code.

Anthropic

Claude Opus 4.6, Sonnet 4.6, Haiku 4.5

OpenAI

GPT-5.4, GPT-4o, o3, o3-mini

Google Gemini

Gemini 2.5 Pro, 2.5 Flash, 2.0 Flash

xAI

Grok-4, Grok-4.1 Fast, Grok-3

Perplexity

Sonar Pro, Sonar Reasoning Pro

How It Works

LLM Gateway supports two deployment modes. Use it embedded inside Magic Runtime, or integrate directly for standalone multi-provider inference.

Your App (Frontend / API client)
  → Magic Runtime (Controllers + Governance)
  → LLM Gateway (Auto-route + Rate limit)
  → Best Provider (Claude / GPT / Gemini / ...)

Embedded in Magic

Your app talks to Magic Runtime, and Magic uses LLM Gateway behind the scenes for routing, policy enforcement, cost tracking, and failure handling.

Direct Gateway Mode

Integrate llmgateway.threadsync.io directly for standalone multi-provider AI access with org-scoped governance, browser-safe auth, and usage controls.

Features

Auto-Routing

Heuristic + LLM-based routing picks the optimal provider for each request. Keyword analysis for speed, or ask the router model to decide.
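The keyword half of this routing could look something like the sketch below. The keyword table, provider names, and fallback are illustrative, not the gateway's actual routing rules:

```python
# Illustrative keyword-heuristic router. The real gateway combines
# heuristics like this with an LLM-based router; this table is a guess.
KEYWORD_ROUTES = {
    "search": "perplexity",   # live-web questions -> Sonar models
    "case law": "perplexity",
    "contract": "anthropic",  # long-document analysis -> Claude
    "image": "openai",
}
DEFAULT_PROVIDER = "anthropic"

def route_by_keywords(prompt: str) -> str:
    """Pick a provider by scanning the prompt for routing keywords."""
    lowered = prompt.lower()
    for keyword, provider in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return provider
    return DEFAULT_PROVIDER
```

When no keyword matches, the request can fall through to the router model instead of a static default.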

Rate Limiting & Quotas

Atomic slot reservation with advisory locks. Hourly sliding windows, daily and monthly budgets per API key.
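An in-process analogue of the sliding-window reservation looks like this. The production gateway uses database advisory locks; this sketch substitutes a thread lock to show the atomic check-and-reserve shape only:

```python
import threading
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch of an hourly sliding-window limiter. Production uses
    database advisory locks; a thread lock stands in for them here."""

    def __init__(self, limit: int, window_s: float = 3600.0):
        self.limit = limit
        self.window_s = window_s
        self._lock = threading.Lock()
        self._stamps = deque()  # timestamps of reserved slots

    def try_reserve(self, now=None) -> bool:
        """Atomically reserve one slot, or refuse if the window is full."""
        now = time.monotonic() if now is None else now
        with self._lock:
            # Expire slots that have slid out of the window.
            while self._stamps and now - self._stamps[0] >= self.window_s:
                self._stamps.popleft()
            if len(self._stamps) >= self.limit:
                return False
            self._stamps.append(now)
            return True
```

Daily and monthly budgets layer on top as additional counters checked inside the same critical section.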

Cost Tracking

Per-request token counts and USD cost computed from the model catalog. Full usage history in the database, exportable via admin API.
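The per-request cost computation is straightforward once the catalog supplies per-token prices. The prices below are placeholders, not the gateway's actual catalog values:

```python
# Illustrative per-million-token prices; real values come from the
# gateway's model catalog, not this hardcoded table.
CATALOG = {
    "claude-sonnet-4-6": {"in_per_mtok": 3.00, "out_per_mtok": 15.00},
}

def compute_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost for one request from token counts and catalog prices."""
    p = CATALOG[model]
    cost = (tokens_in / 1_000_000) * p["in_per_mtok"] \
         + (tokens_out / 1_000_000) * p["out_per_mtok"]
    return round(cost, 6)
```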

Idempotent Requests

Send an Idempotency-Key header and the gateway guarantees exactly-once execution with 30-day response replay.
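The contract behaves like the sketch below: the first request with a given key executes, and later requests with the same key replay the stored response. This in-memory version illustrates the semantics only; the gateway persists responses for 30 days:

```python
# In-memory sketch of idempotency-key replay semantics.
_replay_cache: dict[str, dict] = {}

def handle_request(idempotency_key: str, execute):
    """Execute at most once per key; subsequent calls replay the result."""
    if idempotency_key in _replay_cache:
        return _replay_cache[idempotency_key]
    response = execute()
    _replay_cache[idempotency_key] = response
    return response
```

A client can therefore retry a timed-out request with the same Idempotency-Key without risking a double execution.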

API Key Scoping

Per-key provider and model allowlists, tier-based rate limits, and organization-level policy overrides.
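A per-key allowlist check reduces to a few comparisons. The policy field names here are illustrative, not the gateway's actual schema:

```python
# Hypothetical key-policy check; field names are illustrative.
def is_allowed(key_policy: dict, provider: str, model: str) -> bool:
    """Refuse any provider or model outside the key's allowlists.
    A None allowlist means 'no restriction' in this sketch."""
    providers = key_policy.get("allowed_providers")
    models = key_policy.get("allowed_models")
    if providers is not None and provider not in providers:
        return False
    if models is not None and model not in models:
        return False
    return True
```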

Conversation Memory

Optional conversation_id parameter maintains context across messages with automatic summarization.
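The automatic-summarization half can be pictured as follows. The threshold and summarizer hook are illustrative, not the gateway's actual logic:

```python
# Sketch of context management with automatic summarization.
# The max_messages threshold and summarize() hook are illustrative.
def maybe_summarize(history: list, max_messages: int, summarize) -> list:
    """Collapse older turns into one summary once history grows too long."""
    if len(history) <= max_messages:
        return history
    older, recent = history[:-max_messages], history[-max_messages:]
    summary = summarize(older)
    return [{"role": "system",
             "content": f"Summary of earlier turns: {summary}"}] + recent
```

From the client's perspective, none of this is visible: passing the same conversation_id on each request is enough.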

Browser-Safe Subject Sessions

PKCE code exchange, signed proof-of-possession requests, app-client origin binding. Direct browser auth disabled in production. Session material in memory, never localStorage.
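The PKCE half of this flow follows RFC 7636: the backend mints a random code_verifier and sends only its S256 challenge, proving possession later without ever transmitting the secret. A minimal generator:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate an RFC 7636 code_verifier and its S256 code_challenge."""
    # 32 random bytes -> 43-char base64url verifier (padding stripped).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The challenge travels with the share code; the verifier is revealed only during the session exchange, so an intercepted share code alone is useless.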

Org Policy & Admin Plane

Organization-scoped API keys, per-key provider and model allowlists, app-client approval, audit logging, usage dashboards, and superadmin health surfaces.

Conversation Store & Org Knowledge

Private-by-default prompt/response capture. Users share, unshare, or delete conversations. Org-visible browsing and admin export for knowledge ingestion. Agent definitions and pre-call RAG coming next.

API Reference

Two integration paths: proxy through Magic Runtime, or call the gateway directly. Both get routing, governance, and cost tracking.

POST /api/v1/llm/chat curl
curl -X POST https://magic.threadsync.io/api/v1/llm/chat \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize our Q3 board decisions"}],
    "system": "You are an executive assistant.",
    "auto_route": true
  }'
Response JSON
{
  "content": "Here is a summary of Q3 board decisions...",
  "model": "claude-sonnet-4-6-20260312",
  "provider": "anthropic",
  "route_method": "auto_router",
  "usage": {
    "tokens_in": 42,
    "tokens_out": 256,
    "cost_usd": 0.0048
  }
}
Python SDK Python
import httpx

client = httpx.Client(
    base_url="https://magic.threadsync.io/api/v1/llm",
    headers={"X-API-Key": "your-api-key"}
)

# Auto-route to the best provider
resp = client.post("/chat", json={
    "messages": [{"role": "user", "content": "Analyze this contract"}],
    "auto_route": True
})
result = resp.json()
print(result["content"])
# Provider chosen: anthropic (claude-sonnet-4-6)

# Or pick a specific model
resp = client.post("/chat", json={
    "messages": [{"role": "user", "content": "Search for recent case law on..."}],
    "model": "sonar-reasoning-pro"
})
Direct Gateway Mode curl
curl -X POST https://llmgateway.threadsync.io/v1/chat/completions \
  -H "X-API-Key: YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: req-123" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize our Q3 board decisions"}],
    "auto_route": true,
    "conversation_id": "board-q3"
  }'
Browser-Safe Session Flow JavaScript
// Backend mints a share code + code_verifier
const session = await exchangeForSession(share_code, code_verifier);

// Session material stays in memory — never localStorage
const result = await callCompletion(
  session.subject_token,
  session.hmac_key,
  [{ role: "user", content: "What changed in our pricing memo?" }]
);

// Browser integrations use subject-session exchange
// with PKCE and signed requests — not raw API keys.

Security

SHA-256 Key Hashing

API keys are hashed before storage. Raw keys never persist in the database. Per-key quotas and provider restrictions.
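The hash-before-storage pattern is standard: persist only the digest, and compare in constant time on lookup. A minimal sketch (not the gateway's exact implementation):

```python
import hashlib
import hmac

def hash_api_key(raw_key: str) -> str:
    """Store only the SHA-256 digest of an API key, never the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    """Constant-time compare avoids timing side channels on key lookup."""
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)
```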

Browser PKCE

Browser clients use share-code exchange with PKCE challenges and signed session nonces for proof-of-possession.

Container Hardening

Read-only filesystem, no new privileges, all capabilities dropped, PID limit 256, 512MB memory cap.

Production Browser Controls

App-client origin binding, PKCE challenge exchange, signed proof-of-possession requests. Direct browser auth disabled in production. Session material in memory only, never localStorage.

Audit & Compliance

Every request logged with correlation IDs. Admin audit trail for key lifecycle, provider config changes, and policy overrides. Superadmin-only health and metrics surfaces.

Error Sanitization

Internal errors logged in full; clients receive sanitized 502 responses. No credential or URL leakage. Uncertain provider states quarantined and surfaced safely.
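The split between internal logging and the client-facing body can be sketched as follows; the response shape and field names are illustrative, not the gateway's actual schema:

```python
import logging

log = logging.getLogger("gateway")

def sanitize_provider_error(exc: Exception, correlation_id: str) -> dict:
    """Log the full exception internally; return a scrubbed 502 body
    that carries only the correlation ID, never credentials or URLs."""
    log.error("provider failure [%s]: %r", correlation_id, exc)
    return {
        "status": 502,
        "error": "upstream_provider_error",
        "correlation_id": correlation_id,
    }
```

The correlation ID lets operators join the sanitized client response back to the full internal log entry.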

Ready to Add AI to Your Stack?

Deploy Magic Runtime with LLM Gateway in minutes. One API key, every frontier model, full cost visibility.