LLM Gateway
One governed path to every frontier model. Use Claude, GPT, Gemini, Grok, and Sonar through a single access layer with routing, policy controls, browser-safe sessions, replay-safe execution, and cost visibility.
Supported Providers
Every request is routed through a unified interface. Add or swap providers without changing a single line of application code.
Anthropic
Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
OpenAI
GPT-5.4, GPT-4o, o3, o3-mini
Google
Gemini 2.5 Pro, 2.5 Flash, 2.0 Flash
xAI
Grok-4, Grok-4.1 Fast, Grok-3
Perplexity
Sonar Pro, Sonar Reasoning Pro
How It Works
LLM Gateway supports two deployment modes. Use it embedded inside Magic Runtime, or integrate directly for standalone multi-provider inference.
Embedded in Magic
Your app talks to Magic Runtime, and Magic uses LLM Gateway behind the scenes for routing, policy enforcement, cost tracking, and failure handling.
Direct Gateway Mode
Integrate llmgateway.threadsync.io directly for standalone multi-provider AI access with org-scoped governance, browser-safe auth, and usage controls.
Features
Auto-Routing
Heuristic + LLM-based routing picks the optimal provider for each request. Keyword analysis for speed, or ask the router model to decide.
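The keyword half of the routing decision can be sketched in a few lines. This is illustrative only: the keyword table, model names, and fallback are assumptions, not the gateway's actual rules.

```python
# Sketch of a keyword-based routing heuristic (illustrative only --
# the real gateway's rules, and the exact model names, are assumptions).
ROUTES = {
    "search": "sonar-reasoning-pro",   # live-web questions -> Perplexity
    "code": "claude-sonnet-4-6",       # code tasks -> Anthropic
}
DEFAULT_MODEL = "gpt-4o"

def route(prompt: str) -> str:
    """Pick a model by keyword; fall back to a default.

    In LLM-based mode, requests that miss every keyword would instead
    be sent to a router model to decide."""
    text = prompt.lower()
    for keyword, model in ROUTES.items():
        if keyword in text:
            return model
    return DEFAULT_MODEL
```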
Rate Limiting & Quotas
Atomic slot reservation with advisory locks. Hourly sliding windows, daily and monthly budgets per API key.
Cost Tracking
Per-request token counts and USD cost computed from the model catalog. Full usage history in the database, exportable via admin API.
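The per-request cost calculation is a straightforward lookup-and-multiply. The prices and catalog shape below are placeholders, not the gateway's actual rates.

```python
# Hypothetical per-million-token prices; the real gateway reads these
# from its model catalog, and the numbers here are illustrative only.
CATALOG = {
    "claude-sonnet-4-6": {"in_per_m": 3.00, "out_per_m": 15.00},
}

def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Compute USD cost for one request from catalog prices."""
    prices = CATALOG[model]
    cost = (tokens_in / 1_000_000) * prices["in_per_m"] \
         + (tokens_out / 1_000_000) * prices["out_per_m"]
    return round(cost, 6)
```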
Idempotent Requests
Send an Idempotency-Key header and the gateway guarantees exactly-once execution with 30-day response replay.
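The replay mechanism can be sketched as a keyed cache in front of the handler. This in-memory dict stands in for the gateway's persistent 30-day store; the function and variable names are illustrative.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=30)
_cache: dict[str, tuple[datetime, dict]] = {}  # stand-in for persistent store

def execute_once(key: str, handler, now: datetime) -> dict:
    """Replay the stored response for a repeated Idempotency-Key;
    otherwise run the handler exactly once and store its response."""
    entry = _cache.get(key)
    if entry and now - entry[0] < TTL:
        return entry[1]               # replay: handler is not re-run
    response = handler()
    _cache[key] = (now, response)
    return response
```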
API Key Scoping
Per-key provider and model allowlists, tier-based rate limits, and organization-level policy overrides.
Conversation Memory
Optional conversation_id parameter maintains context across messages with automatic summarization.
Browser-Safe Subject Sessions
PKCE code exchange, signed proof-of-possession requests, app-client origin binding. Direct browser auth disabled in production. Session material in memory, never localStorage.
Org Policy & Admin Plane
Organization-scoped API keys, per-key provider and model allowlists, app-client approval, audit logging, usage dashboards, and superadmin health surfaces.
Conversation Store & Org Knowledge
Private-by-default prompt/response capture. Users share, unshare, or delete conversations. Org-visible browsing and admin export for knowledge ingestion. Agent definitions and pre-call RAG coming next.
API Reference
Two integration paths: proxy through Magic Runtime, or call the gateway directly. Both get routing, governance, and cost tracking.
curl -X POST https://magic.threadsync.io/api/v1/llm/chat \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize our Q3 board decisions"}],
    "system": "You are an executive assistant.",
    "auto_route": true
  }'
{
  "content": "Here is a summary of Q3 board decisions...",
  "model": "claude-sonnet-4-6-20260312",
  "provider": "anthropic",
  "route_method": "auto_router",
  "usage": {
    "tokens_in": 42,
    "tokens_out": 256,
    "cost_usd": 0.0048
  }
}
import httpx

client = httpx.Client(
    base_url="https://magic.threadsync.io/api/v1/llm",
    headers={"X-API-Key": "your-api-key"}
)

# Auto-route to the best provider
resp = client.post("/chat", json={
    "messages": [{"role": "user", "content": "Analyze this contract"}],
    "auto_route": True
})
result = resp.json()
print(result["content"])  # Provider chosen: anthropic (claude-sonnet-4-6)

# Or pick a specific model
resp = client.post("/chat", json={
    "messages": [{"role": "user", "content": "Search for recent case law on..."}],
    "model": "sonar-reasoning-pro"
})
curl -X POST https://llmgateway.threadsync.io/v1/chat/completions \
  -H "X-API-Key: YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: req-123" \
  -d '{
    "messages": [{"role": "user", "content": "Summarize our Q3 board decisions"}],
    "auto_route": true,
    "conversation_id": "board-q3"
  }'
// Browser integrations use subject-session exchange
// with PKCE and signed requests — not raw API keys.

// Backend mints a share code + code_verifier
const session = await exchangeForSession(share_code, code_verifier);

// Session material stays in memory — never localStorage
const result = await callCompletion(
  session.subject_token,
  session.hmac_key,
  [{ role: "user", content: "What changed in our pricing memo?" }]
);
Security
SHA-256 Key Hashing
API keys are hashed before storage. Raw keys never persist in the database. Per-key quotas and provider restrictions.
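The hash-before-store pattern is a one-liner with the standard library. The function name here is illustrative; lookups hash the presented key and compare digests, so the raw key never needs to exist server-side after issuance.

```python
import hashlib

def hash_api_key(raw_key: str) -> str:
    """Store only the SHA-256 digest of an API key.

    At request time, hash the presented key and compare digests --
    the raw key never touches the database."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
```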
Browser PKCE
Browser clients use share-code exchange with PKCE challenges and signed session nonces for proof-of-possession.
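Generating the verifier/challenge pair follows the standard S256 method from RFC 7636; the function name below is illustrative, and this sketches only the challenge derivation, not the gateway's full exchange.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge
    (RFC 7636): challenge = BASE64URL(SHA256(verifier)), unpadded."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge
```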
Container Hardening
Read-only filesystem, no new privileges, all capabilities dropped, PID limit 256, 512MB memory cap.
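One way to express these controls is with `docker run` flags; the image name is a placeholder, and the actual deployment may set them via compose or an orchestrator instead.

```shell
# Hardening flags matching the controls above (image name is a placeholder).
docker run -d \
  --read-only \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  --pids-limit 256 \
  --memory 512m \
  llm-gateway:latest
```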
Production Browser Controls
App-client origin binding, PKCE challenge exchange, signed proof-of-possession requests. Direct browser auth disabled in production. Session material in memory only, never localStorage.
Audit & Compliance
Every request logged with correlation IDs. Admin audit trail for key lifecycle, provider config changes, and policy overrides. Superadmin-only health and metrics surfaces.
Error Sanitization
Internal errors logged in full; clients receive sanitized 502 responses. No credential or URL leakage. Uncertain provider states quarantined and surfaced safely.
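The log-full, return-sanitized split can be sketched as a small handler. The function name and response shape are assumptions for illustration.

```python
import logging
import uuid

log = logging.getLogger("gateway")

def sanitize_provider_error(exc: Exception) -> tuple[int, dict]:
    """Log the full exception internally, then hand the client a generic
    502 carrying only a correlation ID -- no provider URLs or credentials."""
    correlation_id = str(uuid.uuid4())
    log.error("provider call failed [%s]: %r", correlation_id, exc)
    return 502, {"error": "upstream provider error",
                 "correlation_id": correlation_id}
```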
Ready to Add AI to Your Stack?
Deploy Magic Runtime with LLM Gateway in minutes. One API key, every frontier model, full cost visibility.