Smart Routing - NinjaChat API

Request

curl -X POST https://ninjachat.ai/api/v1/chat \
  -H "Authorization: Bearer nj_sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Solve: x² + 5x + 6 = 0"}],
    "include_routing": true
  }'

Response

{
  "model": "o3-mini",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Factoring: x² + 5x + 6 = (x + 2)(x + 3) = 0\nSolutions: x = -2 and x = -3"
    },
    "finish_reason": "stop"
  }],
  "routing": {
    "requested": "auto",
    "resolved": "o3-mini",
    "task_type": "math",
    "reasoning": "Detected task type: math"
  },
  "cost": {"this_request": "$0.006"}
}

Add "include_routing": true to see which model was chosen and why.

The four variants

Model ID	Optimizes for	Best when…
`auto`	Quality + speed balance	You want the best model without thinking about it
`auto-fast`	Lowest latency	Real-time apps, chatbots, low-latency pipelines
`auto-cheap`	Lowest cost	High-volume jobs, batch processing, cost-sensitive apps
`auto-quality`	Highest quality	Critical decisions, best possible output
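As a minimal sketch of choosing between the four variants in application code, the helper below maps a priority label to a model ID from the table. The priority names (`balanced`, `latency`, `cost`, `quality`) and the function `pick_variant` are hypothetical and not part of the NinjaChat API; only the model IDs come from the table.

```python
# Hypothetical mapping from an application-level priority to an auto variant.
# Only the model IDs on the right-hand side come from the documentation.
VARIANTS = {
    "balanced": "auto",
    "latency": "auto-fast",
    "cost": "auto-cheap",
    "quality": "auto-quality",
}


def pick_variant(priority: str = "balanced") -> str:
    """Return the auto variant for a priority label, or raise on a typo."""
    try:
        return VARIANTS[priority]
    except KeyError:
        raise ValueError(
            f"unknown priority {priority!r}; expected one of {sorted(VARIANTS)}"
        ) from None


print(pick_variant("cost"))  # auto-cheap
```

Failing on an unknown label, rather than silently falling back to `auto`, keeps a misspelled priority from quietly changing cost or latency.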

How task detection works

NinjaChat analyzes your last 3 user messages to detect the task type:

Task type	Detected keywords	`auto` routes to
`code`	function, debug, implement, algorithm, TypeScript, SQL…	`claude-sonnet-4.6`
`math`	equation, solve, calculate, integral, probability…	`o3-mini`
`creative`	write, story, poem, imagine, fiction, lyrics…	`gemini-3.1-pro`
`analysis`	analyze, compare, evaluate, research, summarize…	`gpt-5`
`quick`	Short prompts under 80 chars, “what is”, “define”…	`gemini-3-flash`
`general`	Everything else	`gpt-5`
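The detection rules in the table can be illustrated with a rough local approximation. This is not NinjaChat's implementation: the matching order, the whole-word matching, and the function name `detect_task` are assumptions, and the keyword lists contain only the entries shown in the table (the real lists are truncated with "…"). Because of that, the sketch can disagree with the service. For example, it labels "Write a merge sort in Python" as `creative`, since "write" is the only listed keyword it sees.

```python
import re

# Keywords copied from the table; the real lists are longer (elided with "…").
KEYWORDS = {
    "code": ("function", "debug", "implement", "algorithm", "typescript", "sql"),
    "math": ("equation", "solve", "calculate", "integral", "probability"),
    "creative": ("write", "story", "poem", "imagine", "fiction", "lyrics"),
    "analysis": ("analyze", "compare", "evaluate", "research", "summarize"),
}
QUICK_PREFIXES = ("what is", "define")


def detect_task(messages):
    """Guess a task type from the last 3 user messages (illustrative only)."""
    user_texts = [m["content"] for m in messages if m["role"] == "user"][-3:]
    words = set(re.findall(r"[a-z]+", " ".join(user_texts).lower()))
    # Assumption: the first category with any keyword hit wins.
    for task, keywords in KEYWORDS.items():
        if words & set(keywords):
            return task
    latest = user_texts[-1].strip().lower() if user_texts else ""
    # Assumption: the "quick" rule applies to the most recent user message.
    if latest and (len(latest) < 80 or latest.startswith(QUICK_PREFIXES)):
        return "quick"
    return "general"


print(detect_task([{"role": "user", "content": "Solve: x² + 5x + 6 = 0"}]))  # math
```

For the authoritative answer, send the request with `"include_routing": true` and read `routing.task_type` from the response.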

Full routing table

auto (balanced)

Task	Model
code	`claude-sonnet-4.6`
math	`o3-mini`
creative	`gemini-3.1-pro`
analysis	`gpt-5`
quick	`gemini-3-flash`
general	`gpt-5`

auto-fast

Task	Model
code	`claude-haiku-4.5`
math	`gpt-5-mini`
creative	`gemini-3-flash`
analysis	`gpt-5-mini`
quick	`gpt-5-mini`
general	`gpt-5-mini`

auto-cheap

Task	Model
code	`deepseek-v3`
math	`qwq-32b`
creative	`gemini-3-flash`
analysis	`deepseek-v3`
quick	`gemini-2.5-flash`
general	`llama-4-maverick`

auto-quality

Task	Model
code	`claude-opus-4.6`
math	`o3-mini`
creative	`gemini-3.1-pro`
analysis	`claude-opus-4.6`
quick	`claude-opus-4.6`
general	`claude-opus-4.6`
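For client-side checks (for example, asserting in tests that a request was routed as documented), the four routing tables can be written as a nested dictionary. The names `ROUTING` and `resolve` are illustrative; the entries are copied from the tables in this section and will go stale if the service changes its routing.

```python
# Routing tables from this section, keyed by variant and then by task type.
ROUTING = {
    "auto": {
        "code": "claude-sonnet-4.6", "math": "o3-mini",
        "creative": "gemini-3.1-pro", "analysis": "gpt-5",
        "quick": "gemini-3-flash", "general": "gpt-5",
    },
    "auto-fast": {
        "code": "claude-haiku-4.5", "math": "gpt-5-mini",
        "creative": "gemini-3-flash", "analysis": "gpt-5-mini",
        "quick": "gpt-5-mini", "general": "gpt-5-mini",
    },
    "auto-cheap": {
        "code": "deepseek-v3", "math": "qwq-32b",
        "creative": "gemini-3-flash", "analysis": "deepseek-v3",
        "quick": "gemini-2.5-flash", "general": "llama-4-maverick",
    },
    "auto-quality": {
        "code": "claude-opus-4.6", "math": "o3-mini",
        "creative": "gemini-3.1-pro", "analysis": "claude-opus-4.6",
        "quick": "claude-opus-4.6", "general": "claude-opus-4.6",
    },
}


def resolve(variant: str, task: str) -> str:
    """Return the documented model for a variant and task type."""
    return ROUTING[variant][task]


print(resolve("auto-cheap", "math"))  # qwq-32b
```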

Billing

Auto variants are billed at the resolved model’s rate. If `auto` routes to `o3-mini`, you pay $0.006. If it routes to `claude-sonnet-4.6`, you pay $0.015. The `resolved` value in the `routing` object always shows the model you are billed for.
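Because the billed model varies per request, it can help to tally spend by resolved model. The sketch below assumes each response was requested with `"include_routing": true` and that `cost.this_request` is a dollar-prefixed string such as `"$0.006"`, as in the sample response earlier on this page. The function name `tally_costs` is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def tally_costs(responses):
    """Sum cost.this_request per resolved model across parsed responses."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for resp in responses:
        model = resp["routing"]["resolved"]
        # Assumed format: a string like "$0.006"; strip the currency sign.
        totals[model] += Decimal(resp["cost"]["this_request"].lstrip("$"))
    return dict(totals)


sample = [
    {"routing": {"resolved": "o3-mini"}, "cost": {"this_request": "$0.006"}},
    {"routing": {"resolved": "claude-sonnet-4.6"}, "cost": {"this_request": "$0.015"}},
    {"routing": {"resolved": "o3-mini"}, "cost": {"this_request": "$0.006"}},
]
print(tally_costs(sample))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without rounding drift.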

Parameters

Parameter	Type	Default	Description
`include_routing`	boolean	`false`	Include `routing` object in response.
`budget_cents`	number	—	Override with a cost ceiling. See Budget Routing.
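A small sketch of assembling a request body that sets these parameters only when needed. The helper name `build_payload` is hypothetical, and the `budget_cents` value in the usage line is arbitrary; how the service applies the ceiling is described in Budget Routing, not here.

```python
from typing import Optional


def build_payload(
    prompt: str,
    model: str = "auto",
    include_routing: bool = False,
    budget_cents: Optional[float] = None,
) -> dict:
    """Build a chat request body, omitting optional parameters left at default."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if include_routing:
        payload["include_routing"] = True
    if budget_cents is not None:
        payload["budget_cents"] = budget_cents
    return payload


print(build_payload("Define entropy", include_routing=True, budget_cents=2))
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.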

Code examples

import os

import requests

# Read the API key from the environment rather than hard-coding it.
api_key = os.environ["NINJACHAT_API_KEY"]

r = requests.post(
    "https://ninjachat.ai/api/v1/chat",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": "Write a merge sort in Python"}],
        "include_routing": True,  # adds the `routing` object to the response
    },
    timeout=30,
)
r.raise_for_status()  # fail on HTTP errors instead of parsing an error body
data = r.json()
print(data["choices"][0]["message"]["content"])
print("Routed to:", data["routing"]["resolved"])   # claude-sonnet-4.6
print("Task type:", data["routing"]["task_type"])  # code

Manual model selection

If you prefer explicit control over which model runs, here’s a quick reference:

models = {
    "classify": "deepseek-v3",       # $0.003 — simple tasks
    "chat":     "gpt-5",             # $0.006 — general use
    "code":     "claude-sonnet-4.6",  # $0.015 — when quality matters
    "fast":     "gemini-2.5-flash",   # $0.003 — lowest latency
}

See Models for the full list with pricing and recommendations.
