## Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | No | ninja-1 | Model ID, smart routing variant (auto, auto-fast, etc.), ensemble variant, or fallback chain (a>b>c). |
| messages | array | Yes | — | 1–50 messages. Each has a role (system, user, or assistant) and content. |
| temperature | number | No | 0.7 | Sampling temperature: 0 = deterministic, 2 = most creative. |
| max_tokens | integer | No | 2048 | Max response length (1–16,384). |
| stream | boolean | No | false | Stream tokens via SSE. See Streaming. |
| session_id | string | No | — | Attach to a persistent session. History is auto-injected. |
| cache | boolean | No | true | Cache responses. Identical requests return instantly at no cost. Disabled when using session_id or stream. |
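To make the parameters concrete, here is a sketch of a request body that uses each documented field. Only the field names, types, and limits come from the table above; the specific values (and the fallback-chain model string) are illustrative assumptions.

```python
import json

# Hypothetical request body assembled from the documented parameters.
# Field names and constraints come from the table; the values are examples.
payload = {
    "model": "ninja-1>auto-fast",  # fallback chain: try ninja-1, then auto-fast
    "messages": [  # 1-50 entries, each with a role and content
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize Server-Sent Events in one sentence."},
    ],
    "temperature": 0.7,  # default; 0 = deterministic, 2 = most creative
    "max_tokens": 2048,  # must fall within 1-16,384
    "stream": False,     # True switches the response to SSE
    "cache": True,       # ignored when session_id or stream is used
}

body = json.dumps(payload)
```

Note that cache and session_id/stream are mutually exclusive in effect: setting either of the latter disables caching, so there is no point pairing them.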
## Multi-turn conversations
Without sessions, pass the full conversation history manually:
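A minimal sketch of that pattern, assuming the messages shape from the parameter table (the helper function below is illustrative, not part of the API): every request carries all prior turns, and each assistant reply is appended before the next user message.

```python
# Illustrative only: maintain the messages array yourself between calls.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_turn(history, user_text, assistant_text):
    """Append one completed user/assistant exchange to the running history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

add_turn(history, "What is SSE?",
         "Server-Sent Events, a one-way HTTP streaming format.")
add_turn(history, "Is it bidirectional?",
         "No; for two-way traffic use WebSockets.")

# The next request's messages field is the whole history plus the new user turn.
next_messages = history + [{"role": "user", "content": "Thanks!"}]
```

Keep an eye on the 50-message cap: long conversations must be truncated or summarized client-side, which is exactly the bookkeeping that session_id avoids.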
## More features

The chat endpoint supports several additional capabilities, each documented on its own page:

- Smart Routing — auto-select the best model per request
- Fallback Chains — try models in sequence
- Ensemble — consensus-driven answers from multiple models
- Budget Routing — best model within a cost ceiling
- Quality Scoring — confidence scores with auto-retry
- Streaming — token-by-token SSE output
- Sessions — persistent conversation memory