
Request

POST https://ninjachat.ai/api/v1/compare
Authorization: Bearer nj_sk_YOUR_API_KEY
Content-Type: application/json
{
  "messages": [
    {"role": "user", "content": "Explain quantum entanglement in one paragraph"}
  ],
  "models": ["gpt-5", "claude-sonnet-4.6", "gemini-3.1-pro", "deepseek-v3"],
  "rank_by": "balanced"
}

Response

{
  "winner": {
    "model": "claude-sonnet-4.6",
    "name": "Claude Sonnet 4.6",
    "reason": "Best balance of quality (0.97), speed (1180ms), and cost ($0.015)"
  },
  "results": [
    {
      "rank": 1,
      "model": "claude-sonnet-4.6",
      "content": "Quantum entanglement is a phenomenon where two particles become correlated...",
      "quality": { "confidence": 0.97, "flags": [], "suggested_retry": false },
      "latency_ms": 1180,
      "cost_cents": 1.5,
      "tokens": { "prompt": 18, "completion": 87, "total": 105 },
      "success": true
    },
    {
      "rank": 2,
      "model": "gpt-5",
      "content": "When two particles become quantum entangled...",
      "quality": { "confidence": 0.95, "flags": [], "suggested_retry": false },
      "latency_ms": 920,
      "cost_cents": 0.6,
      "tokens": { "prompt": 18, "completion": 74, "total": 92 },
      "success": true
    },
    {
      "rank": 3,
      "model": "gemini-3.1-pro",
      "content": "Quantum entanglement describes a special connection between particles...",
      "quality": { "confidence": 0.93, "flags": [], "suggested_retry": false },
      "latency_ms": 1050,
      "cost_cents": 0.6,
      "tokens": { "prompt": 18, "completion": 82, "total": 100 },
      "success": true
    },
    {
      "rank": 4,
      "model": "deepseek-v3",
      "content": "Quantum entanglement is a quantum mechanical phenomenon...",
      "quality": { "confidence": 0.89, "flags": [], "suggested_retry": false },
      "latency_ms": 780,
      "cost_cents": 0.3,
      "tokens": { "prompt": 18, "completion": 69, "total": 87 },
      "success": true
    }
  ],
  "failed": [],
  "summary": {
    "fastest": { "model": "deepseek-v3", "latency_ms": 780 },
    "highest_quality": { "model": "claude-sonnet-4.6", "confidence": 0.97 },
    "cheapest": { "model": "deepseek-v3", "cost_cents": 0.3 },
    "best_value": { "model": "gpt-5" }
  },
  "ranked_by": "balanced",
  "models_compared": 4,
  "succeeded": 4,
  "total_cost_cents": 3.0,
  "total_cost": "$0.030",
  "balance": "$4.790",
  "compared_at": "2026-03-10T12:00:00.000Z"
}

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `messages` | array | Yes | | Same format as `/chat`. 1–20 messages. |
| `models` | array | No | Top 5 across tiers | Which models to run (2–8). Cannot include `auto*` or `ensemble*`. |
| `rank_by` | string | No | `"balanced"` | How to rank results: `quality`, `speed`, `cost`, or `balanced`. |
| `max_tokens` | integer | No | 1024 | Max response length per model (1–8,192). |
| `temperature` | number | No | 0.7 | Sampling temperature. |
| `include_full_responses` | boolean | No | true | Include full response text in results. Set to `false` to get 200-char previews. |
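The limits in the table can be enforced client-side before sending a request. The helper below is a hypothetical sketch, not part of any official SDK; it simply builds a request body and rejects values outside the documented ranges.

```python
def build_compare_payload(messages, models=None, rank_by="balanced",
                          max_tokens=1024, temperature=0.7,
                          include_full_responses=True):
    """Build a /compare request body, enforcing the documented limits."""
    if not 1 <= len(messages) <= 20:
        raise ValueError("messages must contain 1-20 entries")
    if models is not None:
        if not 2 <= len(models) <= 8:
            raise ValueError("models must list 2-8 models")
        # auto* and ensemble* model IDs are rejected by /compare
        if any(m.startswith(("auto", "ensemble")) for m in models):
            raise ValueError("auto* and ensemble* models are not allowed")
    if rank_by not in ("quality", "speed", "cost", "balanced"):
        raise ValueError(f"unknown rank_by: {rank_by}")
    if not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be 1-8192")
    payload = {
        "messages": messages,
        "rank_by": rank_by,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "include_full_responses": include_full_responses,
    }
    if models is not None:
        payload["models"] = models  # omit to use the default model set
    return payload
```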

rank_by modes

| Mode | Weights |
| --- | --- |
| `balanced` | 50% quality + 30% speed + 20% cost |
| `quality` | 100% quality score |
| `speed` | 100% speed (lowest latency wins) |
| `cost` | 100% cost (cheapest wins) |
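The `balanced` weights imply a score along the lines of the sketch below. The API's actual normalization is not documented, so this assumes min-max normalization per metric (fastest and cheapest score 1.0); treat it as an illustration of the weighting, not a reproduction of the server-side ranking.

```python
def balanced_scores(results):
    """Score each result as 0.5*quality + 0.3*speed + 0.2*cost.

    Quality confidence is used as-is (already 0-1); latency and cost are
    min-max normalized so the fastest/cheapest model gets 1.0. The API's
    real normalization is undocumented -- this is an assumption.
    """
    latencies = [r["latency_ms"] for r in results]
    costs = [r["cost_cents"] for r in results]

    def normalize_inverted(value, values):
        # Lower is better for latency and cost, so invert the scale.
        lo, hi = min(values), max(values)
        return 1.0 if hi == lo else (hi - value) / (hi - lo)

    return {
        r["model"]: round(
            0.5 * r["quality"]["confidence"]
            + 0.3 * normalize_inverted(r["latency_ms"], latencies)
            + 0.2 * normalize_inverted(r["cost_cents"], costs),
            3,
        )
        for r in results
    }
```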

Default models (when models not specified)

gpt-5, claude-sonnet-4.6, gemini-3.1-pro, deepseek-v3, gemini-3-flash

Billing

You are charged for every successful model call: a 4-model compare costs the sum of each model's per-request rate. The response includes total_cost plus a per-model cost_cents breakdown. If a model fails, you are not charged for that call; all successful calls are billed.
Pre-flight balance check: if your balance is less than the estimated total cost, the request fails before any models run.
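The pre-flight check described above can be mirrored client-side. The rates in this sketch are illustrative placeholders (the actual per-request rates come from the pricing page, not this document); the helper just sums them and compares against the current balance.

```python
def preflight_check(balance_cents, models, rates_cents):
    """Estimate a compare's total cost and decide whether it can run.

    `rates_cents` maps model -> estimated per-request cost in cents.
    The rates are caller-supplied placeholders; consult the real
    pricing page for actual figures.
    """
    estimated = sum(rates_cents[m] for m in models)
    return {"estimated_cents": estimated, "can_run": balance_cents >= estimated}
```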

Code examples

import requests, os

r = requests.post("https://ninjachat.ai/api/v1/compare",
    headers={"Authorization": f"Bearer {os.environ['NINJACHAT_API_KEY']}"},
    json={
        "messages": [{"role": "user", "content": "Write a haiku about machine learning"}],
        "models": ["gpt-5", "claude-sonnet-4.6", "gemini-3.1-pro"],
        "rank_by": "quality",
    }
)
data = r.json()
winner = data["winner"]
print(f"Winner: {winner['model']} ({winner['reason']})")
for result in data["results"]:
    print(f"  #{result['rank']} {result['model']}: quality={result['quality']['confidence']:.2f}, {result['latency_ms']}ms, ${result['cost_cents']/100:.4f}")

Common use cases

Choose a model for production: compare 4–5 models on a representative sample of your actual prompts before committing to one.
Verify quality across models: run the same benchmark prompt monthly to see whether model updates changed behavior.
Find the best value: summary.best_value shows the model with the highest quality-to-cost ratio.
Regression testing: run your golden test prompts against a new model before switching.
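The monthly-benchmark and regression-testing workflows both reduce to comparing fresh confidence scores against a stored baseline. The helper below is a hypothetical sketch; the 0.05 tolerance is an arbitrary illustrative threshold, not a recommendation from the API.

```python
def detect_regressions(baseline, current, tolerance=0.05):
    """Flag models whose quality confidence dropped versus a baseline.

    `baseline` and `current` map model -> quality confidence taken from
    earlier and fresh /compare runs on the same golden prompts. The
    tolerance is illustrative -- tune it to your own quality bar.
    """
    return [
        model
        for model, score in current.items()
        if model in baseline and baseline[model] - score > tolerance
    ]
```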