Anthropic Haiku 4.5
Weighted composite: 3.50
Recommendation: Practitioner-grade
Cohort: legacy frontier
Scorecard
Per-criterion scores are not available for this model. It appears in the cross-part overview but is not part of the canonical hand-graded set (Models 1–10), so only the headline composite is shown.
Per-part composites
| Part | Opus 4.7 (inline) | DeepSeek V4 Pro (judge) |
|---|---|---|
| Part A | 3.50 (rep-sample) → 3.55 (full pass)⁴ | 2.95 |
| Part B | 4.10 (sampled) | 2.25 |
| Part C | 4.10 (`_or`)² | 3.75 |
Notes from the evaluation
- Anthropic Haiku 4.5 and OpenAI GPT 5.4 thinking (only these two have raw Part B files);
- small-open-weight × Part B for all five models (Gemma 4 31B, Gemma 4 26B A4B, Mistral Small 2603, Qwen 3.5 9B, Nemotron Nano 9B v2).
Source files in the repo
Cross-judge composites: analysis/results_overview.md
Full report · PDF
The full report contains all 111 questions, the complete rubric, per-model verdicts, and the methodology paper. It is delivered as a PDF within 60 seconds and licensed under CC-BY 4.0.