Anthropic Haiku 4.5

Weighted composite

3.50

Recommendation

Practitioner-grade

Cohort

legacy frontier

Scorecard

Per-criterion scores not available for this model. It appears in the cross-part overview but is not part of the canonical Models 1–10 hand-graded set; only the headline composite is shown.

Per-part composites

Part	Opus 4.7 (inline)	DeepSeek V4 Pro (judge)
Part A	3.50 rep-sample → 3.55 full-pass⁴	2.95
Part B	4.10 sampled	2.25
Part C	4.10 (`_or`)²	3.75

Notes from the evaluation

- Anthropic Haiku 4.5 and OpenAI GPT 5.4 thinking (only these two have raw Part B files);
- small-open-weight × Part B for all five models (Gemma 4 31B, Gemma 4 26B A4B, Mistral Small 2603, Qwen 3.5 9B, Nemotron Nano 9B v2).

Source files in the repo

Cross-judge composites: analysis/results_overview.md

Full report · PDF

Get the full report

All 111 questions, the complete rubric, per-model verdicts, and the methodology paper. Delivered as PDF within 60 seconds. CC-BY 4.0.
