Claude Opus 4.7

Weighted composite

4.95

Recommendation

Verifier-grade

OpenRouter ID

anthropic/claude-opus-4.7

Cohort

latest frontier

Scorecard

Per-criterion scores not available for this model. It appears in the cross-part overview but is not part of the canonical Models 1–10 hand-graded set; only the headline composite is shown.

Per-part composites

Part	Opus 4.7 (inline)	DeepSeek V4 Pro (judge)
Part A	4.95	3.75
Part B	4.95 (sampled)	3.75
Part C	5.00	4.40

Notes from the evaluation

Claude Opus 4.7 was evaluated against `rubric/answer_key_partA_Q1-Q50.md`. Real model identities are used here only because the runs are mid-evaluation; once they enter the canonical `evaluations/evaluation_model_N.md` set, they will be re-anonymised per repo convention.

Source files in the repo

Cross-judge composites: `analysis/results_overview.md`
Raw responses: search for `raw_responses/Part_*_claude_opus_4_7_*` in the repo
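
The raw-response search above can be scripted. The following is a minimal sketch, assuming the repository is checked out locally and that `raw_responses/` sits at its root; the helper name `find_raw_responses` and its `model_slug` parameter are hypothetical and not part of the repo.

```python
from pathlib import Path


def find_raw_responses(repo_root, model_slug="claude_opus_4_7"):
    """Return sorted paths matching raw_responses/Part_*_<model_slug>_*.

    Assumption: raw_responses/ lives directly under repo_root.
    """
    raw_dir = Path(repo_root) / "raw_responses"
    # Same wildcard pattern as quoted in the source list above.
    return sorted(raw_dir.glob(f"Part_*_{model_slug}_*"))
```

Calling `find_raw_responses(".")` from the repository root should list every matching file or directory for this model, one entry per part and run.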

Get the full report

All 111 questions, the complete rubric, per-model verdicts, and the methodology paper. Delivered as PDF within 60 seconds. CC-BY 4.0.
