Claude Opus 4.7
Weighted composite: 4.95
Recommendation: Verifier-grade
OpenRouter ID: anthropic/claude-opus-4.7
Cohort: latest frontier
Scorecard
Per-criterion scores are not available for this model. It appears in the cross-part overview but is not part of the canonical Models 1–10 hand-graded set, so only the headline composite is shown.
Per-part composites
| Part | Opus 4.7 (inline) | DeepSeek V4 Pro (judge) |
|---|---|---|
| Part A | 4.95 | 3.75 |
| Part B | 4.95 (sampled) | 3.75 |
| Part C | 5.00 | 4.40 |
Notes from the evaluation
Claude Opus 4.7 was scored against `rubric/answer_key_partA_Q1-Q50.md`. Real model identities are used here only because the runs are mid-evaluation; once these results enter the canonical `evaluations/evaluation_model_N.md` set, they will be re-anonymised per repo convention.
Source files in the repo
Cross-judge composites: analysis/results_overview.md
Raw responses: search `raw_responses/Part_*_claude_opus_4_7_*` in the repo
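As a convenience, the raw-response search above can be scripted. This is a minimal sketch, assuming `raw_responses/` sits directly under the repo root and that the glob pattern quoted above matches the file names as stored; the helper name `find_raw_responses` is hypothetical and not part of the repo.

```python
from pathlib import Path


def find_raw_responses(repo_root: str = ".") -> list[Path]:
    # Hypothetical helper: assumes raw_responses/ is at the repo root.
    # The glob pattern is the one quoted in the page's source-file notes.
    return sorted(Path(repo_root).glob("raw_responses/Part_*_claude_opus_4_7_*"))


if __name__ == "__main__":
    # Print every matching raw-response file, one path per line.
    for path in find_raw_responses():
        print(path)
```

Run it from the repo root. Pass a different `repo_root` if the script lives elsewhere.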
Full report · PDF
All 111 questions, the complete rubric, per-model verdicts, and the methodology paper. Delivered as PDF within 60 seconds. CC-BY 4.0.