Last refresh: 2026-05-17 · 17 models graded · 3 open rubric items · DeepSeek judge coverage: 98%


GLM 5.1

Weighted composite: 3.83
Recommendation: Practitioner-grade
Cohort: large open-weight

Scorecard

Per-criterion scores not available for this model. It appears in the cross-part overview but is not part of the canonical Models 1–10 hand-graded set; only the headline composite is shown.

Per-part composites

| Part | Opus 4.7 (inline) | DeepSeek V4 Pro (judge) |
| --- | --- | --- |
| Part A | 3.83 | 2.90 |
| Part B | 3.85 | 2.05 |
| Part C | 4.00 | 4.30 |
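
The two graders disagree noticeably on Parts A and B. As a rough illustration (not part of the published methodology), the gap per part can be computed directly from the figures in the table above. The function name `judge_gaps` and the choice of "inline minus judge" as the sign convention are assumptions made for this sketch.

```python
# Per-part composites for GLM 5.1, copied from the table above.
# Each tuple is (Opus 4.7 inline score, DeepSeek V4 Pro judge score).
per_part = {
    "Part A": (3.83, 2.90),
    "Part B": (3.85, 2.05),
    "Part C": (4.00, 4.30),
}


def judge_gaps(scores):
    """Return the inline-minus-judge gap for each part, rounded to 2 decimals.

    Hypothetical helper: the sign convention is an assumption of this sketch.
    """
    return {
        part: round(inline - judge, 2)
        for part, (inline, judge) in scores.items()
    }


gaps = judge_gaps(per_part)

# Part with the largest absolute disagreement between the two graders.
widest = max(gaps, key=lambda part: abs(gaps[part]))

for part, gap in gaps.items():
    print(f"{part}: {gap:+.2f}")
print(f"Widest disagreement: {widest}")
```

A positive gap means the inline grader scored the part higher than the judge; on these figures only Part C comes out negative.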

Notes from the evaluation

| GLM 5.1 | large open-weight | 4.00 |
| Gemma 4 31B | small open-weight | 3.95 |
| Mistral Small 2603 | small open-weight | 3.70 |
| Qwen 3.5 397B | large open-weight | 3.65 |
| Gemma 4 26B A4B | small open-weight | 3.50 |
| Mistral Large 2512 | large open-weight | 3.45 |
| Nemotron Nano 9B v2 | small open-weight | 2.15 |
| Qwen 3.5 9B | small open-weight | 1.80 |
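
For a quick cohort-level read of the rows above, the scores can be averaged per cohort. This is only a sketch: the page does not label the score column, so it is treated here as an opaque number, and `cohort_means` is a hypothetical helper rather than anything from the evaluation repo.

```python
from collections import defaultdict
from statistics import mean

# (model, cohort, score) rows copied from the notes above.
# The score column is unlabelled on the page; it is used here as-is.
rows = [
    ("GLM 5.1", "large open-weight", 4.00),
    ("Gemma 4 31B", "small open-weight", 3.95),
    ("Mistral Small 2603", "small open-weight", 3.70),
    ("Qwen 3.5 397B", "large open-weight", 3.65),
    ("Gemma 4 26B A4B", "small open-weight", 3.50),
    ("Mistral Large 2512", "large open-weight", 3.45),
    ("Nemotron Nano 9B v2", "small open-weight", 2.15),
    ("Qwen 3.5 9B", "small open-weight", 1.80),
]


def cohort_means(table):
    """Group scores by cohort and return the unweighted mean of each group."""
    by_cohort = defaultdict(list)
    for _model, cohort, score in table:
        by_cohort[cohort].append(score)
    return {cohort: round(mean(scores), 2) for cohort, scores in by_cohort.items()}


means = cohort_means(rows)
for cohort, value in means.items():
    print(f"{cohort}: {value:.2f}")
```

The unweighted mean is a deliberate simplification; the two lowest small open-weight scores pull that cohort's average down well below its top entries.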

Source files in the repo

Cross-judge composites: analysis/results_overview.md

Full report · PDF

Get the full report

All 111 questions, the complete rubric, per-model verdicts, and the methodology paper. Delivered as PDF within 60 seconds. CC-BY 4.0.

Sent from noreply@verdatir.com. We store your address to deliver the report and, if you opt in, future updates. See /privacy.