Open Weight Bench

Vision

OCR for vision-capable models: four sub-tasks — handwritten meeting notes in three difficulty tiers (easy / medium / hard) plus an old book page set in Fraktur typeface.

Task & test logic in detail
Task: Four OCR sub-tasks, one image each. (1)–(3) Handwritten meeting notes in three difficulty tiers (easy / medium / hard); the model must transcribe the text. (4) An old book page in Fraktur typeface; same task.
What is tested: OCR quality, recognising layout structure (columns, bullet points, dates), handling of illegible handwriting and historical letterforms (long-s, ligatures).
Why models fail: Text-only models have no vision capability (filtered out). Weak VLMs only recognise the clearest part. Some truncate output or get stuck in reasoning without producing a visible answer.
Prompt
System prompt
Du bist OCR-Spezialist für deutsche Handschrift.
(English: You are an OCR specialist for German handwriting.)
Developer prompt
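Auf dem Bild siehst du eine handschriftliche Meeting-Notiz mit klarer Struktur und gut lesbarer Schrift. Transkribiere den gesamten lesbaren Text. Behalte die Anordnung bei (Überschrift, Spalten, To-Dos). Bei unleserlichen Stellen schreibe '[unleserlich]'. Gib ausschließlich den puren OCR-Text im Markdown-Format zurück — keine Vorbemerkung, keine Erklärungen, kein Code-Fence.
(English: In the image you see a handwritten meeting note with a clear structure and easily legible handwriting. Transcribe all legible text. Preserve the layout (heading, columns, to-dos). For illegible passages write '[unleserlich]'. Return only the pure OCR text in Markdown format: no preamble, no explanations, no code fence.)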
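
The prompt imposes a few output rules (no code fence, '[unleserlich]' for illegible passages), and the failure modes above include models that return no visible answer. The following is a minimal sketch of how a response could be checked against those rules. The function name `check_response` and the returned keys are hypothetical, and this is not the bench's actual scoring logic, which the page does not describe.

```python
ILLEGIBLE_MARKER = "[unleserlich]"  # literal placeholder requested by the prompt
FENCE = "`" * 3                     # Markdown code fence, which the prompt forbids


def check_response(text: str) -> dict:
    """Hypothetical format check for one model response (assumption, not the bench's scorer)."""
    stripped = text.strip()
    return {
        # Covers models that get stuck in reasoning and emit no visible answer.
        "empty": stripped == "",
        # The prompt asks for plain Markdown text without a surrounding code fence.
        "has_code_fence": FENCE in stripped,
        # How many passages the model declared illegible.
        "illegible_count": stripped.count(ILLEGIBLE_MARKER),
    }
```

A check like this only covers formatting; transcription accuracy would still need a comparison against a reference text.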

Wall-time vs. quality

X = wall-time for this bench · Y = score (0–100 %) in this bench. Optimum is top-left — fast and good. RAM estimate for 64k context: 4 GB system + model weights + max(2 GB, 40% of weights) for KV cache.
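
The RAM estimate above is simple arithmetic, sketched here as a small function. The function and parameter names are illustrative, and the weight sizes in the usage note are hypothetical inputs, not measured values for any listed model.

```python
def estimate_ram_gb(weights_gb: float) -> float:
    """RAM estimate for 64k context, following the formula stated above."""
    system_gb = 4.0                            # fixed system overhead
    kv_cache_gb = max(2.0, 0.4 * weights_gb)   # KV cache: at least 2 GB, else 40% of weights
    return system_gb + weights_gb + kv_cache_gb
```

For example, hypothetical 2 GB weights give 4 + 2 + 2 = 8 GB, while 20 GB weights give 4 + 20 + 8 = 32 GB. The 2 GB floor applies whenever the weights are below 5 GB.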

Colour = vendor · Number = total parameters (B) · Legend entries: dense, MoE

[Scatter plot: X axis = wall-time (s), 0s to 720s; Y axis = score, 0% to 100%; each point is labelled with the model's total parameter count in billions.]
Models in this bench
26 visible
  1. qwen3.6-27b gguf 4bit 98% · 281s · 22 t/s · 27 GB
  2. qwen3.6-35b-a3b gguf 8bit 97% · 116s · 68 t/s · 53 GB
  3. qwen3.5-122b-a10b gguf 4bit 97% · 715s · 10 t/s · 102 GB
  4. qwen3.5-9b gguf 8bit 97% · 292s · 44 t/s · 18 GB
  5. qwen3.5-9b gguf 4bit 96% · 315s · 59 t/s · 13 GB
  6. qwen3.5-2b gguf 4bit 93% · 15s · 162 t/s · 8 GB
  7. gemma-4-31b gguf 4bit 93% · 370s · 21 t/s · 30 GB
  8. ministral-3-14b-reasoning gguf 4bit 92% · 46s · 47 t/s · 16 GB
  9. qwen3-vl-8b mlx 4bit 92% · 37s · 79 t/s · 12 GB
  10. glm-4.6v-flash mlx 4bit 91% · 82s · 64 t/s · 13 GB
  11. gemma-3-27b mlx 4bit 86% · 68s · 28 t/s · 26 GB
  12. gemma-4-e4b gguf 8bit 85% · 79s · 67 t/s · 16 GB
  13. gemma-4-e4b gguf 4bit 83% · 58s · 87 t/s · 12 GB
  14. qwen3.5-35b-a3b gguf 4bit 78% · 122s · 79 t/s · 33 GB
  15. gemma-3-12b mlx 4bit 77% · 31s · 56 t/s · 15 GB
  16. qwen3.6-35b-a3b gguf 4bit 75% · 139s · 82 t/s · 33 GB
  17. gemma-4-26b-a4b gguf 8bit 74% · 234s · 72 t/s · 41 GB
  18. gemma-3n-e4b mlx 4bit 72% · 17s · 80 t/s · 12 GB
  19. qwen3.5-9b-mlx mlx 4bit 71% · 179s · 85 t/s · 12 GB
  20. gemma-4-26b-a4b gguf 4bit 71% · 197s · 89 t/s · 27 GB
  21. qwen3.5-4b gguf 4bit 70% · 176s · 85 t/s · 9 GB
  22. gemma-3-4b mlx 4bit 65% · 12s · 141 t/s · 9 GB
  23. gemma-4-e2b gguf 8bit 63% · 34s · 111 t/s · 12 GB
  24. gemma-4-e2b gguf 4bit 55% · 20s · 138 t/s · 10 GB
  25. nemotron-3-nano-omni gguf 8bit 28% · 192s · 77 t/s · 50 GB
  26. nemotron-3-nano-omni gguf 4bit 27% · 156s · 86 t/s · 38 GB
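
Since the optimum in the chart is top-left (fast and good), one way to read the list is to pick out the models that no other model beats on both wall-time and score. The sketch below computes that Pareto front for a subset of the rows above. The `Result` type and `pareto_front` function are illustrative names, and this selection is not something the bench itself reports.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    score_pct: int  # score in this bench (0-100 %)
    wall_s: int     # wall-time in seconds


# Subset of the rows listed above; score and wall-time copied as shown.
RESULTS = [
    Result("qwen3.6-27b gguf 4bit", 98, 281),
    Result("qwen3.6-35b-a3b gguf 8bit", 97, 116),
    Result("qwen3.5-122b-a10b gguf 4bit", 97, 715),
    Result("qwen3.5-2b gguf 4bit", 93, 15),
    Result("qwen3-vl-8b mlx 4bit", 92, 37),
    Result("gemma-3-4b mlx 4bit", 65, 12),
    Result("nemotron-3-nano-omni gguf 4bit", 27, 156),
]


def pareto_front(results):
    """Return results not dominated by any faster-or-equal, better-or-equal result."""
    front = []
    best_score = -1
    # Fastest first; on equal wall-time, the higher score comes first.
    for r in sorted(results, key=lambda r: (r.wall_s, -r.score_pct)):
        # A slower model only stays if it scores strictly higher than every faster one.
        if r.score_pct > best_score:
            front.append(r)
            best_score = r.score_pct
    return front


if __name__ == "__main__":
    for r in pareto_front(RESULTS):
        print(f"{r.name}: {r.score_pct}% in {r.wall_s}s")
```

On this subset the front is gemma-3-4b, qwen3.5-2b, qwen3.6-35b-a3b (8bit) and qwen3.6-27b: each one is either faster or more accurate than every other entry. RAM is ignored here; it could be added as a third criterion.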