Coding
Single-shot code generation (a Kanban board as a single HTML file). Measures how quickly models solve a concrete UI task and how well the resulting app actually works.
Task & test logic in detail
Task: from a ~200-word prompt, the model must generate a fully functional Kanban board as a single HTML file with drag & drop, localStorage persistence, edit/delete and a confetti animation, in a single response without iteration. The prompt also includes a small `data-testid` contract so that a Playwright test can drive the app automatically.
Three signals feed into the score:
(1) Static: a linter checks concrete constraints in the HTML (columns, Tailwind, localStorage call, no framework, no window.alert/prompt, …).
(2) Functional: Playwright runs a small CRUD sequence (create a card, delete a card via the confirmation dialog, then reload the page and check that the state persists) and records whether any JavaScript console errors occur during the whole flow. Drag & drop and confetti are deliberately not tested functionally because there are too many implementation variants.
(3) Qualitative: an LLM judge rates the screenshot and the code (visual quality, code quality, and render↔code consistency).
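The linter's rules are only summarized above, so the following Python sketch is an assumption about what such static checks could look like, not the bench's actual linter. It runs crude regex and substring tests for a few of the named constraints (column test IDs, a Tailwind CDN include, a `localStorage.setItem` call, no framework loaded via `<script src>`, no `alert(`/`prompt(` calls) and returns the fraction of passed checks. The names `static_checks`, `static_score` and `FRAMEWORK_HINTS` are hypothetical.

```python
import re

# Hypothetical rule set: the bench's real linter rules are not listed in the text,
# so each check is a crude approximation of one constraint it names.
COLUMN_IDS = ["column-backlog", "column-in-progress", "column-review", "column-done"]
FRAMEWORK_HINTS = ["react", "vue", "angular", "svelte"]  # assumed list of banned frameworks


def static_checks(html: str) -> dict[str, bool]:
    """Return one pass/fail flag per constraint for a generated HTML file."""
    lowered = html.lower()
    return {
        # every column container appears as a double-quoted data-testid attribute
        "columns": all(f'data-testid="{cid}"' in html for cid in COLUMN_IDS),
        # Tailwind is loaded via a <script> or <link> whose attributes mention tailwind
        "tailwind": re.search(r"<(?:script|link)[^>]+tailwind", lowered) is not None,
        # state is written to localStorage at least once
        "local_storage": "localstorage.setitem" in lowered,
        # no <script src=...> pulling in a known framework
        "no_framework": not any(
            re.search(rf"<script[^>]+src=[^>]*{fw}", lowered) for fw in FRAMEWORK_HINTS
        ),
        # no alert(...) / prompt(...) calls, with or without the window. prefix
        "no_alert_prompt": re.search(r"\b(?:alert|prompt)\(", lowered) is None,
    }


def static_score(html: str) -> float:
    """Fraction of static checks that pass, in [0, 1]."""
    checks = static_checks(html)
    return sum(checks.values()) / len(checks)
```

Substring heuristics like these are cheap but brittle (IDs built by string concatenation, for example, would be missed), which is one reason a functional run is useful on top of them.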
Score = mean over the available signals.
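A minimal sketch of that aggregation in Python. It assumes each signal is already normalized to [0, 1] and that a missing signal is passed as `None`. The function name `bench_score` is hypothetical.

```python
from statistics import mean
from typing import Optional


def bench_score(static: Optional[float], functional: Optional[float],
                judge: Optional[float]) -> Optional[float]:
    """Mean of the signals that are present; None marks a missing signal."""
    available = [s for s in (static, functional, judge) if s is not None]
    return mean(available) if available else None


# the judge signal is missing here, so only two values are averaged
print(bench_score(1.0, 0.5, None))  # 0.75
```

A missing signal is skipped rather than counted as zero, so it does not drag the score down. A signal that is present but 0.0 still counts.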
Why models fail: reasoning models spend their token budget on thinking instead of writing the app. Sliding-window models (Gemma 4) lose track of constraints stated at the start of the prompt. Small models (<3B) often fail to produce coherent HTML, or they ignore the data-testid contract, which makes the functional tests fail across the board.
Prompt
System prompt
You are a careful front-end engineer.
Developer prompt
Create a fully functional Kanban board in a single HTML file using vanilla JavaScript (no frameworks like react).
Requirements:
- Columns: Backlog, In Progress, Review, Done.
- Cards must be:
  - draggable across columns,
  - editable in place,
  - persisted in localStorage (state survives reloads) - please use your own namespace,
  - deletable with a confirmation prompt.
- Each column provides an "Add card" action.
- Style with Tailwind via CDN.
- Add subtle CSS transitions and trigger a confetti animation when a card moves to "Done".
- Thoroughly comment the code.
- dont use window.alert or window.prompt to add/edit/delete cards
- if there are no cards yet, create some dummy cards
- modern and vibrant design
Stable test selectors (mandatory — these data-testid attributes are used by an automated functional test; do not omit, rename, or split them across multiple elements):
- Column containers: data-testid="column-backlog", data-testid="column-in-progress", data-testid="column-review", data-testid="column-done".
- Every "Add card" button (one per column): data-testid="add-card".
- Every card element: data-testid="card".
- Inside each card, the delete trigger: data-testid="delete-card".
- The confirm button of the delete-confirmation dialog/modal: data-testid="confirm-delete".
- The input/textarea where a new card title is typed: data-testid="card-input". Pressing Enter in this input MUST commit the new card.
As answer return the plain HTML of the working application (script and styles included)
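Cards, the delete modal and the input are usually created by JavaScript at runtime, so a plain parse of the static HTML would miss most of this contract. The Python sketch below is a hypothetical pre-check, not part of the pipeline described here. It only asks whether each required `data-testid` value appears anywhere in the source as a quoted string literal, which catches IDs that were omitted or renamed. It cannot tell whether the IDs end up on the right elements; that is what the Playwright run verifies. The names `REQUIRED_TEST_IDS` and `missing_test_ids` are illustrative.

```python
import re

# The nine data-testid values required by the selector contract in the prompt.
REQUIRED_TEST_IDS = [
    "column-backlog", "column-in-progress", "column-review", "column-done",
    "add-card", "card", "delete-card", "confirm-delete", "card-input",
]


def missing_test_ids(source: str) -> list[str]:
    """Return the contract IDs that never occur as a quoted string literal.

    Heuristic: "card" is found in data-testid="card", setAttribute('data-testid', 'card')
    or dataset.testid = `card`, but also in unrelated literals such as class="card",
    so passing this check does not prove the contract is met, and IDs assembled
    by string concatenation are reported as missing.
    """
    missing = []
    for test_id in REQUIRED_TEST_IDS:
        # the ID must be enclosed in ", ' or ` so that "card" does not match "card-input"
        pattern = "[\"'`]" + re.escape(test_id) + "[\"'`]"
        if re.search(pattern, source) is None:
            missing.append(test_id)
    return missing
```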
Wall-time vs. quality
Chart: x-axis = wall time for this bench, y-axis = score (0–100 %) in this bench. The optimum is the top-left corner: fast and good. RAM estimate for a 64k context: 4 GB system + model weights + max(2 GB, 40 % of weights) for the KV cache.
Colour = vendor · number = total parameters (B) · marker style distinguishes dense from MoE models.
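The RAM estimate is plain arithmetic. Here it is as a Python function (the function name is hypothetical). Plugging in the 20.6 GB listed in the table's RAM column for qwen3.6-35b-a3b (gguf 4bit) gives 4 + 20.6 + 8.24 = 32.84 GB. That matches the 33 GB shown for the same model in the list of models. For a 1.2 GB model the 2 GB floor applies instead (4 + 1.2 + 2 = 7.2 GB).

```python
def estimated_ram_gb(weights_gb: float, system_gb: float = 4.0) -> float:
    """RAM estimate for a 64k context: system + weights + KV-cache allowance."""
    kv_cache_gb = max(2.0, 0.4 * weights_gb)  # at least 2 GB, else 40 % of weights
    return system_gb + weights_gb + kv_cache_gb


print(round(estimated_ram_gb(20.6), 2))  # 32.84
print(round(estimated_ram_gb(1.2), 2))   # 7.2
```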
Models in this bench
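43 models with a score. Each entry: score · wall time · generation speed (t/s) · estimated RAM for a 64k context.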
1. qwen3.6-35b-a3b (gguf 4bit): 93% · 160 s · 81 t/s · 33 GB
2. qwen3.6-27b (gguf 4bit): 89% · 608 s · 22 t/s · 27 GB
3. gemma-4-31b (gguf 4bit): 88% · 353 s · 21 t/s · 30 GB
4. qwen3.6-35b-a3b (gguf 8bit): 87% · 203 s · 67 t/s · 53 GB
5. qwen3-coder-next (mlx 4bit): 87% · 144 s · 72 t/s · 67 GB
6. gemma-4-26b-a4b (gguf 8bit): 86% · 118 s · 72 t/s · 41 GB
7. glm-4.5-air-mlx (mlx 4bit): 86% · 372 s · 35 t/s · 82 GB
8. qwen3.5-27b-claude-4.6-opus-distilled-mlx (mlx 4bit): 85% · 276 s · 25 t/s · 24 GB
9. qwen3.5-122b-a10b (gguf 4bit): 80% · 733 s · 4 t/s · 102 GB
10. qwen3-coder-30b (mlx 4bit): 78% · 83 s · 95 t/s · 26 GB
11. glm-4.7-flash (mlx 4bit): 78% · 143 s · 69 t/s · 28 GB
12. gemma-4-e2b (gguf 8bit): 73% · 68 s · 110 t/s · 12 GB
13. qwen3.5-9b (gguf 8bit): 70% · 172 s · 44 t/s · 18 GB
14. gpt-oss-120b (gguf 4bit): 69% · 149 s · 80 t/s · 87 GB
15. gemma-4-e4b (gguf 4bit): 65% · 91 s · 85 t/s · 12 GB
16. qwen3.5-35b-a3b (gguf 4bit): 62% · 113 s · 79 t/s · 33 GB
17. qwen3.5-9b-mlx (mlx 4bit): 61% · 113 s · 84 t/s · 12 GB
18. gemma-4-e2b (gguf 4bit): 61% · 48 s · 135 t/s · 10 GB
19. qwen2.5-coder-32b (mlx 4bit): 60% · 147 s · 23 t/s · 28 GB
20. qwen3-vl-30b (mlx 4bit): 60% · 91 s · 79 t/s · 28 GB
21. qwen3.5-2b (gguf 4bit): 59% · 67 s · 158 t/s · 8 GB
22. gpt-oss-20b (mlx 4bit): 52% · 44 s · 109 t/s · 20 GB
23. phi-4-reasoning-plus (mlx 4bit): 52% · 368 s · 41 t/s · 15 GB
24. gemma-3-12b (mlx 4bit): 51% · 85 s · 56 t/s · 15 GB
25. gemma-3-4b (mlx 4bit): 50% · 30 s · 141 t/s · 9 GB
26. gemma-4-e4b (gguf 8bit): 46% · 108 s · 66 t/s · 16 GB
27. gemma-4-26b-a4b (gguf 4bit): 46% · 112 s · 89 t/s · 27 GB
28. qwen3.5-9b (gguf 4bit): 45% · 129 s · 59 t/s · 13 GB
29. gemma-3-27b (mlx 4bit): 42% · 187 s · 27 t/s · 26 GB
30. qwen3.5-4b (gguf 4bit): 40% · 88 s · 85 t/s · 9 GB
31. nemotron-3-nano-omni (gguf 8bit): 39% · 140 s · 78 t/s · 50 GB
32. qwen3-8b (mlx 4bit): 39% · 186 s · 73 t/s · 10 GB
33. qwen3-4b-2507 (mlx 4bit): 39% · 32 s · 135 t/s · 8 GB
34. nemotron-3-nano-omni (gguf 4bit): 38% · 217 s · 83 t/s · 38 GB
35. qwen2.5-coder-14b (mlx 4bit): 36% · 64 s · 49 t/s · 15 GB
36. nemotron-3-nano (mlx 4bit): 35% · 69 s · 131 t/s · 27 GB
37. qwen3-4b-thinking-2507 (mlx 4bit): 33% · 113 s · 113 t/s · 8 GB
38. granite-4-h-tiny (gguf 4bit): 32% · 26 s · 116 t/s · 10 GB
39. nemotron-3-nano-4b (gguf 4bit): 30% · 64 s · 84 t/s · 9 GB
40. lfm2-24b-a2b (mlx 4bit): 29% · 53 s · 135 t/s · 22 GB
41. ministral-3-14b-reasoning (gguf 4bit): 22% · 136 s · 46 t/s · 16 GB
42. gemma-3n-e4b (mlx 4bit): 11% · 50 s · 79 t/s · 12 GB
43. lfm2.5-1.2b (mlx 8bit): 7% · 21 s · 271 t/s · 7 GB
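One way to read "optimum is top-left" is as a Pareto front: a run is kept only if no other run is at least as fast and at least as good. The Python sketch below applies this to a handful of rows copied from the list above, with wall time in seconds and score in percent. The choice of rows and the names `Run`, `pareto_front` and `RUNS` are illustrative, not part of the bench.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    wall_s: float  # wall time for this bench, seconds
    score: float   # bench score, percent


def pareto_front(runs: list[Run]) -> list[Run]:
    """Runs not dominated by a faster-or-equal run with a better-or-equal score,
    ordered fastest first; of two identical (wall, score) pairs only one is kept."""
    front: list[Run] = []
    best_score = float("-inf")
    # sweep from fastest to slowest, taking the higher score first on equal wall time
    for run in sorted(runs, key=lambda r: (r.wall_s, -r.score)):
        if run.score > best_score:  # beats every faster run on score
            front.append(run)
            best_score = run.score
    return front


RUNS = [
    Run("qwen3.6-35b-a3b gguf 4bit", 160, 93),
    Run("qwen3.6-27b gguf 4bit", 608, 89),
    Run("gemma-4-26b-a4b gguf 8bit", 118, 86),
    Run("qwen3-coder-30b mlx 4bit", 83, 78),
    Run("gemma-4-e2b gguf 8bit", 68, 73),
    Run("gemma-4-e2b gguf 4bit", 48, 61),
    Run("gpt-oss-20b mlx 4bit", 44, 52),
    Run("gemma-3-4b mlx 4bit", 30, 50),
    Run("lfm2.5-1.2b mlx 8bit", 21, 7),
]

for run in pareto_front(RUNS):
    print(f"{run.wall_s:>4} s  {run.score:>3}%  {run.name}")
```

On this subset every run except qwen3.6-27b (gguf 4bit) lies on the front. That model takes 608 s for 89%, while qwen3.6-35b-a3b (gguf 4bit) reaches 93% in 160 s.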
| Model | Vendor | Quant | Ctx | Released | RAM (weights) | tok/s | Tokens | Wall | Score |
|---|---|---|---|---|---|---|---|---|---|
| qwen3.6-35b-a3b | qwen | gguf 4bit | 256k | 2026-04-15 | 20.6 GB | 81 | 9964 | 159.9 s | 93% |
| qwen3.6-27b | qwen | gguf 4bit | 256k | 2026-04-21 | 16.3 GB | 22 | 12474 | 607.9 s | 89% |
| gemma-4-31b | — | gguf 4bit | 256k | 2026-03-12 | 18.5 GB | 21 | 6439 | 352.7 s | 88% |
| qwen3.6-35b-a3b | qwen | gguf 8bit | 256k | 2026-04-15 | 35.2 GB | 67 | 9460 | 203.0 s | 87% |
| qwen3-coder-next | qwen | mlx 4bit | 256k | 2026-01-30 | 45.3 GB | 72 | 6000 | 144.0 s | 87% |
| gemma-4-26b-a4b | — | gguf 8bit | 256k | 2026-03-12 | 26.1 GB | 72 | 5134 | 118.0 s | 86% |
| glm-4.5-air-mlx | lmstudio-community | mlx 4bit | 128k | 2025-07-28 | 56.0 GB | 35 | 10321 | 372.4 s | 86% |
| qwen3.5-27b-claude-4.6-opus-distilled-mlx | mlx-community | mlx 4bit | 256k | 2026-03-04 | 14.1 GB | 25 | 6334 | 275.6 s | 85% |
| qwen3.5-122b-a10b | lmstudio-community | gguf 4bit | 256k | 2026-02-24 | 70.0 GB | 4 | 7742 | 732.9 s | 80% |
| qwen3-coder-30b | qwen | mlx 4bit | 256k | 2025-07-31 | 16.0 GB | 95 | 5402 | 83.2 s | 78% |
| glm-4.7-flash | zai-org | mlx 4bit | 198k | 2026-01-19 | 16.9 GB | 69 | 7746 | 142.8 s | 78% |
| gemma-4-e2b | — | gguf 8bit | 128k | 2026-03-02 | 5.5 GB | 110 | 6405 | 67.7 s | 73% |
| qwen3.5-9b | qwen | gguf 8bit | 256k | 2026-02-27 | 9.7 GB | 44 | 6799 | 172.2 s | 70% |
| gpt-oss-120b | openai | gguf 4bit | 128k | 2025-08-04 | 59.0 GB | 80 | 2837 | 149.2 s | 69% |
| gemma-4-e4b | — | gguf 4bit | 128k | 2026-03-02 | 5.9 GB | 85 | 6864 | 90.9 s | 65% |
| qwen3.5-35b-a3b | qwen | gguf 4bit | 256k | 2026-02-24 | 20.6 GB | 79 | 5988 | 112.9 s | 62% |
| qwen3.5-9b-mlx | mlx-community | mlx 4bit | 256k | 2026-02-27 | 5.6 GB | 84 | 8358 | 113.3 s | 61% |
| gemma-4-e2b | — | gguf 4bit | 128k | 2026-03-02 | 4.1 GB | 135 | 5542 | 47.7 s | 61% |
| qwen2.5-coder-32b | qwen | mlx 4bit | 32k | 2024-11-08 | 17.2 GB | 23 | 2749 | 146.8 s | 60% |
| qwen3-vl-30b | qwen | mlx 4bit | 256k | 2025-10-04 | 17.0 GB | 79 | 4996 | 90.7 s | 60% |
| qwen3.5-2b | lmstudio-community | gguf 4bit | 256k | 2026-03-02 | 1.8 GB | 158 | 10078 | 66.9 s | 59% |
| gpt-oss-20b | openai | mlx 4bit | 128k | 2025-08-04 | 11.3 GB | 109 | 2537 | 44.1 s | 52% |
| phi-4-reasoning-plus | microsoft | mlx 4bit | 32k | 2025-04-17 | 7.7 GB | 41 | 14415 | 368.4 s | 52% |
| gemma-3-12b | — | mlx 4bit | 128k | 2025-03-01 | 7.5 GB | 56 | 3824 | 85.1 s | 51% |
| gemma-3-4b | — | mlx 4bit | 128k | 2025-02-20 | 2.8 GB | 141 | 2673 | 29.9 s | 50% |
| gemma-4-e4b | — | gguf 8bit | 128k | 2026-03-02 | 8.4 GB | 66 | 6229 | 108.0 s | 46% |
| gemma-4-26b-a4b | — | gguf 4bit | 256k | 2026-03-12 | 16.8 GB | 89 | 7067 | 111.9 s | 46% |
| qwen3.5-9b | qwen | gguf 4bit | 256k | 2026-02-27 | 6.1 GB | 59 | 6976 | 129.1 s | 45% |
| gemma-3-27b | — | mlx 4bit | 128k | 2025-03-01 | 15.7 GB | 27 | 4293 | 187.5 s | 42% |
| qwen3.5-4b | lmstudio-community | gguf 4bit | 256k | 2026-03-02 | 3.2 GB | 85 | 7046 | 88.5 s | 40% |
| nemotron-3-nano-omni | nvidia | gguf 8bit | 256k | 2026-04-20 | 32.8 GB | 78 | 6254 | 140.1 s | 39% |
| qwen3-8b | qwen | mlx 4bit | 40k | 2025-04-27 | 4.3 GB | 73 | 12899 | 185.7 s | 39% |
| qwen3-4b-2507 | qwen | mlx 4bit | 256k | 2025-08-06 | 2.1 GB | 135 | 3463 | 31.7 s | 39% |
| nemotron-3-nano-omni | nvidia | gguf 4bit | 256k | 2026-04-20 | 24.3 GB | 83 | 14467 | 216.9 s | 38% |
| qwen2.5-coder-14b | qwen | mlx 4bit | 32k | 2024-11-08 | 7.8 GB | 49 | 2421 | 64.0 s | 36% |
| nemotron-3-nano | nvidia | mlx 4bit | 256k | 2025-12-15 | 16.6 GB | 131 | 5504 | 69.1 s | 35% |
| qwen3-4b-thinking-2507 | qwen | mlx 4bit | 256k | 2025-08-05 | 2.1 GB | 113 | 12079 | 113.0 s | 33% |
| granite-4-h-tiny | ibm | gguf 4bit | 1024k | 2025-10-02 | 3.9 GB | 116 | 2198 | 26.2 s | 32% |
| nemotron-3-nano-4b | nvidia | gguf 4bit | 1024k | 2026-03-07 | 2.6 GB | 84 | 4899 | 64.0 s | 30% |
| lfm2-24b-a2b | liquid | mlx 4bit | 125k | 2026-02-24 | 12.5 GB | 135 | 4325 | 53.4 s | 29% |
| ministral-3-14b-reasoning | mistralai | gguf 4bit | 256k | 2025-10-31 | 8.5 GB | 46 | 5531 | 136.0 s | 22% |
| gemma-3n-e4b | — | mlx 4bit | 32k | 2025-06-03 | 5.5 GB | 79 | 2971 | 49.6 s | 11% |
| lfm2.5-1.2b | liquid | mlx 8bit | 125k | 2026-01-06 | 1.2 GB | 271 | 4524 | 20.8 s | 7% |
| gemma-4-31b | — | gguf 8bit | 256k | 2026-03-12 | — | 0 | — | 900.0 s | error |
| ouro-2.6b | mlx-community | mlx 4bit | 64k | 2025-11-09 | — | 0 | — | 900.0 s | error |
| qwen3-vl-8b | qwen | mlx 4bit | 256k | 2025-10-11 | — | 0 | — | 900.0 s | error |
| qwen3.6-27b | qwen | gguf 8bit | 256k | 2026-04-21 | — | 0 | — | 900.0 s | error |
| glm-4.6v-flash | zai-org | mlx 4bit | 128k | 2025-12-07 | — | 0 | — | 900.0 s | error |
| glm-4.7-flash | zai-org | mlx 8bit | 198k | 2026-01-19 | — | 0 | — | 900.0 s | error |