Coding

Single-shot code generation (Kanban board as an HTML file). Measures how quickly models solve a concrete UI task and how well the result works.

Task & test logic in detail

Task: From a ~200-word prompt, the model must generate a fully functional Kanban board as a single HTML file with drag & drop, localStorage persistence, edit/delete and a confetti animation, in a single chat turn without iteration. The prompt also includes a small `data-testid` contract so that a Playwright test can drive the app.

Three signals feed into the score:
(1) Static: a linter checks concrete constraints in the HTML (columns, Tailwind, localStorage call, no framework, no window.alert/prompt, …).
(2) Functional: Playwright runs a small CRUD sequence (create a card, delete a card with confirmation, reload and check that the state persists) and checks that no JS console errors occur during the entire flow. Drag & drop and confetti are deliberately not tested functionally (too many implementation variants).
(3) Qualitative: an LLM-as-judge rates the screenshot and the code (visual quality, code quality, and render↔code consistency).

Score = mean over the available signals.

Why models fail: Reasoning models burn their tokens on thinking instead of writing. Sliding-window models (Gemma 4) lose the constraints at the start of the prompt. Small models (<3B) often fail to produce coherent HTML, or they ignore the data-testid contract, which causes most of the functional tests to fail.
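The rule "Score = mean over the available signals" can be illustrated with a minimal Python sketch. It is not the benchmark's actual code: the function name `bench_score`, the 0..1 scale per signal, the use of `None` for a missing signal, and the equal weighting are all assumptions made for illustration.

```python
from typing import Optional


def bench_score(static: Optional[float],
                functional: Optional[float],
                qualitative: Optional[float]) -> Optional[float]:
    """Mean over the signals that are available (None marks a missing signal).

    Assumption: every signal is already normalised to the range 0..1.
    """
    available = [s for s in (static, functional, qualitative) if s is not None]
    if not available:
        return None  # nothing could be measured, so there is no score
    return sum(available) / len(available)


# Example: the qualitative signal is missing, so only two signals are averaged.
print(bench_score(1.0, 0.5, None))  # 0.75
```

Under these assumptions, a missing signal is skipped rather than counted as zero, so a run without a judge rating is not penalised for it.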

Prompt

System prompt

You are a careful front-end engineer.

Developer prompt

Create a fully functional Kanban board in a single HTML file using vanilla JavaScript (no frameworks like react).

Requirements:
- Columns: Backlog, In Progress, Review, Done.
- Cards must be:
  - draggable across columns,
  - editable in place,
  - persisted in localStorage (state survives reloads) - please use your own namespace,
  - deletable with a confirmation prompt.
- Each column provides an "Add card" action.
- Style with Tailwind via CDN.
- Add subtle CSS transitions and trigger a confetti animation when a card moves to "Done".
- Thoroughly comment the code.
- dont use window.alert or window.prompt to add/edit/delete cards
- if there are no cards yet, create some dummy cards
- modern and vibrant design

Stable test selectors (mandatory — these data-testid attributes are used by an automated functional test; do not omit, rename, or split them across multiple elements):
- Column containers: data-testid="column-backlog", data-testid="column-in-progress", data-testid="column-review", data-testid="column-done".
- Every "Add card" button (one per column): data-testid="add-card".
- Every card element: data-testid="card".
- Inside each card, the delete trigger: data-testid="delete-card".
- The confirm button of the delete-confirmation dialog/modal: data-testid="confirm-delete".
- The input/textarea where a new card title is typed: data-testid="card-input". Pressing Enter in this input MUST commit the new card.

As answer return the plain HTML of the working application (script and styles included)
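The selector contract and a few of the static constraints in the prompt above lend themselves to a simple text-based pre-check. The following is a minimal Python sketch, not the benchmark's actual linter: `lint_html`, `REQUIRED_TEST_IDS`, and the matching heuristics (quoted literals, a plain substring test for localStorage) are assumptions made for illustration.

```python
import re

# Test ids taken from the prompt's "Stable test selectors" list.
REQUIRED_TEST_IDS = [
    "column-backlog", "column-in-progress", "column-review", "column-done",
    "add-card", "card", "delete-card", "confirm-delete", "card-input",
]

QUOTE = "[\"']"  # a single or double quote


def lint_html(source: str) -> dict[str, bool]:
    """Return one pass/fail flag per static constraint (heuristic, text-based)."""
    checks: dict[str, bool] = {}
    for test_id in REQUIRED_TEST_IDS:
        # Match the id as a quoted literal so that "card" does not match "add-card".
        # This also catches ids that are assigned from JavaScript strings.
        pattern = QUOTE + re.escape(test_id) + QUOTE
        checks[f"testid:{test_id}"] = re.search(pattern, source) is not None
    # Assumption: any "localStorage." access counts as a persistence call.
    checks["uses_localstorage"] = "localStorage." in source
    # Assumption: a call to alert(...) or prompt(...) anywhere is a violation.
    forbidden = r"\b(?:window\.)?(?:alert|prompt)\s*\("
    checks["no_alert_or_prompt"] = re.search(forbidden, source) is None
    return checks


if __name__ == "__main__":
    sample = '<div data-testid="column-done"></div><script>localStorage.setItem("k", "[]")</script>'
    for name, passed in lint_html(sample).items():
        print(f"{name}: {'ok' if passed else 'missing'}")
```

A check like this only inspects the source text. It cannot tell whether the ids end up on the right elements at runtime, which is what the Playwright-driven functional signal covers.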

Wall-time vs. quality


X = wall-time for this bench · Y = score (0–100 %) in this bench. Optimum is top-left — fast and good. RAM estimate for 64k context: 4 GB system + model weights + max(2 GB, 40% of weights) for KV cache.

Colour = vendor · Number = total parameters (B) · Legend: dense, MoE
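The RAM estimate quoted in the chart note (4 GB system + model weights + max(2 GB, 40% of weights) for KV cache) can be written out as a short Python sketch. The function name `estimated_ram_gb` is hypothetical, and reading the table's RAM column as the weight size is an assumption.

```python
def estimated_ram_gb(weights_gb: float) -> float:
    """RAM estimate for a 64k context, following the rule in the chart note."""
    system_gb = 4.0                             # fixed allowance for the system
    kv_cache_gb = max(2.0, 0.40 * weights_gb)   # KV cache: at least 2 GB
    return system_gb + weights_gb + kv_cache_gb


# Weight sizes as listed in the table's RAM column (assumed to be weights only).
for name, weights in [("qwen3.6-35b-a3b gguf 4bit", 20.6), ("lfm2.5-1.2b mlx 8bit", 1.2)]:
    print(f"{name}: {estimated_ram_gb(weights):.1f} GB")
```

Under that assumption, 20.6 GB of weights gives about 32.8 GB and 1.2 GB gives 7.2 GB, which matches the rounded 33 GB and 7 GB shown for these two models in the ranked list.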

Models in this bench

49 visible. Each entry lists score · wall-time · tokens/s · estimated RAM (64k context).

1. qwen3.6-35b-a3b gguf 4bit 93% · 160s · 81 t/s · 33 GB
2. qwen3.6-27b gguf 4bit 89% · 608s · 22 t/s · 27 GB
3. gemma-4-31b gguf 4bit 88% · 353s · 21 t/s · 30 GB
4. qwen3.6-35b-a3b gguf 8bit 87% · 203s · 67 t/s · 53 GB
5. qwen3-coder-next mlx 4bit 87% · 144s · 72 t/s · 67 GB
6. gemma-4-26b-a4b gguf 8bit 86% · 118s · 72 t/s · 41 GB
7. glm-4.5-air-mlx mlx 4bit 86% · 372s · 35 t/s · 82 GB
8. qwen3.5-27b-claude-4.6-opus-distilled-mlx mlx 4bit 85% · 276s · 25 t/s · 24 GB
9. seed-oss-36b mlx 4bit 83% · 670s · 18 t/s · 31 GB
10. devstral-small-2-2512 mlx 4bit 81% · 223s · 33 t/s · 22 GB
11. qwen3.5-122b-a10b gguf 4bit 80% · 733s · 4 t/s · 102 GB
12. qwen3-coder-30b mlx 4bit 78% · 83s · 95 t/s · 26 GB
13. glm-4.7-flash mlx 4bit 78% · 143s · 69 t/s · 28 GB
14. gemma-4-e2b gguf 8bit 73% · 68s · 110 t/s · 12 GB
15. qwen3.5-9b gguf 8bit 70% · 172s · 44 t/s · 18 GB
16. gpt-oss-120b gguf 4bit 69% · 149s · 80 t/s · 87 GB
17. llama-3.3-70b gguf 4bit 66% · 305s · 10 t/s · 59 GB
18. gemma-4-e4b gguf 4bit 65% · 91s · 85 t/s · 12 GB
19. nemotron-3-super gguf 4bit 63% · 570s · 30 t/s · 116 GB
20. qwen3.5-35b-a3b gguf 4bit 62% · 113s · 79 t/s · 33 GB
21. qwen3.5-9b-mlx mlx 4bit 61% · 113s · 84 t/s · 12 GB
22. gemma-4-e2b gguf 4bit 61% · 48s · 135 t/s · 10 GB
23. qwen2.5-coder-32b mlx 4bit 60% · 147s · 23 t/s · 28 GB
24. qwen3-vl-30b mlx 4bit 60% · 91s · 79 t/s · 28 GB
25. qwen3.5-2b gguf 4bit 59% · 67s · 158 t/s · 8 GB
26. gpt-oss-20b mlx 4bit 52% · 44s · 109 t/s · 20 GB
27. phi-4-reasoning-plus mlx 4bit 52% · 368s · 41 t/s · 15 GB
28. gemma-3-12b mlx 4bit 51% · 85s · 56 t/s · 15 GB
29. gemma-3-4b mlx 4bit 50% · 30s · 141 t/s · 9 GB
30. olmo-3-32b-think mlx 4bit 50% · 743s · 22 t/s · 28 GB
31. gemma-4-e4b gguf 8bit 46% · 108s · 66 t/s · 16 GB
32. gemma-4-26b-a4b gguf 4bit 46% · 112s · 89 t/s · 27 GB
33. qwen3.5-9b gguf 4bit 45% · 129s · 59 t/s · 13 GB
34. gemma-3-27b mlx 4bit 42% · 187s · 27 t/s · 26 GB
35. qwen3.5-4b gguf 4bit 40% · 88s · 85 t/s · 9 GB
36. nemotron-3-nano-omni gguf 8bit 39% · 140s · 78 t/s · 50 GB
37. qwen3-8b mlx 4bit 39% · 186s · 73 t/s · 10 GB
38. qwen3-4b-2507 mlx 4bit 39% · 32s · 135 t/s · 8 GB
39. nemotron-3-nano-omni gguf 4bit 38% · 217s · 83 t/s · 38 GB
40. qwen3-30b-a3b-2507 mlx 4bit 38% · 74s · 95 t/s · 26 GB
41. qwen2.5-coder-14b mlx 4bit 36% · 64s · 49 t/s · 15 GB
42. nemotron-3-nano mlx 4bit 35% · 69s · 131 t/s · 27 GB
43. qwen3-4b-thinking-2507 mlx 4bit 33% · 113s · 113 t/s · 8 GB
44. granite-4-h-tiny gguf 4bit 32% · 26s · 116 t/s · 10 GB
45. nemotron-3-nano-4b gguf 4bit 30% · 64s · 84 t/s · 9 GB
46. lfm2-24b-a2b mlx 4bit 29% · 53s · 135 t/s · 22 GB
47. ministral-3-14b-reasoning gguf 4bit 22% · 136s · 46 t/s · 16 GB
48. gemma-3n-e4b mlx 4bit 11% · 50s · 79 t/s · 12 GB
49. lfm2.5-1.2b mlx 8bit 7% · 21s · 271 t/s · 7 GB

Model	Vendor	Quant	Ctx	Released	RAM	tok/s	Tokens	Wall	Score
qwen3.6-35b-a3b	qwen	gguf 4bit	256k	2026-04-15	20.6 GB	81	9964	159.9 s	93%
qwen3.6-27b	qwen	gguf 4bit	256k	2026-04-21	16.3 GB	22	12474	607.9 s	89%
gemma-4-31b	google	gguf 4bit	256k	2026-03-12	18.5 GB	21	6439	352.7 s	88%
qwen3.6-35b-a3b	qwen	gguf 8bit	256k	2026-04-15	35.2 GB	67	9460	203.0 s	87%
qwen3-coder-next	qwen	mlx 4bit	256k	2026-01-30	45.3 GB	72	6000	144.0 s	87%
gemma-4-26b-a4b	google	gguf 8bit	256k	2026-03-12	26.1 GB	72	5134	118.0 s	86%
glm-4.5-air-mlx	lmstudio-community	mlx 4bit	128k	2025-07-28	56.0 GB	35	10321	372.4 s	86%
qwen3.5-27b-claude-4.6-opus-distilled-mlx	mlx-community	mlx 4bit	256k	2026-03-04	14.1 GB	25	6334	275.6 s	85%
seed-oss-36b	bytedance	mlx 4bit	512k	2025-08-20	19.0 GB	18	11498	670.1 s	83%
devstral-small-2-2512	mistralai	mlx 4bit	384k	2025-12-09	13.2 GB	33	6444	222.6 s	81%
qwen3.5-122b-a10b	lmstudio-community	gguf 4bit	256k	2026-02-24	70.0 GB	4	7742	732.9 s	80%
qwen3-coder-30b	qwen	mlx 4bit	256k	2025-07-31	16.0 GB	95	5402	83.2 s	78%
glm-4.7-flash	zai-org	mlx 4bit	198k	2026-01-19	16.9 GB	69	7746	142.8 s	78%
gemma-4-e2b	google	gguf 8bit	128k	2026-03-02	5.5 GB	110	6405	67.7 s	73%
qwen3.5-9b	qwen	gguf 8bit	256k	2026-02-27	9.7 GB	44	6799	172.2 s	70%
gpt-oss-120b	openai	gguf 4bit	128k	2025-08-04	59.0 GB	80	2837	149.2 s	69%
llama-3.3-70b	meta	gguf 4bit	128k	2024-12-06	39.6 GB	10	2369	304.6 s	66%
gemma-4-e4b	google	gguf 4bit	128k	2026-03-02	5.9 GB	85	6864	90.9 s	65%
nemotron-3-super	nvidia	gguf 4bit	1024k	2026-03-10	80.1 GB	30	12548	569.9 s	63%
qwen3.5-35b-a3b	qwen	gguf 4bit	256k	2026-02-24	20.6 GB	79	5988	112.9 s	62%
qwen3.5-9b-mlx	mlx-community	mlx 4bit	256k	2026-02-27	5.6 GB	84	8358	113.3 s	61%
gemma-4-e2b	google	gguf 4bit	128k	2026-03-02	4.1 GB	135	5542	47.7 s	61%
qwen2.5-coder-32b	qwen	mlx 4bit	32k	2024-11-08	17.2 GB	23	2749	146.8 s	60%
qwen3-vl-30b	qwen	mlx 4bit	256k	2025-10-04	17.0 GB	79	4996	90.7 s	60%
qwen3.5-2b	lmstudio-community	gguf 4bit	256k	2026-03-02	1.8 GB	158	10078	66.9 s	59%
gpt-oss-20b	openai	mlx 4bit	128k	2025-08-04	11.3 GB	109	2537	44.1 s	52%
phi-4-reasoning-plus	microsoft	mlx 4bit	32k	2025-04-17	7.7 GB	41	14415	368.4 s	52%
gemma-3-12b	google	mlx 4bit	128k	2025-03-01	7.5 GB	56	3824	85.1 s	51%
gemma-3-4b	google	mlx 4bit	128k	2025-02-20	2.8 GB	141	2673	29.9 s	50%
olmo-3-32b-think	allenai	mlx 4bit	64k	2025-11-19	16.9 GB	22	15524	743.1 s	50%
gemma-4-e4b	google	gguf 8bit	128k	2026-03-02	8.4 GB	66	6229	108.0 s	46%
gemma-4-26b-a4b	google	gguf 4bit	256k	2026-03-12	16.8 GB	89	7067	111.9 s	46%
qwen3.5-9b	qwen	gguf 4bit	256k	2026-02-27	6.1 GB	59	6976	129.1 s	45%
gemma-3-27b	google	mlx 4bit	128k	2025-03-01	15.7 GB	27	4293	187.5 s	42%
qwen3.5-4b	lmstudio-community	gguf 4bit	256k	2026-03-02	3.2 GB	85	7046	88.5 s	40%
nemotron-3-nano-omni	nvidia	gguf 8bit	256k	2026-04-20	32.8 GB	78	6254	140.1 s	39%
qwen3-8b	qwen	mlx 4bit	40k	2025-04-27	4.3 GB	73	12899	185.7 s	39%
qwen3-4b-2507	qwen	mlx 4bit	256k	2025-08-06	2.1 GB	135	3463	31.7 s	39%
nemotron-3-nano-omni	nvidia	gguf 4bit	256k	2026-04-20	24.3 GB	83	14467	216.9 s	38%
qwen3-30b-a3b-2507	qwen	mlx 4bit	256k	2025-07-21	16.0 GB	95	4679	74.1 s	38%
qwen2.5-coder-14b	qwen	mlx 4bit	32k	2024-11-08	7.8 GB	49	2421	64.0 s	36%
nemotron-3-nano	nvidia	mlx 4bit	256k	2025-12-15	16.6 GB	131	5504	69.1 s	35%
qwen3-4b-thinking-2507	qwen	mlx 4bit	256k	2025-08-05	2.1 GB	113	12079	113.0 s	33%
granite-4-h-tiny	ibm	gguf 4bit	1024k	2025-10-02	3.9 GB	116	2198	26.2 s	32%
nemotron-3-nano-4b	nvidia	gguf 4bit	1024k	2026-03-07	2.6 GB	84	4899	64.0 s	30%
lfm2-24b-a2b	liquid	mlx 4bit	125k	2026-02-24	12.5 GB	135	4325	53.4 s	29%
ministral-3-14b-reasoning	mistralai	gguf 4bit	256k	2025-10-31	8.5 GB	46	5531	136.0 s	22%
gemma-3n-e4b	google	mlx 4bit	32k	2025-06-03	5.5 GB	79	2971	49.6 s	11%
lfm2.5-1.2b	liquid	mlx 8bit	125k	2026-01-06	1.2 GB	271	4524	20.8 s	7%
gemma-4-31b	google	gguf 8bit	256k	2026-03-12	—	0	—	900.0 s	error
ouro-2.6b	mlx-community	mlx 4bit	64k	2025-11-09	—	0	—	900.0 s	error
qwen3-vl-8b	qwen	mlx 4bit	256k	2025-10-11	—	0	—	900.0 s	error
qwen3.6-27b	qwen	gguf 8bit	256k	2026-04-21	—	0	—	900.0 s	error
glm-4.6v-flash	zai-org	mlx 4bit	128k	2025-12-07	—	0	—	900.0 s	error
glm-4.7-flash	zai-org	mlx 8bit	198k	2026-01-19	—	0	—	900.0 s	error
