The freedom and limitations of local language models

01 June 2026 1 min read

Agentic Coding

I spent part of the extended weekend running a few A/B comparisons between local LLMs, with a frontier model as a reference point.

The setup was intentionally simple. For each comparison, I gave two models the same Kubernetes or TypeScript coding task and judged their output subjectively.

The models I compared were:

Gemma 4 26B
Gemma 4 12B
Qwen 3.6 27B
Claude Sonnet 4.6
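
A comparison like this could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the harness used for this post: the names `run_ab`, `tally`, `Comparison`, and `Generate` are hypothetical, and each model is represented by a plain Python function that takes a prompt and returns text, so the sketch does not depend on any particular local runtime or hosted API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the model's text.
# In practice this would wrap whatever local runtime or hosted API serves the model.
Generate = Callable[[str], str]


@dataclass
class Comparison:
    task: str
    outputs: Dict[str, str]  # model name -> raw output
    verdict: str = ""        # filled in by hand after reading both outputs


def run_ab(task: str, model_a: str, model_b: str,
           backends: Dict[str, Generate]) -> Comparison:
    """Send the identical task prompt to two models and keep both outputs."""
    outputs = {name: backends[name](task) for name in (model_a, model_b)}
    return Comparison(task=task, outputs=outputs)


def tally(comparisons: List[Comparison]) -> Dict[str, int]:
    """Count how often each model was picked as the subjective winner."""
    wins: Dict[str, int] = {}
    for c in comparisons:
        if c.verdict:  # skip comparisons that have not been judged yet
            wins[c.verdict] = wins.get(c.verdict, 0) + 1
    return wins


# Stub backends (assumptions for illustration only) so the sketch runs offline.
stub_backends: Dict[str, Generate] = {
    "local-model": lambda prompt: f"[local] {prompt}",
    "frontier-model": lambda prompt: f"[frontier] {prompt}",
}

example = run_ab("Write a Kubernetes Deployment for nginx",
                 "local-model", "frontier-model", stub_backends)
example.verdict = "frontier-model"  # the subjective judgment, entered manually
```

The verdict stays a manual field on purpose: the judging here is subjective, so the script only guarantees that both models see exactly the same prompt and that the picks are counted consistently.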

A few observations

The larger models were noticeably better, which is not surprising.
For coding and infrastructure engineering tasks, Qwen 3.6 27B looked stronger than Gemma 4 26B in my examples.
Gemma did quite well on language-heavy engineering tasks, such as drafting ADRs or postmortems.
Claude Sonnet was still clearly in first place for me overall.

Another thing that stood out to me: using a local LLM is very liberating. There is no compliance to worry about and no subscription or API usage cost, just the hum of excited memory chips processing large amounts of data.

Important caveat: my sample size was small and the methodology was weak. Treat this as a practical anecdote from trying these models for 8+ hours on a few tasks I care about.