The freedom and limitations of local language models

Agentic Coding

I spent part of the extended weekend running a few A/B comparisons between local LLMs, with a frontier model as a reference point.

The setup was intentionally simple. For each comparison, I gave two models the same Kubernetes or TypeScript coding task and judged the outputs subjectively.
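For anyone who wants to keep score in a setup like this, here is a minimal sketch of how the subjective verdicts could be tallied. It is not the tooling I used. The `Comparison` shape, the `tally` function, and the `model-x` / `model-y` names are all hypothetical, and the sample verdicts are placeholders rather than my actual results.

```typescript
// Hypothetical record of one A/B comparison: two models, one task,
// and the reviewer's subjective verdict on which output was better.
type Verdict = "A" | "B" | "tie";

interface Comparison {
  task: string; // e.g. "kubernetes" or "typescript"
  modelA: string;
  modelB: string;
  verdict: Verdict;
}

interface Tally {
  wins: number;
  losses: number;
  ties: number;
}

// Count wins, losses, and ties per model across all comparisons.
function tally(comparisons: Comparison[]): Record<string, Tally> {
  const result: Record<string, Tally> = {};

  // Return the running tally for a model, creating it on first use.
  const entry = (model: string): Tally => {
    const existing = result[model];
    if (existing) return existing;
    const fresh: Tally = { wins: 0, losses: 0, ties: 0 };
    result[model] = fresh;
    return fresh;
  };

  for (const c of comparisons) {
    const a = entry(c.modelA);
    const b = entry(c.modelB);
    if (c.verdict === "tie") {
      a.ties++;
      b.ties++;
    } else if (c.verdict === "A") {
      a.wins++;
      b.losses++;
    } else {
      b.wins++;
      a.losses++;
    }
  }
  return result;
}

// Placeholder data only: these verdicts are made up for illustration.
const sample: Comparison[] = [
  { task: "kubernetes", modelA: "model-x", modelB: "model-y", verdict: "A" },
  { task: "typescript", modelA: "model-x", modelB: "model-y", verdict: "tie" },
  { task: "typescript", modelA: "model-y", modelB: "model-x", verdict: "A" },
];

const totals = tally(sample);
console.log(totals);
```

Swapping which model sits in the A slot between runs, as the third placeholder entry does, is one cheap way to reduce position bias when the judging is subjective.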

The models I compared were:

  • Gemma 4 26B
  • Gemma 4 12B
  • Qwen 3.6 27B
  • Claude Sonnet 4.6

A few observations

  • The larger models were noticeably better, which is not surprising.
  • For coding and infrastructure engineering tasks, Qwen 3.6 27B looked stronger than Gemma 4 26B in my examples.
  • Gemma did quite well on language-heavy engineering tasks, for example drafting architecture decision records (ADRs) or postmortems.
  • Claude Sonnet was still clearly in first place for me overall.

Another thing that stood out to me is how liberating it is to use a local LLM. No compliance to worry about, no subscription or API usage costs, just the hum of excited memory chips processing large amounts of data.

Important caveat: my sample size was small and the methodology was weak. Treat this as a practical anecdote from trying these models for 8+ hours on a few tasks I care about.