Zachary W. Huang
September 14, 2025
It has taken a shockingly short time for LLMs to go from fun playthings to tools that are legitimately helpful when coding. Maybe I’m just late to the party, but I didn’t start experimenting with LLMs for coding until maybe a year ago, and in the majority of cases, I was not satisfied with the outputs they produced. I still remember when GitHub (Microsoft?) Copilot was released - I read more about potential code licensing issues than about anyone praising the quality of the code it produced.
But now, we have tools like Cursor and Windsurf making AI coding easier than ever, agents like Claude Code that can perform the entire development cycle (code, compile, test, repeat) while you sit back and relax, and models that continually improve with every generation. I think it’s hard to be bearish on LLMs at this point.
I’ve slowly started to rely more on LLMs, bit by bit. I think they work especially well on “transpile”-like tasks, such as converting LaTeX templates into Typst, or translating code from one language to another. They can also complete simple tasks in any programming language, though I think this is because they are still essentially just regurgitating example programs from their training set. However, the performance of these LLMs is extremely dependent on how much training data they had for a particular language - they will happily spit out decent Python and C++ code, but getting them to write good OCaml or VHDL is borderline impossible in my experience.
I’ve found Gemini 2.5 Pro to be the best coding LLM I’ve used. When giving the same task to Claude, Gemini, ChatGPT, and others, I almost always prefer the Gemini output. It’s hard to explain exactly why, but I feel like Gemini’s outputs tend to be more correct, make better architectural decisions, and/or are generally closer to what I would code myself.
While it pains me to talk about “prompt engineering” unironically, I would have to agree that there are certain steps you can take to improve the quality of an LLM’s outputs. I don’t mean things like “being assertive”, but rather giving LLMs context for their task. So if I want it to, say, implement a function, I’ll give it a specification, example inputs/outputs, other requirements, and how this function will be used in the overall codebase (something like the sketch below). Or if I need code that adheres to an interface, I just throw in a PDF with the specification and let the LLM pull the context from there. Then, I’ll typically ask the LLM to revise or make changes another few times until the code is something I’m satisfied with, though I’ll still maybe make a few changes on my own before actually using it.
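For concreteness, here’s a rough sketch of the kind of context I mean. The function, its spec, and the examples are all hypothetical (made up for this post, not from any real codebase) - the point is that a docstring-style specification with example inputs/outputs and a note on how the result gets used gives the model a lot more to work with than a one-line request.

```python
# Hypothetical example of a spec I might hand to an LLM.
# The function and its requirements are invented for illustration.

def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching closed intervals.

    Spec given to the LLM:
      - Input: a list of (start, end) tuples with start <= end, in any order.
      - Output: a sorted list of disjoint intervals covering the same points.
      - Constraint: O(n log n), standard library only.
      - Context: the output feeds a scheduling module that assumes sorted,
        non-overlapping intervals.

    Examples (also included in the prompt):
    >>> merge_intervals([(5, 7), (1, 3), (2, 4)])
    [(1, 4), (5, 7)]
    >>> merge_intervals([(1, 2), (2, 3)])
    [(1, 3)]
    """
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The body here is roughly what I’d expect back after a round or two of revision - the spec and examples above are the part I actually write before prompting.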
The issue with this is that after all the time it takes to iterate on an LLM’s code until it’s something I would use, it really is not that much of a productivity improvement. It’s kinda like the “using 5 hours to automate a task that takes 5 minutes” thing, except it’s more like “spending 10 minutes prompting an LLM when it would have taken me 15 minutes to write it myself”. I think it’s an improvement, but definitely not the 10x gains that investors seem to think it will be.
And the other, more insidious issue with LLMs, in my opinion, is “vibe coding”. I hate to admit it, but I’m kinda lazy. And thinking is hard. It takes a lot of mental fortitude and effort to actually think deeply about a problem, come up with multiple potential solutions, weigh the trade-offs, implement the best solution, and then move on to the next problem. The actual code implementation is by far the easiest part of this cycle. Because of this, it’s extremely easy to just give some task to an LLM, skim the output, and copy it over. And if it breaks some other part of your code, just copy that other code over too, and let the LLM revise it. Rinse and repeat until everything works or everything is so broken that you give up and start over. And if everything happens to work, you no longer understand your own codebase, and suddenly, the only way to make progress is to throw everything back into the LLM and let it code for you. Before you know it, you’ve been arguing with the LLM for 5 hours and have made zero progress on the thing you’re actually trying to do. And unfortunately, yes, this comes from personal experience.
What this means, in my opinion, is that you should never rely on an LLM to generate code that you yourself do not understand. However, this fundamentally limits how useful an LLM can be for coding. If you have to review every piece of code an LLM generates, then that obviously slows down the development process. Additionally, letting an LLM do the actual design and make architectural choices for you is a mistake - you’ll fall into the cycle I described above. So instead, you have to do the hard part (the actual thinking) and just let the LLM fill in the code. But this means that you’ve essentially automated away the easy part of your job (coding) and now have to act more like an architect than a code monkey. So I think that LLMs will actually make software development harder in the long run, not easier. And in the cases where you’re coding something novel, rather than throwing together existing algorithms and libraries, an LLM isn’t going to perform all that well anyway.
So personally, I still think that LLMs are slightly overvalued for coding. Of course, it also depends on the domain, but I’m not convinced that LLMs will be able to replace anyone actually worth their salt in a real software position. They can be useful assistants (extremely useful in some cases), but I still struggle to imagine them as anything more. If I’m wrong, then I’ll probably be unemployed in 5 years and pivot to starting a bubble tea shop instead.