Hypothesis: LLM agents are the new high-level programming language
(originally published here)
Following this hypothesis: what C did to assembler, what Java did to C, and what JavaScript/Python/Perl did to Java, LLM agents are now doing to all programming languages.
What do I mean by LLM agents? I mean that the main development stack of a human developer will soon be built around agents that are:
- Multiple: a number of agents working in parallel.
- Autonomous: agents that require feedback from the human only every once in a while, and otherwise work on their own.
How can we determine if the hypothesis is true? If a human developer can now build an order of magnitude more (10x) using multiple autonomous agents than they could without them, then the hypothesis is true. I’m not sure of it yet (as of January 2026), but I’m seriously considering it.
For many who have been in the software business for a while, the mind reels with objections. Let’s address the easy ones first:
- 10x lines of code is not building 10x more, it’s just slop: the measure should be the actual functional output delivered, not the lines of code. If we go with the hypothesis, the “lines of code” are really the instructions to the LLM.
- LLMs are only for those who don’t know how to code: while there will be many new programmers thanks to LLMs, that doesn’t mean that experienced programmers won’t benefit from using LLM agents. The evidence so far is that many experienced programmers are getting a lot more output thanks to LLMs.
- LLMs are for those who don’t want to think/work: if you are using LLMs to do more than you did before, you’ll have to think and work more, not less. It’s more demanding to manage a fleet of agents, and you’ll have to do far more design (since you’re building several times what you were building before in the same amount of time).
- LLMs are going to make our coding skills rot: probably. But at work we are not usually concerned about our assembler or C chops rotting, if they exist at all. Most of us practice those chops in our free time, because we cannot make the case that we’d be more productive working in assembler or C at work (for most types of software development).
- The code that LLMs make is much worse than what I can write: almost certainly; but the same could be said about your assembler or your C code. As long as what the LLM generates is sufficiently efficient, it will run, and it will be ready much sooner. The system will be uglier, but it will still work.
- Using LLM agents is expensive: if they already give you 50% more productivity and you earn an average developer salary, they are not. And LLMs will only get cheaper. They are expensive only in absolute terms, not in relative ones.
- I tried using LLM agents one afternoon and they wasted my time: there’s a learning curve involved. It takes a while to get the hang of working with multiple LLM agents. Think of the hours and days you spent fighting the tools and syntax of your programming stack until you more or less got it.
(None of the above are defensible, I think, though emotionally they are not easy to accept.)
Now for two objections that go to the crux of the matter:
- Quality: aren’t LLMs generating code that will soon become a dumpster fire? Are we not building on foundations of sand?
- Understandability: won’t LLMs generate so much code that we can never hope to understand it? Even if the systems work, are we not forever in peril of not controlling them because we don’t understand them?
I would like to use quality and understandability as the goals for any acceptable framework of LLM programming. Economically, only quality is defensible as a goal. Understandability might be a romantic dream or a good long-term bet (I’m choosing the latter, but you can of course be agnostic).
Now for the quirks: LLMs are far more nondeterministic than previous higher-level languages. They can also help you figure things out at the high level (the descriptions themselves), in a way that no previous layer could help you deal with that layer itself.
How would this look?
Let’s try to find the common elements of what this near future would look like:
- Documentation: a set of markdown pages that contain the specification of the system: purpose, main entities, endpoints, constraints, core flows, coding standards.
- Implementation: the codebase, plus all of the data. This is what runs and what holds state. The codebase should be reconstructable from the documentation, and the data should be consistent with its description in the documentation.
- Dialogs: multiple agents are churning away at their tasks. They produce text while they’re thinking through the solution, some of it code: this is the dialog (which is expressible as a markdown page). A human can inspect this stream of text, code changes, and commands at any time; a human can also enter the dialog. Some dialogs can be waiting for human input. When an agent completes its work, the dialog is no longer alive, but it is still accessible.
- Tasks: a dynamic set of discrete pieces of work, each expressed as a markdown page. They should be reconstructable from the documentation plus the existing state of the codebase. Tasks should be nestable. They have a status (pending, in progress, waiting for human interaction, done).
Looking at this, we see two stocks and two flows. The two stocks are the “tions” (documentation and implementation), which are the accretions of the system. The two flows are the dialogs and the tasks; they build both the documentation and the implementation. It’s also possible for the human to modify the documentation and the implementation directly, but that won’t happen often, as most of the flow is agentic and the human will spend most of their time interacting with the agents.
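To make these two stocks and two flows a bit more concrete, here is one possible way of modeling them. This is a sketch under my own assumptions, not something the hypothesis prescribes; all type names and fields are hypothetical.

```typescript
// Hypothetical data model for the four elements described above.
// Names and fields are illustrative, not a spec.

// Stock 1: documentation, a set of markdown pages.
interface DocPage {
  path: string;      // e.g. "docs/endpoints.md"
  markdown: string;  // purpose, entities, endpoints, constraints, flows, standards
}

// Stock 2: implementation, the codebase plus the data it holds.
interface Implementation {
  files: Record<string, string>; // path -> source, reconstructable from the docs
  dataDescription: string;       // what state the system holds, per the docs
}

// Flow 1: a dialog, the stream of text, code changes and commands of one agent.
interface Dialog {
  id: string;
  entries: { author: "agent" | "human"; text: string; timestamp: Date }[];
  waitingForHuman: boolean;
  alive: boolean; // false once the agent finishes; the dialog stays accessible
}

// Flow 2: a task, a discrete piece of work expressed as a markdown page.
type TaskStatus = "pending" | "in progress" | "waiting for human" | "done";
interface Task {
  id: string;
  markdown: string;
  status: TaskStatus;
  subtasks: Task[]; // tasks are nestable
}
```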
How will agents be structured? Since agents can play multiple roles (the underlying models are general purpose), I think we can leave as much freedom as possible here. If any agent can enter any dialog, and any human can enter any dialog, we can let the human experiment with different possibilities:
- Agents working on tasks independently, start to finish.
- Managing agents that are in charge of orchestrating what’s next.
- QA agents to try to break new features.
- Reviewing agents that take a new unmerged feature and review it without the context of the builder.
- Merging agents that resolve conflicts.
The important thing is that the human can, either manually or automatically, spin up agents with instructions that are either one-offs or a chunk of the documentation.
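As an illustration of that freedom, here is a minimal sketch of what spinning up agents could look like. The `spawnAgent` helper, the role names, and the documentation reference are all hypothetical stand-ins for whatever agent runtime is actually used.

```typescript
// Hypothetical sketch of spinning up agents with a role plus instructions.
// spawnAgent is a stand-in for a real agent runtime.
type Role = "builder" | "manager" | "qa" | "reviewer" | "merger";

interface AgentHandle {
  role: Role;
  dialogId: string; // any human or agent can enter this dialog later
}

function spawnAgent(role: Role, instructions: string): AgentHandle {
  // A real implementation would start an autonomous agent working in its own
  // dialog; here we only log the request so the sketch stays self-contained.
  console.log(`spawning ${role} agent:\n${instructions}`);
  return { role, dialogId: Math.random().toString(36).slice(2) };
}

// A one-off instruction:
spawnAgent("qa", "Try to break the new signup flow and report how you did it.");

// An instruction that is a chunk of the documentation:
spawnAgent("reviewer", "<contents of docs/endpoints.md go here>");
```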
There’s an opportunity for a new type of world wide web – or rather, for making the existing web much more free and web-like, breaking the silos of applications. That opportunity is MCP. MCP (a standard for tool calling by LLMs), which everyone and their mother is rushing to support, can be thought of as a general XMLHttpRequest. This opens the possibility of having your AI agents take any functionality and data that’s siloed in an existing application and put it in a dynamic canvas of your own choosing.
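To make the analogy concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below is illustrative only; the endpoint URL, tool name, and arguments are invented, and the initialization handshake and authentication that a real MCP server requires are left out.

```typescript
// Rough sketch: invoking a tool on a hypothetical MCP server over HTTP.
// MCP messages are JSON-RPC 2.0, and "tools/call" is the method for invoking
// a tool. The endpoint, tool name, and arguments below are made up, and the
// initialization handshake and authentication are omitted.
async function callTool(endpoint: string, tool: string, args: Record<string, unknown>) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name: tool, arguments: args },
    }),
  });
  return response.json();
}

// e.g. pull data that is siloed in some application into your own canvas:
callTool("https://example.com/mcp", "list_invoices", { month: "2026-01" })
  .then((result) => console.log(result));
```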
My original vision for cell was a grid of code and data (the dataspace) that you can fully understand and that is already deployed. This is not enough. It will be just the “grid”. Surrounding the grid will be a set of dynamic pages, where documentation and functionality come together.
Documentation won’t just be documentation: you will be able to embed functionality, either from your own application (which will be supported in the grid) or from external applications. You can have mini dashboards or widgets that you can bring to fullscreen, or you can navigate to another page. Your cell will be a collection of pages, plus the grid, plus the agents that are working on it. And much of it can be made accessible from the outside.
This all still requires a server for these reasons:
- Receive requests while you’re not online.
- Persist data.
- Keep the agents working.
- Many calls cannot be made directly from the browser for security reasons, so they require a server to make the request (see the sketch below).
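To illustrate that last point: a page running in the browser generally cannot call arbitrary third-party endpoints directly (CORS and credential handling get in the way), so a small server-side proxy makes the request on the page's behalf. This is a generic sketch, not tied to any particular stack.

```typescript
// Minimal sketch of a server-side proxy (Node 18+, no dependencies): the
// browser calls this server, and the server makes the outbound request that
// the browser itself cannot make. A real proxy would whitelist target URLs.
import { createServer } from "node:http";

createServer(async (req, res) => {
  // Expect requests like /proxy?url=https://api.example.com/data
  const target = new URL(req.url ?? "/", "http://localhost").searchParams.get("url");
  if (!target) {
    res.writeHead(400).end("missing url parameter");
    return;
  }
  const upstream = await fetch(target); // the server-to-server call
  res.writeHead(upstream.status, {
    "Content-Type": upstream.headers.get("content-type") ?? "text/plain",
  });
  res.end(await upstream.text());
}).listen(3000);
```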
What about quality and understandability? If, instead of a big stack, we use a good substrate, the LLM output will be much smaller in line count, and more understandable. If that is the case, we can vastly increase the quality and performance of the systems we build.
The frontend of the system is now the documentation and the agents; the backend is the stack/substrate.
Open questions:
- How do we store the documentation and the dialogs alongside the implementation?
- How do we use version control systems?