Surface testing with LLMs

A couple of years ago, I wrote about my preferred type of software testing: surface testing (https://federicopereiro.com/surface-testing/). A summary of the approach is that 1) you test a system only through its exposed “surfaces” (APIs, UIs, library functions); 2) you run the tests against the real codebase with zero mocks, in a meaningful linear order, and stop at the first error.
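To make the idea concrete, here is a minimal sketch of it in Python (a hypothetical toy example, not code from either project): a tiny key-value "library" whose only surface is its public functions, plus a runner that executes the tests in a meaningful linear order, against the real code, with zero mocks, stopping at the first failure.

```python
# --- the system under test: a toy key-value store, exercised only via its surface ---
_store = {}

def put(key, value):
    _store[key] = value

def get(key):
    return _store.get(key)

def delete(key):
    _store.pop(key, None)

# --- the surface tests, in a meaningful linear order: later steps build on earlier ones ---
def test_put():
    put("greeting", "hello")

def test_get():
    assert get("greeting") == "hello", "get should return what put stored"

def test_delete():
    delete("greeting")
    assert get("greeting") is None, "delete should remove the key"

def run(tests):
    # Stop at the first error: once a step fails, later results are meaningless.
    for test in tests:
        try:
            test()
        except AssertionError as error:
            print(f"FAIL {test.__name__}: {error}")
            return False
        print(f"ok   {test.__name__}")
    return True

run([test_put, test_get, test_delete])
```

Note that nothing here is mocked: the tests touch the real store, and the order matters because `test_get` depends on the state that `test_put` created.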

Now, LLM coding agents have come along and changed everything. You can produce code almost as fast as you can think of what it should do. At the same time, AI is still great at producing software of questionable quality. The massive increase in quantity and the significant decrease in quality make testing all the more important.


In two separate projects of very different natures (FuelFWD, a mid-size SaaS worked on by a team, and vibey, a new solo open source project), I’ve recently tackled writing a surface test suite from scratch. These are my learnings:

You can see an example here, in vibey:

Note: the vibey suite is not split into separate files. At FuelFWD, because the project is larger, we did split the test modules into separate files, which also lets multiple agents work in parallel within the same branch.
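A hedged sketch of what that split can look like (the module and test names are hypothetical): each file contributes an ordered list of steps, and a top-level runner concatenates them into one linear sequence so the stop-at-first-error behavior is preserved across files. Here the "modules" are inlined as lists; in a real project each group would live in its own file (e.g. a users module and a notes module), so parallel agents can each edit one file without conflicting.

```python
# Placeholder steps standing in for real surface tests; in practice each
# would exercise the real system through its exposed surface.
def _user_signup():
    pass

def _user_login():
    pass

def _note_create():
    pass

# Each "module" contributes an ordered list of steps...
user_tests = [_user_signup, _user_login]
note_tests = [_note_create]

# ...and the runner joins them into one linear sequence,
# stopping at the first failure.
def run(steps):
    for step in steps:
        try:
            step()
        except AssertionError as error:
            print(f"FAIL {step.__name__}: {error}")
            return False
        print(f"ok   {step.__name__}")
    return True

run(user_tests + note_tests)
```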

I feel ambivalent about not having written these tests myself. There are still unnecessary and badly named variables everywhere. But that’s how I feel about coding with agents at high speed in general. I still need to explore how I can use agents to make the code elegant.

This approach is now enabling me to use AI to produce software that works decently well. Hope it’s useful to you.