~fpereiro

Earlier I described surface testing, which is still how I go about testing systems.

Recently, I co-developed a system that performs document extraction partly with the help of LLMs, which brought an interesting twist: we have dozens of test cases where we extract documents and assert things about the results. Running the whole suite, however, takes 20-30 minutes and costs 1-2 bucks, simply because of the LLM calls.

I’ve resisted the temptation to write mocks for these calls. I’ve also resisted the stronger temptation to leave the extractions out of the test suite altogether. Instead, we cache the results of previous extractions, so that repeated test runs save time and money. This has the following advantages over mocking:

No manual copy-pasting required: the test suite itself creates the caches when running.
If a cache is missing, running the tests simply recreates it.
We can override some (or all) of the caches to re-test the corresponding parts.
All the non-external parts of the code still run every time.
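All four advantages fall out of a read-through cache placed around the external call. Here is a minimal sketch in Python. It assumes each result is JSON-serializable and that a hash of the request is a good enough cache key. `cached_call`, `cache_key` and the `refresh` flag are hypothetical names for illustration, not the author's code.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cache_key(request: dict) -> str:
    # Hypothetical keying scheme: hash the request serialized with sorted
    # keys, so the same input always maps to the same cache file.
    payload = json.dumps(request, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_call(cache_dir: Path, request: dict,
                call: Callable[[dict], Any], refresh: bool = False) -> Any:
    """Read-through cache around a slow, paid external call.

    - If a cache file exists and refresh is False, return its contents.
    - Otherwise run the real call, store its result, and return it.
    Code that consumes the result runs the same way in both cases.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(request)}.json"
    if path.exists() and not refresh:
        return json.loads(path.read_text(encoding="utf-8"))
    result = call(request)  # e.g. the LLM request behind an extraction
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

The tests themselves populate the cache on the first run. Deleting a file (or passing `refresh=True`) forces the real call again. Only `call` is skipped on a hit, so everything downstream of it is still exercised.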

The caches are accessed through a conditional piece of code that is only active in local development. The data stored in each cache file is exactly what the external call returns. I believe this approach works for most surface testing that involves slow, expensive calls.
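The dev-only condition and the "store exactly what the call returns" rule can be sketched in Python as follows. This sketch assumes a hypothetical `APP_ENV` environment variable and a cache path chosen by the caller. The raw response is written as UTF-8 bytes rather than text so that newline translation cannot alter it.

```python
import os
from pathlib import Path
from typing import Callable


def cache_enabled() -> bool:
    # Hypothetical switch: caching is only active in local development;
    # anywhere else the real external call always runs.
    return os.environ.get("APP_ENV") == "development"


def call_llm_cached(path: Path, prompt: str, call: Callable[[str], str]) -> str:
    """Return the raw response for `prompt`, using a file cache in dev only."""
    if not cache_enabled():
        return call(prompt)
    if path.exists():
        return path.read_bytes().decode("utf-8")
    raw = call(prompt)
    # Store the response verbatim (as bytes, so no newline translation),
    # so a cache hit feeds downstream parsing exactly what the call returned.
    path.write_bytes(raw.encode("utf-8"))
    return raw
```

Because the file holds the unmodified response, the parsing and validation code runs identically whether the response comes from the cache or from the LLM.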

We renew a cache when the logic it covers is being re-tested. When other parts of the system are tested, they can still rely on the extraction suite without paying the performance cost.
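Selective renewal like this can be driven by a simple switch that is checked before reusing a cache. Here is a minimal sketch in Python. It assumes a hypothetical `RENEW_CACHES` environment variable that holds a comma-separated list of cache namespaces, or `all`. This convention is illustrative only and was not described by the author.

```python
import os


def should_renew(namespace: str) -> bool:
    """Decide whether the caches in `namespace` should be bypassed and rebuilt.

    Hypothetical convention: RENEW_CACHES="extraction" renews only the
    extraction caches, RENEW_CACHES="all" renews everything, and leaving
    it unset reuses every existing cache.
    """
    requested = {
        name.strip()
        for name in os.environ.get("RENEW_CACHES", "").split(",")
        if name.strip()
    }
    return "all" in requested or namespace in requested
```

When working on the extraction logic, one would run the suite with `RENEW_CACHES=extraction`. Its result can be passed as the refresh flag of whatever read-through cache wraps the external call, while the other suites keep reading their cached results.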

Caching is better than mocking