In this article, I present surface testing, a style of software testing. This style has worked extremely well for me and it might be useful to you as well.
Any piece of software has certain parts that are exposed to its users. These exposed parts of the software are sometimes called “interfaces”, but since the term “interface” is usually conflated with graphical user interfaces or with OOP-style interfaces, I’d rather use the term “surfaces”. The goal of surface testing is to thoroughly test a piece of software only through the surfaces that it exposes.
This is best understood through an example. Let's say you're implementing an HTTP API and you want to write tests for it. If you decide to use surface testing, you will test your API entirely through HTTP calls. For example, if you implement two endpoints, one to GET a widget and one to POST a widget, your surface tests will first make a call to create a widget through POST, then another to retrieve the newly created widget through GET.
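As a minimal sketch of what such a test could look like (the base URL, the `/widgets` paths, and the payload shape are illustrative assumptions; I'm using Python's `requests` library here, but any HTTP client works):

```python
import requests

BASE = "http://localhost:8000"  # assumed address of a running test instance

# Create a widget through the exposed POST surface.
resp = requests.post(f"{BASE}/widgets", json={"name": "sprocket"})
assert resp.status_code == 201, resp.text
widget_id = resp.json()["id"]

# Retrieve it through the exposed GET surface, asserting on the full body.
resp = requests.get(f"{BASE}/widgets/{widget_id}")
assert resp.status_code == 200, resp.text
assert resp.json() == {"id": widget_id, "name": "sprocket"}
```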
Surface testing stands in stark contrast to other types of testing.
- Unit testing: this style of testing finds the smallest possible subcomponents, whether they're exposed or not, and tests them in isolation. Often this requires mocking the dependencies of each component. Surface testing does not care about the size of the components being tested, nor does it care about testing them in isolation. And surface testing seldom requires mocking dependencies.
- E2E (end-to-end) testing: this style of testing aims to test the entire system, including the graphical user interface, as a whole. Even if the app exposes an API, E2E tests will be executed against the user interface. Surface testing, in contrast, dictates that both the graphical user interface and the API should be tested separately, since both are exposed surfaces.
- Integration testing: this style of testing aims to test only the interactions between different systems, making sure they adhere to their respective API contracts. Surface testing doesn't directly test the interactions; rather, it tests them indirectly through the exposed surfaces of the respective systems. If a surface X in system A depends on a call to system B, then by testing X and obtaining satisfactory results, you will have tested the interaction between systems A and B too.
Surface testing suggests you take the venerable testing pyramid and chuck it in the bin. The testing pyramid suggests that most of your tests should be unit tests, a few should be integration tests, and precious few should be E2E tests. What this advice will yield is the following:
- An enormous amount of unit tests, most of which require sophisticated mocking (which is usually brittle and thus requires constant maintenance). Unit testing relies on coverage, the percentage of lines of code that are executed by your test suite; this is a poor substitute for a test suite that not only exercises all of your code, but does so with clear assertions. Perhaps more importantly, unit tests do not ensure that your system works as a whole.
- Some integration tests that, while testing the real interaction between systems, do not test how the systems work as a whole.
- A few E2E tests which do test the system as a whole. These are the most valuable exactly because they test the system as a whole. However, the testing pyramid minimizes their use because they’re considered to be slow. This is only partly true. A better case against E2E tests is that they’re great at finding real issues, but not so good at telling exactly what the problem is. This is because they only test the topmost surface, rather than testing surfaces at each level (frontend, backend).
My contention is, in short, that if you follow the testing pyramid, you’ll end up with a test suite that is:
- Large, therefore expensive to write.
- Brittle and requiring extensive maintenance.
- Measured by the outright misleading notion of test coverage.
- Barely able to test your system as a whole.
Surface testing is much more straightforward (a sketch of the whole flow follows this list):
- List: make a list of the parts of the system that you expose, such as user interfaces, API endpoints, or main functions (if you’re writing a library). These are your surfaces.
- Run: have a version of your system ready that is running and connected to everything it needs to function (DBs, external services, etc.). It doesn't matter whether it's running locally or remotely; all that matters is that it's the real thing. If you're testing a library, you only need to be able to run it, just as you would if you were using it.
- Test: test each of the surfaces of your system in the same way as a highly caffeinated human tester would. For each surface, write tests that send invalid data and make sure that the surface returns proper errors instead of proceeding or crashing. Then, move on to the correct cases, making sure that the surface adequately processes the requests.
- Assert: place strict assertions on the results you obtain from each test. It's not enough (for example) to check whether a read operation returned a 200 status code. You need to check that the actual returned body is exactly what you expect, to the maximum level of detail possible.
- Chain: chain the tests in a logical sequence. If you're building a CRUD, you can start by testing creation, then reading, then updates, then deletions. Usually, to test whether an update or a deletion succeeded, you'll perform another read operation to confirm that the update or deletion has indeed happened. Not only is this OK, it's the correct way to do it.
- Stop: on the first error, stop. Do not continue running tests if a single case failed. Focus on eliminating that error (by either fixing the code or fixing the test) so that it doesn't happen again. There are no partial successes: the test suite either fully passes or fully fails. This is auto-activation in action.
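Here's the promised sketch of these steps chained together, reusing the hypothetical widget API from earlier (the paths, payloads, and status codes are assumptions for illustration):

```python
import requests

BASE = "http://localhost:8000"  # assumed running instance, connected to its real DB

def test_widget_crud():
    # Invalid data first: the surface must return a proper error, not crash.
    resp = requests.post(f"{BASE}/widgets", json={"name": ""})
    assert resp.status_code == 400, resp.text

    # Create.
    resp = requests.post(f"{BASE}/widgets", json={"name": "sprocket"})
    assert resp.status_code == 201, resp.text
    wid = resp.json()["id"]

    # Read: assert on the exact body, not just the status code.
    resp = requests.get(f"{BASE}/widgets/{wid}")
    assert resp.json() == {"id": wid, "name": "sprocket"}

    # Update, then read again to confirm the update really happened.
    resp = requests.put(f"{BASE}/widgets/{wid}", json={"name": "gear"})
    assert resp.status_code == 200, resp.text
    assert requests.get(f"{BASE}/widgets/{wid}").json() == {"id": wid, "name": "gear"}

    # Delete, then read again to confirm the widget is gone.
    assert requests.delete(f"{BASE}/widgets/{wid}").status_code == 204
    assert requests.get(f"{BASE}/widgets/{wid}").status_code == 404

test_widget_crud()  # the first failed assert stops the run: no partial successes
```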
When it comes to assertions in your tests, you could think of them as validations of your system. In the same way that your system should validate the inputs provided to it by users, so your tests must validate the outputs produced by your system. Validations (in the system) and assertions (in the tests) are the two sides of the same coin.
What are the downsides of surface testing?
- You need to think through exactly what you’re expecting from your system.
- You need to test your system without backdoors, which might entail more steps than just checking against a DB or a mock.
- You need to think of tests not as isolated things, but as a logically coherent sequence, which greatly raises the bar of understanding needed to write tests.
All three downsides are, to me, virtues: they make testing harder, in the same way that other valuable things, like working out or learning a new language, are hard. This effort will not be in vain: it will improve your system and your understanding of it. This is again in stark contrast with the effort expended in writing unit tests, with their accompanying mocks: those efforts usually improve only an isolated part of the system and do little to improve your understanding of the system as a whole.
Surface testing has another advantage over other types of testing: if you want to rewrite your system but maintain your existing contracts, you can simply take the tests from the old system and apply them to the new one. Surface tests are as portable as your contracts.
Three important points that I haven’t covered yet:
- Non-surface testing: sometimes, you'll need to make assertions directly against a part of the system that is not a surface. This won't happen often and needs to be justified. For example, if you're permanently deleting a user, it is a good idea not only to check that the user cannot log in again (through a surface), but also to let the test suite connect to the DB and make sure that there's no longer an entry for that user (a sketch of this follows the list). Make sure that, if your test suite needs to access a DB or an external service directly, you have a good reason for it.
- Mocking: in some situations, a surface might take over 10 seconds to process a request. This could be the case for image or video processing. In these situations, it makes sense to have a way to either bypass the slow subsystem or return mocked results for it (another sketch follows the list). The test suite should still be able to test against the real thing; it just doesn't have to do so every time you run it. Again, this is the exception, not the rule.
- Repeated timeout pattern for assertions on background operations: when you have operations that run in the background and for which the test suite needs to wait, do not use fixed timeouts. Slow operations usually don't take a narrowly predictable amount of time; if you use a fixed timeout in your test, your test will sometimes wait too long, and other times will fail because it didn't wait long enough. Nobody likes slow or fickle tests. Instead, do the following: write a retry function that retries a call every n milliseconds for a maximum of m seconds (sketched below). For example, if you're testing that a certain upload happens in the background, use this retry function to call the relevant endpoint every 100 milliseconds, for a maximum of 30 seconds. This way, your test will always wait just a bit more than the required amount of time, without adding a backdoor that inspects how far along the upload is. Even if you're a beginner programmer, I recommend that you write your own retry function; you'll learn a lot from the exercise.
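For the non-surface testing point above, here's a minimal sketch of a justified backdoor check (assuming, purely for illustration, a SQLite database and a `users` table that the test suite can reach):

```python
import sqlite3

def assert_user_fully_deleted(db_path: str, user_id: int) -> None:
    # Justified backdoor: confirm the permanent deletion at the storage level too.
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
        assert row is None, f"user {user_id} still has a row in the DB"
    finally:
        conn.close()
```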
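For the mocking point, one possible shape for the bypass is an environment variable that decides whether the slow subsystem runs for real (all names here are hypothetical, and the real pipeline is a placeholder):

```python
import os
import time

def run_real_pipeline(path: str) -> dict:
    time.sleep(12)  # stand-in for the real, expensive processing (10+ seconds)
    return {"status": "done", "path": path}

def process_video(path: str) -> dict:
    # When REAL_VIDEO_PROCESSING=1, the surface exercises the real subsystem;
    # otherwise it returns a canned result so everyday test runs stay fast.
    if os.environ.get("REAL_VIDEO_PROCESSING") == "1":
        return run_real_pipeline(path)
    return {"status": "done", "path": path}
```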
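And here's a sketch of the retry function itself, with a usage example against a hypothetical upload-status endpoint:

```python
import time
import requests

def retry(fn, every_ms=100, timeout_s=30.0):
    """Call fn every every_ms milliseconds until it returns a truthy value;
    fail if timeout_s seconds pass first."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fn()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(every_ms / 1000)

# Usage: poll the (assumed) status endpoint every 100 milliseconds, for a
# maximum of 30 seconds, until the background upload completes.
retry(lambda: requests.get("http://localhost:8000/uploads/123").json().get("status") == "done")
```

The point of polling frequently under a generous overall cap is that the test waits only about as long as the operation actually takes, rather than a fixed amount.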
One last point that applies to any type of testing: if you find a bug in your system, you didn’t find one bug: you found two. The first one is the bug itself; the second one is the absence of a test that would have caught that bug. When you fix the bug, make sure you also add a test that checks for the specific condition that triggered the error.
If you made it this far, I’ll gladly receive your suggestions & objections. Thanks for reading!