NOTE. The phrase "unit tests" is extremely ambiguous, which is why I use more explicit terms. A unit testing framework can run many different types of test - functional, behavioural, integration, end-to-end, performance, etc. - so I prefer to be clear about exactly what I am referring to.
So how do these layers break down, in practical terms, into the tests that need writing?
- Unit tests - developer micro tests, using TDD (see Part 2).
- Integration tests (in-house) - developer integration tests for in-house components, using TDD (see Part 2).
- Integration tests (outside world) - partial system tests, ensuring components function correctly with the outside world (see Part 2).
- End-to-end tests - full system tests, ensuring all system components function together correctly.
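To make this split concrete, here's a minimal sketch of how the layers might be separated within a single test suite. I'm assuming pytest here, and the marker names and example functions are purely my own illustration, not a standard:

```python
import pytest  # the markers below would be registered in pytest.ini

def calculate_total(prices):
    # Hypothetical production function, inlined so the sketch runs.
    return sum(prices)

def test_price_calculation():
    # Developer micro test: pure in-process logic, no I/O, runs in milliseconds.
    assert calculate_total([10, 20]) == 30

@pytest.mark.integration
def test_repository_round_trip():
    # In-house integration test: would save a record through our own
    # repository and read it back to prove the mapping works.
    pass

@pytest.mark.e2e
def test_checkout_happy_path():
    # End-to-end test: would drive the deployed system from the
    # outside through its public API.
    pass
```

The fast inner loop then runs with `pytest -m "not integration and not e2e"`, leaving the slower layers to the build pipeline.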
Fewer and fewer tests as you move up
The number of tests written in each layer should reduce as you move up the pyramid. This is safe because, as Part 2 explained, the lower-level tests give you confidence and guarantees as to what is covered - so there is no need to duplicate those tests. Moving up the pyramid gives you the chance to test things that developer micro tests cannot cover, because you have more system elements together in one place.
By the time your system is in any kind of pre-production environment your tests should just be sanity checks (since all behaviour should already have been proven beyond doubt). These tests are used as much to prove the environment-specific config as anything else.
It's like the stages of a jigsaw puzzle
If the idea of not testing every element at every level makes you feel a little uncomfortable, it's useful to think about when you would take this approach naturally, away from the world of software. So, try thinking of it like doing a jigsaw puzzle.
- Developer micro tests - making sure that each piece correctly fits with the pieces directly next to it.
- Integration tests - making sure that each assembled section of pieces holds together, and the emerging picture looks right.
- End-to-end tests - making sure that the completed picture is orientated the right way round.
If every previous step has been done correctly then the next phase has no need to repeat it; each step is just a subset of checks to ensure you end up with the desired result.
Practical Steps
So how do we go about implementing this in practice? How do we ensure that each layer is not duplicating effort that has already been covered in the layers below?
User Journeys / Stories / Backlog
Whatever mechanism you use to define the work you are going to do, this is the right time to think about how your testing for this feature / change / fix will look.
Your development team should have a view of the work that is coming to them in advance of it arriving, where they can understand the requirement, raise technical issues, ask questions and so on. With the team together this is also a great time for them to discuss testing.
The team should be able to discuss what will be covered by the developer micro tests. That in turn informs what would be sensible to check at an integration level (for both in-house and outside world components), and finally at the end-to-end level - as well as anything else, like performance / stress testing, security, etc. The story can then state (roughly, not line for line) what testing will be done where, which could form part of the acceptance criteria. For example, a story might note that input validation is proven by micro tests, the database mapping by in-house integration tests, and the happy path by a single end-to-end test. Having these discussions upfront helps the team's understanding of what is to be done, and where.
How big is the unit of a micro test?
This is a common question, and was partially covered in Part 2 - but not fully. For developers there are some practical steps that can be used to help determine this - the great news is that your code will help you!
NOTE. These definitions come from Sandro's video in the previous post, but you know that already because you watched the video in full, right? As a reminder, this section runs from 23:30-28:08.
Association types
Say you have parent class A, and class A calls child classes B and C. Is your "unit of test" A+B, or A+C, or A+B+C?
To understand this, we need to look at the child classes themselves. In this example class A has two associations, one to class B and one to class C. The type of each association tells you whether the child class is in the scope of the unit under test - is class B "part of" class A, or just "used by" it? An association can be one of two types:
- If you merge the child class into class A, would it still be cohesive (although perhaps messy)? If so then this is a Composition association, and there is no need to mock the child class - you can test it as it is.
- If you merge the child class into class A would it no longer be cohesive (meaning it would violate the single-responsibility principle)? If so then this is an Aggregation association, and the child class should be mocked for testing.
- Another way to think of this is: could the child class evolve independently of class A? If so then this again means it is an Aggregation association.
In short, Composition associations are part of your unit, but Aggregation associations are not - so you mock them.
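To make that concrete, here is a minimal sketch - the Order / PriceCalculator / EmailGateway classes are my own illustration, not from the video - showing a Composition association tested as-is and an Aggregation association replaced with a test double:

```python
import unittest
from unittest.mock import Mock

class PriceCalculator:
    # Composition: merging this into Order would still be cohesive,
    # so it is part of the unit and gets tested as-is (no mock).
    def total(self, prices):
        return sum(prices)

class EmailGateway:
    # Aggregation: sending email is a separate responsibility that can
    # evolve independently of Order, so the tests replace it.
    def send_receipt(self, amount):
        raise NotImplementedError("would talk to a real SMTP server")

class Order:
    def __init__(self, email_gateway, prices):
        self.calculator = PriceCalculator()  # Composition association
        self.email_gateway = email_gateway   # Aggregation, injected
        self.prices = prices

    def checkout(self):
        amount = self.calculator.total(self.prices)
        self.email_gateway.send_receipt(amount)
        return amount

class OrderTest(unittest.TestCase):
    def test_checkout_totals_and_sends_receipt(self):
        gateway = Mock(spec=EmailGateway)       # only the Aggregation is mocked
        order = Order(gateway, [10, 20])
        self.assertEqual(order.checkout(), 30)  # the Composition runs for real
        gateway.send_receipt.assert_called_once_with(30)

if __name__ == "__main__":
    unittest.main()
```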
A note about Mocks
Since they were mentioned in Part 2 and I've just mentioned them here, let's quickly discuss mocks.
Mocks are a valuable tool for the developer to aid their design, development, and testing. But they are commonly misunderstood or confused - the word "mocks" is often used as a general term for the whole family of Test Doubles used in tests. That family breaks down as follows:
- Dummy - a test class you pass in as a parameter when you know it will never actually be used.
- Stub - a Test Double and a kind of Dummy. Use this to force a condition to be true (for example) without having to execute all the "real" code that would normally make that happen.
- Spy - a Test Double and a kind of Stub. Use this when you want to confirm that a certain part of your system has been called - but be careful here to avoid coupling your tests to the implementation.
- Mock - a "true Mock" is a Test Double and a kind of Spy, but here the assertion is moved into the mock class itself.
- Fake - simulates a real business behaviour in a forced way, which makes it fundamentally different to all the types described above. Fakes can get extremely complicated, and so they are infrequently used.
Stubs and Spies are the ones you will probably use most often. True Mocks mostly appear when you use mocking tools, because of how those utilities work.
As I've mentioned previously this is about knowing all the different tools at your disposal and using the right one at the right time.
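To make those distinctions concrete, here's a minimal hand-rolled sketch of each type for a hypothetical MessageSender collaborator (the scenario and names are my own illustration):

```python
class MessageSender:
    # Hypothetical collaborator that the doubles below stand in for.
    def send(self, message):
        raise NotImplementedError("talks to the real outside world")

class DummySender(MessageSender):
    # Dummy: fills a parameter list but must never actually be used.
    def send(self, message):
        raise AssertionError("a dummy should never be called")

class StubSender(MessageSender):
    # Stub (a kind of Dummy): forces a condition - here, "sending
    # always succeeds" - without running any real code.
    def send(self, message):
        return True

class SpySender(StubSender):
    # Spy (a kind of Stub): also records that, and how, it was called,
    # so the test can assert on it afterwards.
    def __init__(self):
        self.sent_messages = []

    def send(self, message):
        self.sent_messages.append(message)
        return True

class MockSender(SpySender):
    # "True" Mock (a kind of Spy): the assertion moves into the double.
    def verify_sent(self, expected):
        assert self.sent_messages == [expected], "unexpected messages"

class FakeSender(MessageSender):
    # Fake: simulates real business behaviour (here, a length rule)
    # rather than just forcing a fixed answer.
    def send(self, message):
        return len(message) <= 160
```

Note how the inheritance chain mirrors the "is a kind of" relationships in the list above.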
Environment / configuration tests
With all this use of TDD, developer micro tests, two levels of integration tests, and end-to-end tests, you might think that everything was done. In the modern world of automated build pipelines and the cloud, however, not quite.
Modern devops practices are great, and you absolutely do need them for your software to have real agility. But that flexibility comes with a lot of extra configuration, which (right now) is just a whole bunch of plain text files (YAML, JSON, etc). That means it's very easy to make a mistake.
That is why you definitely want some sanity-check tests for each environment you deploy to, ensuring that your config - rather than the functionality of your system - is working correctly. Those tests could be:
- Automated end-to-end tests
- A manual check
- System health checks
Which one is right for you is beyond the scope of this post, but you do want to make sure that everything is working when you deploy to a new environment. And yes, that absolutely includes production!
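As an illustration of the first option, a post-deployment sanity check might look something like this minimal sketch. It leans on Python's requests library, and the /health endpoint, the APP_BASE_URL variable, and the response shape are all assumptions made for the example:

```python
import os
import requests

def test_environment_health():
    # Sanity check run after deploying to each environment, including
    # production. It proves the environment-specific config (DNS, TLS,
    # connection strings) rather than re-proving business behaviour.
    base_url = os.environ["APP_BASE_URL"]  # e.g. https://staging.example.com
    response = requests.get(f"{base_url}/health", timeout=5)
    assert response.status_code == 200
    body = response.json()
    # A hypothetical health payload reporting each dependency's status.
    assert body.get("database") == "ok"
    assert body.get("message_queue") == "ok"
```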
Non-Functional Requirements (NFRs)
The user journeys / stories mentioned above should also cover any NFRs that your system may have - number of users to support, peak number of transactions to support per second, security requirements, etc. These tests are also part of the pyramid but where exactly they fit depends on your specific circumstances.
I mention them here for completeness and so that you don't forget them (!).
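As one illustration, a simple latency check might look like this minimal sketch - the process_order function and the 200 ms budget are invented for the example:

```python
import time

def process_order(order_id):
    # Hypothetical operation with an agreed performance budget.
    time.sleep(0.01)  # stand-in for real work
    return order_id

def test_process_order_meets_latency_budget():
    # NFR test: the budget (200 ms here) would come from the story's
    # non-functional requirements, not from this sketch.
    start = time.perf_counter()
    process_order(42)
    elapsed = time.perf_counter() - start
    assert elapsed < 0.2, f"took {elapsed:.3f}s, budget is 0.200s"
```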
Summary
Done incorrectly, the test pyramid is like dragging a weight behind you. Your main source of feedback comes towards the end of your pipeline, often hours (or even days) after the changes have been merged. Higher-level tests are more fragile (due to environment config, the complexity of data setup, and so on), which means they will need a lot of attention and re-running, possibly taking another couple of hours (or days). This leads you away from software agility due to the delays in confirming that your tests pass - if they ever all do.
When implemented properly the test pyramid is a great thing. It helps you focus on starting your testing with the cheapest and quickest tests to write, adding in the increasingly slow and expensive technologies as you move up the layers. This will give you fast feedback where you need it most (in front of the developer) while helping you quickly prove that the whole system hangs together correctly in your higher test environments.
If you want to be able to release your software regularly then you need the automated testing done correctly to support that.
I hope this series of posts has shown one clear route to achieving that goal.