It took blood, sweat, and tears for Stuart’s Solutions Engineers team to find the right combination of configurations to make sure our Jest tests always pass.
Before jumping to our recommendations, we’ll set the scene: who we are and what we’re trying to do.
👋 Context: The Solutions Team
In January 2020, we created a new team within Stuart Tech: the Solutions Engineers team.
The team’s purpose is to bridge the gap between the business and the products we have at Stuart. If the business team wants to trial a new use case, they ask the Solutions Engineers team for support, and we build solutions that meet their needs in a short period of time.
To put it another way: we are a startup inside a startup!
The team mainly works on short-lived projects, adding as much value as possible in as little time as possible. To achieve this, we had to make some technical decisions about our development stack, which looks like this:
- Node.js and Express (using TypeScript) for the backend;
- React for the frontend;
- npm as package manager;
- Jest for tests;
- ESLint as linter;
- Jenkins as CI.
All these tools are set up in a monorepo, which makes it very easy to share common code between Solutions.
🤔 The problem
Jest is a resource eater. It doesn’t matter whether you have 8GB, 16GB, or 32GB of RAM: Jest will use it all. What about the CPU? Exactly the same. Two cores? Yummy! Eight cores? Yuuuummy! 😋
This is a pretty interesting situation. We are used to running software that is extremely badly optimised, so badly that most of it uses a single CPU and just a few MB or GB of RAM, because its single thread acts as a bottleneck. That’s not the case with Jest: it adapts to whatever resources are available and drains them all.
This is why you can easily find recommendations all over the internet for forcing Jest to run in a single thread (for example with --runInBand) or to use at most 50% of your CPUs.
Luckily, you can do this quite simply with Jest options such as maxWorkers, which is enough for development purposes. But what about running Jest in a CI tool such as Jenkins? If you have 4 workers available per Jenkins node, then 4 Jenkins jobs can be executed in parallel.
If those 4 jobs run in parallel with Jest’s maxWorkers set to 1, you’ll end up with 4 CPUs running at 100% (not exactly, but close enough for the purpose of this example!).
RAM, on the other hand, is something you cannot control in Jest: it’ll use all of it if needed. So again, if you have 16GB of RAM, each Jest execution running on the same Jenkins node will try to consume up to 16GB!
Because of the high CPU and RAM usage, the Jenkins daemons can no longer communicate with the controller node, and the whole Jenkins execution fails.
😱 We need to prevent Jest from killing our nodes!
👩‍🏫 Our approach
Our Jenkins machines have 15GB of RAM, with 4 workers per Jenkins node. As a result, up to 4 Jest commands can run at the same time on the same machine.
Limiting the CPU
Jest makes this extremely easy. Just use:
jest --maxWorkers=20%
Why 20%, you might wonder? Simple maths: 4 (Jenkins workers) × 20% of the CPU = 80%. We leave the remaining 20% for the system (so that the Jenkins daemon can still respond, the OS can do its work, etc.).
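If you want to redo this maths for your own nodes, a small helper along these lines could do it. This is a hypothetical sketch, not code from our repository: the function and field names are made up, and the percentage-to-worker conversion assumes that Jest rounds the CPU percentage down and never goes below one worker, which you should check against the Jest version you use.

```typescript
// Hypothetical helper (not part of Jest or Jenkins): CPU share per Jest job
// when several Jenkins workers share one node.
interface CpuBudget {
  percentPerJob: number; // value to pass as --maxWorkers=<n>%
  workersPerJob: number; // Jest workers that percentage resolves to (assumed rounding)
}

function cpuBudget(jenkinsWorkers: number, reservedPercent: number, logicalCpus: number): CpuBudget {
  // Split whatever is left after the system reserve evenly across concurrent jobs.
  const percentPerJob = Math.floor((100 - reservedPercent) / jenkinsWorkers);
  // Assumption: Jest rounds the CPU percentage down and never uses fewer than one worker.
  const workersPerJob = Math.max(1, Math.floor((percentPerJob / 100) * logicalCpus));
  return { percentPerJob, workersPerJob };
}

// Our numbers: 4 Jenkins workers, 20% reserved for the system, 8 logical CPUs.
const budget = cpuBudget(4, 20, 8);
console.log(`jest --maxWorkers=${budget.percentPerJob}%`); // jest --maxWorkers=20%
console.log(`workers per job: ${budget.workersPerJob}`);   // workers per job: 1
```

Under that rounding assumption, 20% of 8 logical CPUs resolves to a single worker per job, so actual CPU usage can sit somewhat below the nominal 80%.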
Limiting the RAM
Here’s the biggest challenge. Jest doesn’t let you set a maximum amount of RAM. Even worse, Jest won’t trigger garbage collection unless it is running out of RAM or you force it to. That means RAM usage keeps increasing until garbage collection finally kicks in. If that isn’t enough to free some memory (:cough: leaks :cough:), swapping starts, and by then your tests are running really, really slowly.
To limit the RAM, we went for the Docker approach: for every test suite we want to run, we start a Docker container from our test image and limit its resources based on the host’s resources.
Since our Jenkins nodes have 4 workers, we need to size those limits from the maximum available RAM, which in our case is 15GB.
RAM behaves slightly differently from the CPU: it takes time for RAM to be fully used, so we can be a little more aggressive when allocating it. In our case, we decided to set the Docker limit to 4.3GB per container: 4.3GB × 4 = 17.2GB, slightly above the node’s 15GB.
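The overcommit above can be made explicit with a tiny calculation. This is a hypothetical sketch (the helper name is ours, not from any library), and it treats 1GB as 1000MB to match the figures in the text.

```typescript
// Hypothetical helper: memory committed by concurrent test containers
// compared with the RAM the Jenkins node actually has.
function memoryCommitment(limitMb: number, jenkinsWorkers: number, hostMb: number) {
  const committedMb = limitMb * jenkinsWorkers;
  // Above 100% means the containers could, together, ask for more than the host has.
  const percentOfHost = Math.round((committedMb / hostMb) * 100);
  return { committedMb, percentOfHost };
}

// Our numbers, treating 1GB as 1000MB: 4.3GB limit, 4 Jenkins workers, 15GB node.
const mem = memoryCommitment(4300, 4, 15000);
console.log(`${mem.committedMb}MB committed, ${mem.percentOfHost}% of the node`);
// 17200MB committed, 115% of the node
```

Keeping that percentage only slightly above 100% is what relies on RAM filling up gradually rather than all at once.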
How can you limit the RAM in Docker, you might wonder?
docker run -m 4300m tests_image
Both things together
If you really want to make sure Jest doesn’t drain all your CPUs, you can also use Docker’s CPU limits. Our host has 8 logical CPUs. To be consistent with the 20% CPU limit we set on the Jest command, we want all the Docker containers together to use at most about 6.5 logical CPUs (80% of 8 is 6.4).
Again, since we run a maximum of 4 containers in parallel on a Jenkins node, that’s around 1.5 CPUs per container.
docker run --cpus='1.5' tests_image
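The per-container value comes from splitting the usable CPUs across the containers: 6.4 / 4 = 1.6, which lands at 1.5 once rounded down. A hypothetical sketch of that calculation follows; rounding down to the nearest half CPU is our own illustrative choice, not something Docker requires (--cpus accepts fractional values).

```typescript
// Hypothetical helper: per-container value for `docker run --cpus`.
function cpusPerContainer(logicalCpus: number, usablePercent: number, containers: number): number {
  const usableCpus = (logicalCpus * usablePercent) / 100; // e.g. 8 * 80% = 6.4
  const exact = usableCpus / containers;                   // e.g. 6.4 / 4 = 1.6
  // Illustrative choice: round down to the nearest half CPU for a little extra margin.
  return Math.floor(exact * 2) / 2;
}

console.log(cpusPerContainer(8, 80, 4)); // 1.5
```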
Combining the CPU limit with the RAM setting gives:
docker run -m 4300m --cpus='1.5' tests_image
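If the CI step is scripted in Node rather than typed by hand, the same command can be assembled from the computed limits. This is a hypothetical sketch: --rm is our addition (it removes the container when the tests finish), and the optional command, for example npx jest --maxWorkers=20%, is an assumption about what the test image runs.

```typescript
// Hypothetical helper that assembles the `docker run` arguments shown above.
function dockerRunArgs(image: string, memoryMb: number, cpus: number, command: string[] = []): string[] {
  return [
    "run",
    "--rm",               // our addition: clean up the container after the run
    "-m", `${memoryMb}m`, // hard memory limit, e.g. 4300m
    `--cpus=${cpus}`,     // CPU quota in logical CPUs, e.g. 1.5
    image,
    ...command,           // optional command overriding the image default
  ];
}

const args = dockerRunArgs("tests_image", 4300, 1.5, ["npx", "jest", "--maxWorkers=20%"]);
console.log(["docker", ...args].join(" "));
// docker run --rm -m 4300m --cpus=1.5 tests_image npx jest --maxWorkers=20%
```

Returning an argument array rather than a single string means it can be passed straight to Node’s child_process.spawn("docker", args) without any shell quoting.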
🏇 Does this speed up tests?
Our biggest pain point was our Jenkins agents dying because of high CPU and RAM usage. With this approach, we resolved that issue.
It does improve speed for small test suites. Suites that eventually consume close to 4GB of RAM still run slowly, but this is not a big deal: thanks to the Jenkins setup built by our DevOps team, our Jenkins nodes are auto-scaled, so with Jenkins pipelines we can parallelise tests across as many Jenkins jobs as we want.
🚰 Working around the memory leaks
To improve test performance and make sure tests don’t slow down by consuming too much RAM, we recommend not running all tests in a single Jest command.
Instead, divide your tests into multiple executions. In our case, we have a monorepo with more than 10 applications, and we have divided their test suites into 5 different Jest executions. Restarting Jest makes sure each execution starts from zero, without leaks from previous tests affecting the current ones.
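One way to produce those separate executions is to group the applications and emit one Jest command per group. This is a hypothetical sketch: the apps/app-N paths are made up, and it relies on Jest treating positional arguments as regular-expression patterns matched against test file paths (hence the trailing slash, so that app-1 does not also match app-10).

```typescript
// Hypothetical helper: split the monorepo's applications into groups so that each
// group runs in its own Jest process, starting from a fresh heap every time.
function splitIntoRuns(apps: string[], runs: number): string[][] {
  const size = Math.max(1, Math.ceil(apps.length / Math.max(1, runs)));
  const groups: string[][] = [];
  for (let i = 0; i < apps.length; i += size) {
    groups.push(apps.slice(i, i + size));
  }
  return groups;
}

// Illustrative application paths, not our real ones. The trailing slash keeps
// the pattern "apps/app-1/" from also matching "apps/app-10/".
const apps: string[] = [];
for (let n = 1; n <= 10; n++) apps.push(`apps/app-${n}/`);

for (const group of splitIntoRuns(apps, 5)) {
  console.log(`jest --maxWorkers=20% ${group.join(" ")}`);
}
// First line: jest --maxWorkers=20% apps/app-1/ apps/app-2/
// Five commands in total, two applications each.
```

In Jenkins, each of these commands could then become its own parallel job.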
🐌 Still too slow?
It could be that the tests are already running slowly soon after they start. In that case, it’s highly likely that you’ve already hit the maximum RAM allowed. To confirm this, run “docker stats”: it shows each container’s current memory usage and its limit.
If usage hits 100% very quickly, then unless you find and fix a big memory leak, the only solution is to increase the container’s RAM limit. To find out how much it needs, run the test suite without any Docker limits, monitor it with docker stats, and write down the highest peak.
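Writing down the peak can be automated by sampling docker stats --no-stream --format "{{.MemUsage}}" in a loop while the tests run and keeping the maximum. This is a hypothetical sketch that assumes the “usage / limit” format with binary units (B, KiB, MiB, GiB) that recent Docker CLI versions print; the sample values are made up.

```typescript
// Hypothetical helper: highest memory usage across samples collected with
//   docker stats --no-stream --format "{{.MemUsage}}"
// Each sample is assumed to look like "1.5GiB / 4.199GiB" (usage / limit).
const UNIT_BYTES: { [unit: string]: number } = {
  B: 1,
  KiB: 1024,
  MiB: 1024 * 1024,
  GiB: 1024 * 1024 * 1024,
};

function parseUsageBytes(memUsage: string): number {
  // Only the value before " / " is the current usage; the second value is the limit.
  const match = /^\s*([\d.]+)\s*(KiB|MiB|GiB|B)\b/.exec(memUsage);
  if (!match) throw new Error(`Unrecognised MemUsage value: ${memUsage}`);
  return parseFloat(match[1]) * UNIT_BYTES[match[2]];
}

function peakUsageBytes(samples: string[]): number {
  let peak = 0;
  for (const sample of samples) peak = Math.max(peak, parseUsageBytes(sample));
  return peak;
}

// Made-up samples for illustration, not real measurements.
const samples = ["812.4MiB / 4.199GiB", "2.75GiB / 4.199GiB", "1.9GiB / 4.199GiB"];
const peakGiB = peakUsageBytes(samples) / UNIT_BYTES.GiB;
console.log(`peak: ${peakGiB.toFixed(2)}GiB`); // peak: 2.75GiB
```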
If the peak is way beyond the RAM you have given the Docker container, increase the container’s RAM limit, but also make sure the maths still make sense. If we had to increase the RAM for our tests any further, we’d have two options:
- Increase the nodes’ RAM from 15GB to 32GB.
- Reduce the Jenkins workers from 4 to 3.
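Checking that the maths still make sense boils down to one division. This is a hypothetical sketch: the 115% overcommit is roughly what our 4 × 4.3GB on a 15GB node works out to, and the 5.5GB limit below is an invented example of a higher peak, not a measurement.

```typescript
// Hypothetical helper: how many Jenkins workers can share one node, given a
// per-container memory limit and how far above the node's RAM we accept to commit.
function maxJenkinsWorkers(limitMb: number, hostMb: number, overcommitPercent: number): number {
  // Total memory budget divided by the per-container limit, rounded down.
  return Math.floor((hostMb * overcommitPercent) / 100 / limitMb);
}

console.log(maxJenkinsWorkers(4300, 15000, 115)); // 4: our current setup
// Suppose the observed peak pushed the limit to 5.5GB (a hypothetical figure):
console.log(maxJenkinsWorkers(5500, 15000, 115)); // 3: reduce the workers from 4 to 3
console.log(maxJenkinsWorkers(5500, 32000, 115)); // 6: or move to 32GB nodes
```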
We spent a couple of weeks on trial and error until we found the right numbers for our infrastructure. Don’t expect to copy and paste any configuration: you need to understand your infrastructure and adapt to it.
We also found it interesting to learn that a tool which makes use of all the available hardware is, in reality, a problem. We are so used to running programs on oversized hosts that, when that’s not the case, we struggle to figure out how to fix it.