Optimisation Target

Stuart is a last-mile delivery company. When riders are active on the platform, their phones send GPS locations so the application knows where everyone is.

Servers are hammered with GPS locations, so this is a part of the system where optimisations can have a significant performance impact. On reception, each GPS location gets inserted into MySQL, and then a number of things follow. (This is a simplification, but faithful enough for the purposes of this post.)

The business is growing fast, and MySQL started to struggle with the load during peak hours. This continued up to the point where manual intervention from the on-duty team was needed to reduce the pressure on the platform.

While some long-term solutions were being worked on, we needed a guerrilla tactic to buy time and regain tranquility until those solutions landed.

We wanted to optimise MySQL INSERTs for the same entity performed by separate threads.

These threads could be the ones responsible for HTTP endpoints in a web server. Or they could be BEAM processes in a Phoenix application. In our case, threads execute as Sidekiq workers, but conceptually they all represent the same scenario.

Bulk Data Loading

Before we explain the technique, let’s take one step back.

Let’s imagine you need to import 1,000,000 records into a database in a batch process. What are the options?

Sequential INSERTs 🐌

The simplest approach is to write a loop that inserts them one after the other.

In Ruby:

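This is a minimal sketch, assuming an Active Record model called Location and a records collection of attribute hashes; both names are made up for illustration:

records.each do |attributes|
  # One INSERT and one network round-trip per record.
  Location.create!(attributes)
end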

It works, and it may be enough for a small number of records, but it is really inefficient for a large set.

Parallel INSERTs 🚀

The next step is to spawn N threads, with their own database connection each, and split the work among them.

In Ruby:

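A sketch of the same idea split across threads, reusing the hypothetical Location model and records collection from above; each thread checks out its own connection from the Active Record pool:

N_THREADS  = 8
slice_size = [(records.size.to_f / N_THREADS).ceil, 1].max

threads = records.each_slice(slice_size).map do |chunk|
  Thread.new do
    # Each thread works with its own connection from the pool.
    ActiveRecord::Base.connection_pool.with_connection do
      chunk.each { |attributes| Location.create!(attributes) }
    end
  end
end

threads.each(&:join)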

As a rule of thumb, this may speed things up to nearly Nx.

It’s a helpful technique for batch processes. In the problem at hand, however, applications already leverage this as a side-effect of being multi-threaded.

Bulk INSERTs 🚀️🚀️

But we can still do better: a single INSERT statement can insert multiple rows at once. In this code snippet we insert three rows:

INSERT INTO table_name
  (col1, col2, col3)
VALUES
  (val11, val12, val13),
  (val21, val22, val23),
  (val31, val32, val33);

This is way faster than doing three individual INSERTs.

The MySQL documentation has a page dedicated to this optimisation, where you can see why the server does less work.

Another important difference is the number of network round-trips: every statement you execute gets a response of some sort from the database. In a procedural API, you issue the INSERT, wait for the server to commit the transaction and reply, process the reply, move on to the next record, and so on.

Therefore, if you issue 1,000 sequential INSERTs, the client needs to wait for MySQL and process its response 1,000 times. That waiting is interleaved with all the operations. On the other hand, if you execute one single INSERT with 1,000 rows, the client waits for MySQL once.

All these observations compound, and therefore the bigger the batch size, the bigger the speed up. To show you the magnitude of the acceleration, these are some factors that I have seen:

Groups of    5 →   5 times faster
Groups of  100 →  57 times faster
Groups of  500 → 189 times faster
Groups of 1000 → 284 times faster

The factors depend on latency, the complexity of the statement, etc. But in any case, as a back-of-the-napkin calculation, we can see there is an opportunity here for huge performance improvements.

CSV Imports 🚀️🚀️🚀️

Can we still squeeze more rows per time unit? Yes!

Importing data in CSV format is even more performant. However, we’ll leave it at bulk INSERTs for this post.

Connecting the Dots

The idea of the technique is to apply bulk INSERTs to a dynamic situation where you are receiving a data stream in a multi-threaded endpoint.

There are several ways to do this, and the trade-offs depend on the details of each particular situation. We implemented a process-level buffer and a timer thread. GPS locations received are pushed to the buffer. Every second, that buffer is flushed. Done.

Simple, but the impact is dramatic.

Ruby Implementation

The problem statement in this post, and the technique we use to address it, are quite generic. However, let’s look at some concrete source code in Ruby.

The Sidekiq worker that receives locations only pushes to a buffer. It couldn’t be cheaper:

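A minimal sketch of what that worker could look like; the worker name, its arguments, and the LocationBuffer object (shown next) are illustrative, not the actual code:

class GpsLocationWorker
  include Sidekiq::Worker

  def perform(courier_id, latitude, longitude, recorded_at)
    # No database work here: just push the location to the in-process buffer.
    LocationBuffer.push(
      courier_id:  courier_id,
      latitude:    latitude,
      longitude:   longitude,
      recorded_at: recorded_at
    )
  end
end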

Schematically, this is the batch writer:

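Something along these lines; this is a sketch rather than the production code, and LocationBuffer, FLUSH_INTERVAL, and the GpsLocation model are illustrative names:

module LocationBuffer
  MAX_BATCH_SIZE = 100
  FLUSH_INTERVAL = 1 # second

  @buffer = []
  @mutex  = Mutex.new

  class << self
    def push(location)
      @mutex.synchronize { @buffer << location }
    end

    # The timer thread only sleeps and delegates, so flush itself can be
    # unit tested deterministically, without threads involved.
    def start_timer
      Thread.new do
        loop do
          sleep FLUSH_INTERVAL
          flush
        end
      end
    end

    def flush
      batch = @mutex.synchronize { @buffer.shift(MAX_BATCH_SIZE) }
      bulk_insert(batch) unless batch.empty?
    end

    private

    # One multi-row INSERT for the whole batch. Shown here with insert_all;
    # the raw SQL alternative is sketched further down.
    def bulk_insert(batch)
      GpsLocation.insert_all(batch)
    end
  end
end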

I like the clarity of the timer thread, which allows you to unit test flush deterministically. The timer thread itself is not needed in the test suite beyond trivial coverage that guarantees it flushes as expected.

You could also flush on buffer size. But a time threshold is normally necessary, because otherwise flushing could be too infrequent in valleys. In an Elixir application where I wrote something similar for DynamoDB, I have both triggers in place: buffer size, and a maximum time t between flushes.
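For reference, this is a sketch of how the Ruby buffer above could combine both triggers: push flushes as soon as the buffer is full, and the timer still covers quiet periods. It is an assumption about how one could do it, not what we run in production:

def push(location)
  flush_now = @mutex.synchronize do
    @buffer << location
    @buffer.size >= MAX_BATCH_SIZE
  end

  # Flush outside the critical section so the lock stays short.
  flush if flush_now
end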

You have to choose a value for MAX_BATCH_SIZE that makes sense for the application. Let’s suppose your workload today is less than 100 records per second per process. If you set MAX_BATCH_SIZE to 100, when the platform is quiet maybe you send batches of 15, and when the platform is busy maybe you send batches of 75.

Here’s the point: As we have seen above, batches of 75 are way more efficient than batches of 15. Thus, the busier the system, the faster it runs.

Batch insertion itself can be done in different ways. My input is a trusted hash, and I build a raw SQL string that is executed directly with exec_insert. If you want to work with Active Record models, Active Record 6.0 shipped with insert_all and friends. The gem activerecord-import is also worth mentioning.
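For illustration, this is roughly what the raw SQL variant could look like, swapped in for the bulk_insert sketched above; the table and column names are assumptions, and every value goes through the connection’s quoting since the string is executed directly with exec_insert:

def bulk_insert(batch)
  connection = ActiveRecord::Base.connection

  rows = batch.map do |location|
    values = location.values_at(:courier_id, :latitude, :longitude, :recorded_at)
    "(#{values.map { |value| connection.quote(value) }.join(', ')})"
  end

  # One statement, one round-trip, regardless of the number of rows.
  connection.exec_insert(<<~SQL)
    INSERT INTO gps_locations (courier_id, latitude, longitude, recorded_at)
    VALUES #{rows.join(', ')}
  SQL
end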

Finally, we need to ensure the buffer is flushed when the server shuts down:

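A sketch of that hook, assuming the workers run under Sidekiq, which exposes lifecycle events for this purpose:

Sidekiq.configure_server do |config|
  config.on(:shutdown) do
    # Drain whatever is left in the buffer before the process exits.
    LocationBuffer.flush
  end
end

Depending on how much the buffer can hold, a real hook might need to flush in a loop until the buffer is empty.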

Caveats

Due to buffering, there’s a small delay between reception and availability of the event in the platform.

Also, MySQL does not give you back the IDs of the new records. If the primary key is auto-incremented, however, MySQL guarantees they are consecutive. Thus, if you need them, you can retrieve the last insert ID and do the math.
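For illustration, the "do the math" part could look like this, assuming an auto-increment primary key; LAST_INSERT_ID() returns the first ID generated by the multi-row INSERT on that connection, and the rest follow consecutively:

first_id = ActiveRecord::Base.connection.select_value("SELECT LAST_INSERT_ID()").to_i

# With consecutive IDs, the i-th row of the batch got first_id + i.
ids = Array.new(batch.size) { |i| first_id + i }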

Summary

We’ve seen a way to boost parallel INSERTs with a per-process buffer that is flushed regularly using a single INSERT with multiple rows.

It’s been smooth sailing since this was deployed. 🎉

Like what you see? We’re hiring! 🚀
