Determining a Ts Regression

For those who have been following the tree status of mozilla-central as of late, you probably noticed that I tried to land SQLite once again, but it was backed out due to a nasty Ts regression on Linux. When I had run this through the try server, it had shown no regression so I had thought it was safe (just like the past three or four other times I’ve tried to land this). Luckily, Johnathan, who was the sheriff when I landed, found a linux box that we could use that reproduced this problem. With a lot of his help, we got standalone talos running just Ts, to get strace logs during startup.

Once I had those logs, I needed some way to parse the files for data so I can use it in a reasonable way. I wrote a python script to parse the strace logs, and then insert them into a sqlite database file (26.8 MB) so I could run some interesting queries on the data.

With that data, I decided to generate some graphs to easily see what was going on. All of these graphs compose the data from the six runs of Firefox that talos ran – the data is all summed up. All the graphs have larger versions available if you click on them.

I figured that the most useful graph for investigating this Ts regression would be execution time:

Note that that is six runs of Firefox, which is why it is as long as it is. Next, I looked at the average execution time for each function call:

And finally, I looked at the number of calls of each of these functions:

We are clearly seeing an increase in the number of fsync calls, and we know that on Linux those can be more painful than they are on other operating systems. My next step is to see if we also see this increase on OS X. If we do, I’m going to assume we see it on windows as well, and get backtraces of every single fsync call to determine why we’ve double the number of calls by upgrading.

I’ll make a new post as more data comes in.

Between version 3.5.4 and 3.5.9, there was a change to the way SQLite behaves in exclusive locking mode which we believe is the cause of the extra fsync()s. First some background….

The default behavior of SQLite is to only hold file locks for the minimum amount of time necessary. So when a process writes to the database, it first obtains a write lock on the file, makes its change, then immediately releases the lock so that other processes can also have a turn writing.

There is an alternative behavior called “exclusive locking mode” which FF uses (if I am not badly mistaken.) In exclusive locking mode, once SQLite obtains a write lock on the database, it holds that lock until the database connection is closed. This means that other processes (or even other database connections within the same process) are unable to modify the database; concurrency takes a big hit. But the way FF uses SQLite, only one process is accessing the database at a time anyhow, so that doesn’t really matter. And by holding the lock indefinitely, the process that owns the lock does not have to bother to reacquire and rerelease the lock for each transaction, nor does it have to check to see if another process has modified the database between transactions. There are speed advantages there. (The speed advantages are not that great on a workstation running Linux, but they can be significant for embedded devices with less powerful filesystems.)

A change occurred in the way transactions commit in exclusive locking mode in between SQLite version 3.5.4 and 3.5.9. In exclusive locking mode in version 3.5.4, a transaction commits by truncating the rollback journal to zero length. In SQLite 3.5.9, we changed commit to zero out the header of the rollback journal and then fsync(). We found that on many systems, increasing the length of a file (such as the rollback journal) is much more expensive than overwriting an existing segment of a file. By zeroing the rollback journal header, the rollback journal is still non-zero length at the start of the next transaction and writes to the rollback journal are therefore faster. This is a big performance win on many systems, especially embedded systems.

To summarize, 3.5.4 does a single ftruncate() call whereas 3.6.2 does a small write() followed by an fsync().

One could argue that the ftruncate() call in 3.5.4 should also have been followed by an fsync(). I suppose whether or not it is safe to do an ftruncate() that is not followed by an fsync() is filesystem dependent. Probably the fsync() is not required on ext3, but it might be required on other older filesystems. I’m not sure. But it seems that a small write() followed by an fsync() as is done in 3.5.9 and later is a safer approach.

On the other hand, the new approach does generate extra fsync() calls.

There is no option in the latest SQLite that allows one to revert from the write()/fsync() commit back to the older ftruncate() commit. However, taking SQLite out of exclusive locking mode will cause it to use unlink() for commits. Perhaps an unlink() is faster than a write()/fsync() on ext3. It seems like it is worth a try.

We will also investigate compile-time or run-time options to SQLite that allow one to choose the ftruncate() commit algorithm instead of the newer write()/fsync() commit.

By Shawn Wilsher

One reply on “Determining a Ts Regression”