Categories
Mozilla

Determining a Ts Regression

For those who have been following the tree status of mozilla-central as of late, you probably noticed that I tried to land SQLite once again, but it was backed out due to a nasty Ts regression on Linux. When I had run this through the try server, it had shown no regression so I had thought it was safe (just like the past three or four other times I’ve tried to land this). Luckily, Johnathan, who was the sheriff when I landed, found a linux box that we could use that reproduced this problem. With a lot of his help, we got standalone talos running just Ts, to get strace logs during startup.

Once I had those logs, I needed some way to parse the files for data so I can use it in a reasonable way. I wrote a python script to parse the strace logs, and then insert them into a sqlite database file (26.8 MB) so I could run some interesting queries on the data.

With that data, I decided to generate some graphs to easily see what was going on. All of these graphs compose the data from the six runs of Firefox that talos ran – the data is all summed up. All the graphs have larger versions available if you click on them.

I figured that the most useful graph for investigating this Ts regression would be execution time:
Total execution time

Note that that is six runs of Firefox, which is why it is as long as it is. Next, I looked at the average execution time for each function call:
Average execution time

And finally, I looked at the number of calls of each of these functions:
Number of calls

We are clearly seeing an increase in the number of fsync calls, and we know that on Linux those can be more painful than they are on other operating systems. My next step is to see if we also see this increase on OS X. If we do, I’m going to assume we see it on windows as well, and get backtraces of every single fsync call to determine why we’ve double the number of calls by upgrading.

I’ll make a new post as more data comes in.