Recently, I started working on one of our Q3 goals to reduce “dirty” profile startup time to be only 20% of normal startup time. It is a pretty big and scary problem, and is probably the hardest problem I have ever worked on for the Mozilla project (I have had some doozies in the past too…). So far I have only spent time trying to get consistent results of startup time with a dirty profile (of the Firefox kind) so I can then compare profiles (of the program profile kind) to a clean profile (of the Firefox kind). For now, I am just using a copy of my own profile to get some data (no, you cannot have a copy of it).
Unfortunately, I immediately hit a bit of a speed bump. I think a picture best explains this:
This is a chart of the 20 runs talos does when measuring startup time. As you can see, the dirty profile kept on increasing each and every run. After about a day of investigating this, the cause is finally known: tabs. Turns out, talos tries to quit the browser before calling
window.close(), which results in the tabs not closing. As a result, at the end of the 20 cycles, we were loading 19 more tabs than the first cycle. I filed a bug about this behavior against talos. It does not matter now, but if we ever decide to change the default preference in Firefox to load your windows and tabs from last time this will come back to bite us.
I did learn something useful out of all of this though: startup time scales linearly with the number of tabs session restore has to restore. I confirmed this by running talos with 200 cycles instead of the normal 20, and it was clearly a linear increase. We should probably figure out a way to mitigate that, but I have not filed a bug on it (yet!).
Last week I landed bug 485976 which moves the writing and subsequent fsync (or flush on windows) call to a background thread. This should benefit all of our users, especially those with slower hard drives. Paul O’Shannessey has filed another bug that will reduce the amount of disk activity substantially more that will benefit our users even more.
Session restore writes out to disk very frequently – every ten seconds, in fact. This behavior is controllable by the preference
browser.sessionstore.interval for those who want to reduce that, but then you run the risk of not having all your data saved if you crash. We really don’t want to reduce that time for our users.
The amount of data that is written out to disk by session restore scales linearly with the number of tabs and windows you have open. The more you have, the more data has to be written out to disk, and the longer it is going to take.
As we learned in the past with Places, writing to disk and calling fsync can be painfully slow. In session restore code, we are doing this very often and on the main thread. Clearly, this is a bad thing.
Process and Solution
This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while writing and flushing data to the hard drive.”
We wanted to address this problem as much as we could for Firefox 3.6. In order to actually reduce the number of writes and fsync calls, we would have to heavily modify how session restore manages and writes its data. That is a big change that we were not comfortable doing this late in the 3.6 cycle. On top of that, we do not really have the manpower to do that change since the people who know that code well are working on other performance improvements for this release. The simple solution for now then is to move our write and fsync calls off of the main thread.
Luckily, Boris Zbarsky had recently written a new API for JS consumers to asynchronously copy an input stream to an output stream. This API would work great for session restore! We had to fix one minor issue with the underlying code not properly handling nsISafeOutputStreams (which make sure we fsync properly), but once that was done, the fix was incredibly simple.