Skip navigation

Tag Archives: session restore

This week, I spent some time looking at some real life profiles that were sent into us by users seeing startup time in the minutes. The tests were ran just like I ran the test on my profile: all add-ons disabled. The results I got are both good and bad, but first the results!

Results

The first shows the raw test run data (which isn’t terribly interesting). The second compares the reported startup time for each test. You will probably want to click to zoom in.

Conclusions

Like I said, the results are both good and bad. Good in that I now have a pretty good idea on why people have bad startup times. Bad in that we don’t have any way to quickly improve the issues that people are seeing. What I see from this data is that profiles in the wild, with add-ons disabled aren’t much slower than a clean profile. This seems to implicate add-ons being at least part of the problem (which we knew) or possibly all of the problem at this point (for the profiles tested). The good news is that the add-ons team is already working on solutions to this, and you should expect some blog posts from them about this soon.

Next Step

Next week I’m going to spend some time getting numbers with these profiles on the latest release of Firefox 3.6 with and without add-ons disabled to compare. This will pretty much confirm or deny my hypothesis of this week’s results.

News on the Past

In my last post, we looked at my profile with various pieces removed to try and figure out why startup might be slow for people. With those results, I identified two issues that would impact startup the most:

  1. Large cookies.sqlite
  2. Many tabs being restored

I also have good news about both of these issues! The cookies.sqlite issue is now fixed and will be a part of beta 4, and Paul has some good data about session restore and tabs (with more to come).

Over the weekend I spent some serious time with my computer running a bunch of tests with standalone talos in 11 different situations. First, a disclaimer: these tests were only designed to give some insight on the areas we should focus on for the goal. Each of these tests was reproduced at least once before I moved onto the next one in order to make sure the numbers were stable.

The Tests

  • Clean profile. This is just the standard profile that we normally run Ts with on tinderbox. This is basically used a baseline for best possible performance.
  • Dirty profile. This is actually my daily profile, with eight tabs that will open through session restore during startup. Because of how talos works, these tabs don’t all have to load for the number to be generated. Even so, you’ll notice a substantial slowdown. Sadly, I fear I modified the profile I was using in a bad way because I can no longer reproduce the numbers I got (but the numbers recorded were reproduced four times before I moved on to the rest of the tests initially).
  • Bookmarks toolbar disabled. This is a variation on the dirty profile test that just disables the bookmarks toolbar.
  • No places. This is a variation on the dirty profile test that removes places files from the profile.
  • No sessionstore.js. This is a variation on the dirty profile test that removes sessionstore.js from the profile. This has the side effect of also not making the eight tabs load at startup.
  • No urlclassifier. This is a variation on the dirty profile test that removes the urlclassifier related files from the profile.
  • No cookies.sqlite. This is a variation on the dirty profile test that removes cookies.sqlite from the profile.
  • No extensions. This is a variation on the dirty profile test that removes all add-on manager bits in the profile.
  • No formhistory.sqlite. This is a variation on the dirty profile test that removes formhistory.sqlite from the profile.
  • No downloads.sqlite. This is a variation on the dirty profile test that removes downloads.sqlite from the profile.
  • No content-prefs.sqlite. This is a variation on the ditry profile test that removes content-prefs.sqlite from the profile.

Results

I’m going to let some graphs do the talking here. The first shows the raw test run data (which isn’t terribly interesting). The second compares the reported startup time for each test. You will probably want to click to zoom in.

Data of the startup time runs

Startup time of the various tests

Conclusions

It looks like the best wins that we can get are related to fixing session restore to not scale linearly with the number of tabs it is restoring, and reduce the startup time costs of loading places files and cookies.sqlite. It should be noted that this test was not measuring the load time for each tab, so something like BarTab would not help in this case. The other good news is that we already have work underway to make cookies.sqlite load time not hurt us so much during startup.

Recently, I started working on one of our Q3 goals to reduce “dirty” profile startup time to be only 20% of normal startup time. It is a pretty big and scary problem, and is probably the hardest problem I have ever worked on for the Mozilla project (I have had some doozies in the past too…). So far I have only spent time trying to get consistent results of startup time with a dirty profile (of the Firefox kind) so I can then compare profiles (of the program profile kind) to a clean profile (of the Firefox kind). For now, I am just using a copy of my own profile to get some data (no, you cannot have a copy of it).

Unfortunately, I immediately hit a bit of a speed bump. I think a picture best explains this:

chart of dirty startup

This is a chart of the 20 runs talos does when measuring startup time. As you can see, the dirty profile kept on increasing each and every run. After about a day of investigating this, the cause is finally known: tabs. Turns out, talos tries to quit the browser before calling window.close(), which results in the tabs not closing. As a result, at the end of the 20 cycles, we were loading 19 more tabs than the first cycle. I filed a bug about this behavior against talos. It does not matter now, but if we ever decide to change the default preference in Firefox to load your windows and tabs from last time this will come back to bite us.

I did learn something useful out of all of this though: startup time scales linearly with the number of tabs session restore has to restore. I confirmed this by running talos with 200 cycles instead of the normal 20, and it was clearly a linear increase. We should probably figure out a way to mitigate that, but I have not filed a bug on it (yet!).

Last week I landed bug 485976 which moves the writing and subsequent fsync (or flush on windows) call to a background thread. This should benefit all of our users, especially those with slower hard drives. Paul O’Shannessey has filed another bug that will reduce the amount of disk activity substantially more that will benefit our users even more.

Background

Session restore writes out to disk very frequently – every ten seconds, in fact. This behavior is controllable by the preference browser.sessionstore.interval for those who want to reduce that, but then you run the risk of not having all your data saved if you crash. We really don’t want to reduce that time for our users.

The amount of data that is written out to disk by session restore scales linearly with the number of tabs and windows you have open. The more you have, the more data has to be written out to disk, and the longer it is going to take.

As we learned in the past with Places, writing to disk and calling fsync can be painfully slow. In session restore code, we are doing this very often and on the main thread. Clearly, this is a bad thing.

Process and Solution

This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while writing and flushing data to the hard drive.”

We wanted to address this problem as much as we could for Firefox 3.6. In order to actually reduce the number of writes and fsync calls, we would have to heavily modify how session restore manages and writes its data. That is a big change that we were not comfortable doing this late in the 3.6 cycle. On top of that, we do not really have the manpower to do that change since the people who know that code well are working on other performance improvements for this release. The simple solution for now then is to move our write and fsync calls off of the main thread.

Luckily, Boris Zbarsky had recently written a new API for JS consumers to asynchronously copy an input stream to an output stream. This API would work great for session restore! We had to fix one minor issue with the underlying code not properly handling nsISafeOutputStreams (which make sure we fsync properly), but once that was done, the fix was incredibly simple.