Categories
Mozilla

Session Restore Now Writes to Disk Off of the Main Thread

Last week I landed bug 485976, which moves session restore's write and the subsequent fsync (or flush on Windows) call to a background thread. This should benefit all of our users, especially those with slower hard drives. Paul O’Shannessey has filed another bug that will substantially reduce the amount of disk activity, which will benefit our users even more.

Background

Session restore writes out to disk very frequently: every ten seconds, in fact. This behavior is controllable by the preference browser.sessionstore.interval for those who want to write less often, but then you run the risk of not having all your data saved if you crash. We really don’t want to save less often for our users.

The amount of data that is written out to disk by session restore scales linearly with the number of tabs and windows you have open. The more you have, the more data has to be written out to disk, and the longer it is going to take.

As we learned in the past with Places, writing to disk and calling fsync can be painfully slow. In session restore code, we are doing this very often and on the main thread. Clearly, this is a bad thing.

Process and Solution

This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while writing and flushing data to the hard drive.”

We wanted to address this problem as much as we could for Firefox 3.6. In order to actually reduce the number of writes and fsync calls, we would have to heavily modify how session restore manages and writes its data. That is a big change that we were not comfortable making this late in the 3.6 cycle. On top of that, we do not really have the manpower to make that change, since the people who know that code well are working on other performance improvements for this release. The simple solution for now, then, is to move our write and fsync calls off of the main thread.

Luckily, Boris Zbarsky had recently written a new API for JS consumers to asynchronously copy an input stream to an output stream. This API would work great for session restore! We had to fix one minor issue with the underlying code not properly handling nsISafeOutputStreams (which make sure we fsync properly), but once that was done, the fix was incredibly simple.
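To make the general pattern concrete, here is a minimal Python sketch of "write, fsync, and replace the file on a background thread." It is not the Firefox code, which is JavaScript built on the asynchronous stream copy API mentioned above. The names `write_and_sync` and `save_in_background` are hypothetical, and writing to a temporary file before renaming is an assumption made here so that a crash mid-write cannot leave a half-written file.

```python
import os
import tempfile
import threading


def write_and_sync(path, data):
    """Write data to a temp file, fsync it, then atomically replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to the disk
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def save_in_background(path, data, on_done=None):
    """Run the slow write + fsync on a worker thread; return the thread."""
    def worker():
        error = None
        try:
            write_and_sync(path, data)
        except OSError as exc:
            error = exc
        if on_done is not None:
            on_done(error)  # note: called on the worker thread

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

The caller returns immediately after `save_in_background`, so the slow disk work never blocks it. In a real application the completion callback would normally be dispatched back to the main thread rather than run on the worker.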


Asynchronous Location Bar has Landed

About two weeks ago the asynchronous location bar work landed in mozilla-central without much issue. It’s also in the Firefox 3.6 alpha we just recently released. This has the potential to impact all of our users, but those on slower hard drives will notice this the most. Your location bar searches may not complete any faster than before, but they certainly won’t be hanging your browser and locking up the UI.

Background

We’ve been getting reports for some time about the location bar hanging the application for some users when they are typing in it. This wasn’t a problem that was reproducible on every machine, and even on machines that saw it, it wasn’t always 100% reproducible. Clearly, this behavior is not desirable, so we set out to fix it.

I had a theory about the cause almost a year ago and filed a bug that I was hoping we could work on and fix for Firefox 3.5. We knew that reading data off a disk can be slow (and that it completes in a non-deterministic amount of time). Since SQLite uses blocking read calls (no more code can execute until the data is read from disk), this could certainly be the cause of the slowdown our users were seeing. Some simple profiling showed that this was largely the cause of the hanging. Work began on the project, but it was clear that enough issues were cropping up that we were not going to be able to safely take this change for Firefox 3.5, and resources were diverted elsewhere.

Process and Solution

This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while reading from the hard drive.”

In order to not block the main thread while reading from disk, we either need to make SQLite use non-blocking read system calls, or call into SQLite off of the main thread. Changing the SQLite code isn’t something we want to do, so that solution was out of the question. Luckily, we had solved a similar problem with writes and fsyncs earlier in Firefox 3.5 development with the asynchronous Storage API.

The first implementation that we tried essentially did the same thing that the old code did. We would execute a query, but this time asynchronously, and then process the results and see if they match. There were two issues with this approach, however. The first issue was that we were filtering every history and bookmark entry on the main thread for a given search. That could be a lot of work we end up doing, and with the additional overhead of moving data across threads, the common case would see no win. The second issue was that once we selected a result in the location bar, and a search was not yet complete, there would be a hang as the main thread processed a bunch of events that Storage had posted to it containing results.

At this point, we realized we needed to do the filtering on a thread other than the main thread. After some thought, we figured that the easiest way to do that would be to use a SQL function that we define in the WHERE clause of our autocomplete queries. This way, all the filtering is done on a background thread, and the code that runs on the main thread only deals with results we will actually use. This solution exposed some things in the Storage backend like lock contention and a few other subtle issues, but nothing major came up.
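The idea of filtering inside the query can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the Places implementation: the `pages` table, the `matches_search` function, and its simple "every token must appear in the URL or title" rule are all hypothetical stand-ins.

```python
import sqlite3
import threading


def matches_search(search, url, title):
    """Return 1 if every search token appears in the URL or title."""
    haystack = f"{url or ''} {title or ''}".lower()
    return int(all(token in haystack for token in search.lower().split()))


def search_in_background(db_path, search, on_results):
    """Run the filtered query on a worker thread and hand back the matches."""
    def worker():
        # The connection is opened on the worker thread, so SQLite (and our
        # filter function, which SQLite calls per row) never runs on the caller.
        conn = sqlite3.connect(db_path)
        try:
            conn.create_function("matches_search", 3, matches_search)
            rows = conn.execute(
                "SELECT url, title FROM pages "
                "WHERE matches_search(?, url, title) ORDER BY url",
                (search,),
            ).fetchall()
        finally:
            conn.close()
        on_results(rows)  # only rows that passed the filter leave the thread

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

Because the function sits in the WHERE clause, rows that do not match are discarded where the query runs, and the caller only ever sees results it will actually use.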

For more details on how the location bar search results are generated, see my explanation here.

If you weren’t having a problem before, chances are you won’t notice any difference at all.


Test Build: Asynchronous Location Bar (Take >2)

I have another test build for folks to try out. This fixes a possible error condition that could happen in certain circumstances. This build has two known issues:

  1. There is a lot of flickering when new results show up. This is being tracked in bug 393902.
  2. Your computer will hang for a period of time (it will become responsive again) if you continue to type once no results are found. This is being tracked in bug 503701.

This is built off of a “stable” point of mozilla-central, so it’s like using a 3.6 nightly. All the normal warnings apply about using it. I’m told this greatly increases the speed at which results are obtained for many people. If you experience any issues (other than the two listed here), please let me know! The builds can be found here.


Asynchronous Location Bar Searches

For those of you using the asynchronous location bar add-on, today’s nightly of both mozilla-central and mozilla-1.9.1 should show a reduction in memory use. Turns out there was a bit of a leak in mozStorage that would mean you’d get high memory usage that would never go down. My bad.

For those of you programmers out there, I have a word of advice. Always make sure you have a virtual destructor in your base class. If you don’t, you could spend days tracking down a leak that doesn’t seem to make sense. Of course, this probably would have been spotted earlier if we reported leaks in xpcshell unit tests.


More fsync and write Reduction

As you may or may not be aware, my personal mission as of late is to reduce the number of writes and fsyncs that Firefox makes, and move the ones that we do have to make off of the main thread. The primary target here has been Places, and the work is still continuing.

The Firefox team has been focusing on code sprints to get some small, well-scoped things done for Firefox 3.1 since we’ve got a bit more time. My latest sprint can be found over in bug 480211, where I’ve removed a write and fsync that we used to do after every page visited. If we had enough pages in history that were old enough, we would remove them from history. We now do this off of the main thread, asynchronously, at the same time we flush data from our temporary tables to our permanent ones. The net result is the same number of writes and one fewer fsync. Additionally, the write is no longer done on the main thread.
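Why folding expiration into the flush saves an fsync can be shown with a small sqlite3 sketch in Python. It is a simplified model, not the Places code: the `visits` and `visits_temp` tables, the `flush_and_expire` name, and the plain date cutoff are assumptions made for illustration, and the background-thread part is left out.

```python
import sqlite3


def flush_and_expire(conn, cutoff):
    """Flush temp rows to the permanent table and expire old ones together.

    Everything happens in one transaction, so there is a single commit
    (and therefore a single sync point) instead of one per operation.
    Returns the number of expired rows.
    """
    with conn:  # commits once on success, rolls back on error
        conn.execute(
            "INSERT INTO visits (url, visit_date) "
            "SELECT url, visit_date FROM visits_temp"
        )
        conn.execute("DELETE FROM visits_temp")
        expired = conn.execute(
            "DELETE FROM visits WHERE visit_date < ?", (cutoff,)
        ).rowcount
    return expired
```

The expiration delete still writes to the database, which matches the "same number of writes" observation above; what goes away is the separate commit it used to need.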

Sadly, I couldn’t measure any real-world performance gains with my DTrace scripts; in fact, I saw no change during several different runs of Tp3 with various places.sqlite files. It’s quite possible I did not have the conditions set up correctly to have pages expiring, and I could have spent a few more hours generating just the right places.sqlite file to demonstrate wins in the real world, but the theory behind the patch is pretty simple. The gain is pretty obvious.

Just another drop in the bucket of performance wins for Firefox. Stay tuned, as there is more to come!