Tag Archives: asynchronous

Sometime soon after the beta 8 code freeze, the Places team will be merging the Places branch into mozilla-central. There are a lot of changes we’ve been working on, the most important of which is a major re-architecting of how we store data.

The Benefits

The work on the Places branch brings us a number of benefits. In general, we’ve parallelized work, and made it substantially less likely that we’ll block on the GUI thread. Some of the important fixes we have landed are:

  • Faster Location Bar
    The location bar is faster because other database work no longer blocks us from searching, and the queries are much simpler.
  • Asynchronous Bookmark Notifications
    Indicating if the current page is bookmarked in the location bar (with the star) is now an asynchronous operation that does not block the page load.
  • Faster Bookmarks & History Management/Searches
    Simpler queries and other improvements should make this all work faster.
  • Faster Link Coloring
    Link coloring is now executed on a separate database connection so it cannot block other database work.
  • Expiration Work
    Less work at startup, less work at shutdown, and less work when we run expiration.
  • Less Data Stored
    Embedded pages are now tracked only in memory and never hit the disk.
  • Better Battery Management
    Much less work during idle time, which will improve our power consumption behaviors.
  • Fixes 29 blockers and 18 other issues

A bit of History

Way back in the days leading up to Firefox 3.5, we moved from storing all of our history and location data in disk tables to in-memory tables that we’d flush out to disk every two minutes off of the GUI thread. The benefit of this was two-fold:

  1. No longer performing the vast majority of our disk writes on the GUI thread
  2. No longer performing the vast majority of our fsyncs/Flushes on the GUI thread

More details about how we came up with this solution can be found in a series of blog posts.

The Problem

This solution worked out pretty well for us for a while, but recently, especially on OS X, it has not. The short story is that our architecture did not scale well due to lock contention between our GUI thread and our background I/O thread. While the common access case may be fine, the failure case (when we hit lock contention) is pretty terrible. The problem is so terrible that I once described it like this:

the failure case makes us fall on our faces, skid about 100 feet, and then fall off a cliff without a parachute.

Ultimately, the only way we can avoid this situation is to not do any database work on separate threads with the same database connection. It was not an issue in the past because we just did not do enough work on the I/O thread, but as we have added to the workload of that thread, we have increased the likelihood of it holding the lock. That means there is a higher probability that the GUI thread will not be able to instantly acquire the lock and do whatever it needs to do. This essentially leaves us with two options:

  1. Move the rest of our database work off of the GUI thread.
  2. Move database work from the I/O thread back to the GUI thread.

The Solution

The second choice is not actually a viable option. Disk I/O completes in a non-deterministic amount of time, which is why we have been moving it from the GUI thread to an I/O thread since Firefox 3.5. The first choice is not entirely viable either, due to schedule constraints (we have tons of API calls that are not used heavily but are still synchronous). A hybrid solution exists, however. We can reduce the amount of work we do on the I/O thread by using additional I/O threads. Additionally, we can move the remaining synchronous operations during browsing to an I/O thread. In the end, Places ends up with one read/write thread, and multiple read-only threads.

This wasn’t really an option back in the Firefox 3.5 days because in SQLite readers and writers blocked each other. However, the SQLite developers recently devised a new journaling method called WAL that lets readers not block writers, and writers not block readers. When the Places branch merges into mozilla-central, we will end up with three read-only I/O threads and our original read-write I/O thread. The three read-only threads are used for location bar searches, visited checks (is a given hyperlink visited), and some bookmark operations. Each I/O thread has its own connection to the database, allowing operations to happen in parallel (SQLite is only threadsafe because it serializes all access on each connection object, which is why we had the lock contention in the first place).
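
To make the per-connection serialization concrete, here is a toy model in JavaScript. It is only a sketch: ToyConnection, sleep, and completionOrder are names made up for this illustration, promises stand in for threads, and nothing here touches SQLite or the Storage API. It also ignores the file-level blocking that WAL removes. All it shows is why a quick read queued on the same connection as a slow write has to wait its turn.

```javascript
// Toy model only: this is NOT the Storage API. Each ToyConnection runs its
// operations strictly one at a time, mimicking how SQLite serializes all
// access on a single connection object.
class ToyConnection {
  constructor() {
    this._tail = Promise.resolve();
  }

  // Queue `work` (a function returning a promise) behind everything that is
  // already scheduled on this connection.
  run(work) {
    const result = this._tail.then(work);
    // Keep the queue usable even if `work` rejects.
    this._tail = result.catch(() => {});
    return result;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Start a slow write, then a quick read, and record the order in which
// they finish.
async function completionOrder(writeConnection, readConnection) {
  const finished = [];
  const write = writeConnection.run(async () => {
    await sleep(20); // stands in for a write plus fsync
    finished.push("write");
  });
  const read = readConnection.run(async () => {
    finished.push("read"); // stands in for a visited check
  });
  await Promise.all([write, read]);
  return finished;
}
```

Passing the same ToyConnection for both arguments makes the read finish after the write, while passing two separate connections lets the read finish first. That is the effect we are after by giving each I/O thread its own connection.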

Performance Test Issues

One of the things that made this work especially difficult was seemingly random changes in performance numbers. We often had regressions suddenly appear (according to talos) on changesets that should have had zero impact on performance, and then backing out the change would cause an additional regression. Other times, when we would merge mozilla-central into Places, we would suddenly get new regressions when comparing to mozilla-central. This could be indicative of a bad interaction between our code and the changes on mozilla-central; however, after looking at the changes on mozilla-central that landed with the merge, that appeared to be highly unlikely.

I’m also quite certain that some of our performance tests do not actually test/measure what we want them to test/measure. I’ll leave that discussion to a future blog post, however.

Recently, a few bugs have landed enabling a bunch of nice things for consumers of NetUtil.jsm:

  • NetUtil.newURI can take a string (plus optional character set and base URI) or an nsIFile.
  • A new method for creating channels has been created. NetUtil.newChannel can take an nsIURI, a string (plus optional character set and base URI), or an nsIFile.
  • NetUtil.asyncFetch can take an nsIChannel, an nsIURI, a string (plus optional character set and base URI), or an nsIFile.

This means, among other things, that it now requires less code to read a file asynchronously than it does synchronously. The old way to do this asynchronously can be seen here on MDC. This would give the consumer a byte array of the data in the file. Compare that to the synchronous case, which can be seen here. Both are pretty verbose and clunky to use. The new way looks like this:


NetUtil.asyncFetch(file, function(aInputStream, aResult) {
  if (!Components.isSuccessCode(aResult)) {
    // Handle Error
    return;
  }
  // Consume input stream
});

One function call, with a callback passed in. There is a slight difference from the old asynchronous method, however. NetUtil.asyncFetch gives the consumer an nsIInputStream instead of a byte array. The input stream is a bit more useful than a raw byte array, although it can be painful to use in JavaScript at times (maybe we need an easy method to convert an input stream to a string?). I look forward to patches using this method to read files instead of doing it synchronously.
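
As for converting the input stream to a string, here is a minimal sketch of what such a helper could look like. It assumes a version of NetUtil.jsm that provides NetUtil.readInputStreamToString, which may not be the case in the build you are using. The name fetchAsString and the (error, data) callback shape are made up for this example. The NetUtil object and Components.isSuccessCode are passed in as parameters purely so the helper has no hidden dependencies.

```javascript
// Sketch: read a source to a string with NetUtil.asyncFetch.
// `netUtil` is meant to be the NetUtil object from NetUtil.jsm and
// `isSuccessCode` is meant to be Components.isSuccessCode. Passing them in
// is a choice made for this example, not something the API requires.
function fetchAsString(netUtil, isSuccessCode, source, callback) {
  netUtil.asyncFetch(source, function (aInputStream, aResult) {
    if (!isSuccessCode(aResult)) {
      // Hand the failing status code to the caller.
      callback(aResult, null);
      return;
    }
    var data;
    try {
      // Assumption: this NetUtil.jsm provides readInputStreamToString.
      var count = aInputStream.available();
      data = netUtil.readInputStreamToString(aInputStream, count);
    } catch (e) {
      // Reading from the stream can throw; report that as the error.
      callback(e, null);
      return;
    }
    callback(null, data);
  });
}
```

In chrome code, a call would look something like fetchAsString(NetUtil, Components.isSuccessCode, file, function (error, data) { ... }), with error being null on success.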

I just uploaded Bugzilla Helper 0.2.0. This improves on the last release by making the submission of comments an asynchronous operation. It also uses the activity manager in Thunderbird to track the progress of the submission, and retry it if an error occurs.

There are still some apparent issues with the REST API that the add-on is using, and I’ll likely include some workaround in upcoming versions. 0.2.0 is available on addons.mozilla.org and is a recommended upgrade. Current users will have to update since sandboxed add-ons do not automatically update.

Last week I landed bug 485976, which moves the writing and subsequent fsync (or flush on Windows) call to a background thread. This should benefit all of our users, especially those with slower hard drives. Paul O’Shannessey has filed another bug that will reduce the amount of disk activity substantially further, which will benefit our users even more.

Background

Session restore writes out to disk very frequently: every ten seconds, in fact. This behavior is controllable by the preference browser.sessionstore.interval for those who want it to write less often, but then you run the risk of not having all your data saved if you crash. We really don’t want to make those writes less frequent for our users.

The amount of data that is written out to disk by session restore scales linearly with the number of tabs and windows you have open. The more you have, the more data has to be written out to disk, and the longer it is going to take.

As we learned in the past with Places, writing to disk and calling fsync can be painfully slow. In session restore code, we are doing this very often and on the main thread. Clearly, this is a bad thing.

Process and Solution

This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while writing and flushing data to the hard drive.”

We wanted to address this problem as much as we could for Firefox 3.6. In order to actually reduce the number of writes and fsync calls, we would have to heavily modify how session restore manages and writes its data. That is a big change that we were not comfortable doing this late in the 3.6 cycle. On top of that, we do not really have the manpower to do that change since the people who know that code well are working on other performance improvements for this release. The simple solution for now then is to move our write and fsync calls off of the main thread.

Luckily, Boris Zbarsky had recently written a new API for JS consumers to asynchronously copy an input stream to an output stream. This API would work great for session restore! We had to fix one minor issue with the underlying code not properly handling nsISafeOutputStreams (which make sure we fsync properly), but once that was done, the fix was incredibly simple.

About two weeks ago the asynchronous location bar work landed in mozilla-central without much issue. It’s also in the Firefox 3.6 alpha we just recently released. This has the potential to impact all of our users, but those on slower hard drives will notice this the most. Your location bar searches may not complete any faster than before, but they certainly won’t be hanging your browser and locking up the UI.

Background

We’ve been getting reports for some time about the location bar hanging the application for some users when they are typing in it. This wasn’t a problem that was reproducible on every machine, and even on machines that saw it, it wasn’t always 100% reproducible. Clearly, this behavior is not desirable, so we set out to fix it.

I had a theory about the cause almost a year ago and filed a bug that I was hoping we could work on and fix for Firefox 3.5. We knew that reading data off a disk can be slow (and certainly would complete in a non-deterministic amount of time). Since SQLite uses blocking read calls (no more code can execute until the data is read from disk), this could certainly be the cause of the slowdown our users were seeing. Some simple profiling showed that this was largely the cause of the hanging. Work began on the project, but it was clear that enough issues were cropping up that we were not going to be able to safely take this change for Firefox 3.5, and resources were diverted elsewhere.

Process and Solution

This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while reading from the hard drive.”

In order to not block the main thread while reading from disk we either need to make SQLite use non-blocking read system calls, or call into SQLite off of the main thread. Changing the SQLite code isn’t something we want to do, so that solution was out of the question. Luckily, we had solved a similar problem with writes and fsyncs earlier in the Firefox 3.5 development with the asynchronous Storage API.

The first implementation that we tried essentially did the same thing that the old code did. We would execute a query, but this time asynchronously, and then process the results and see if they match. There were two issues with this approach, however. The first issue was that we were filtering every history and bookmark entry on the main thread for a given search. That could be a lot of work we end up doing, and with the additional overhead of moving data across threads, the common case would see no win. The second issue was that once we selected a result in the location bar, and a search was not yet complete, there would be a hang as the main thread processed a bunch of events that Storage had posted to it containing results.

At this point, we realized we needed to do the filtering on a thread other than the main thread. After some thought, we figured that the easiest way to do that would be to use a SQL function that we define in the WHERE clause of our autocomplete queries. This way, all the filtering is done on a background thread, and the code that runs on the main thread only deals with results we will actually use. This solution exposed some things in the Storage backend like lock contention and a few other subtle issues, but nothing major came up.
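
The shape of that idea can be sketched in a few lines of JavaScript. Treat everything here as an illustration under assumptions rather than the real code: MATCHES_SEARCH is a made-up function name, the matching rule (every search token must appear in the URL or title) is a simplification chosen for the example, and matchesSearch and runSearch are hypothetical names. The point is only that the predicate runs where the rows are produced, so rows that do not match never get posted to the main thread.

```javascript
// Hypothetical query shape: MATCHES_SEARCH is a made-up name for the custom
// SQL function, which SQLite would call for each candidate row on the
// background thread.
var QUERY =
  "SELECT url, title FROM moz_places " +
  "WHERE MATCHES_SEARCH(:search, url, title)";

// Stand-in for what such a function might compute (an assumption, not the
// real matching rules): every whitespace-separated search token must appear,
// case-insensitively, in the URL or the title.
function matchesSearch(search, url, title) {
  var haystack = (url + " " + (title || "")).toLowerCase();
  return search
    .toLowerCase()
    .split(/\s+/)
    .filter(function (token) { return token.length > 0; })
    .every(function (token) { return haystack.indexOf(token) !== -1; });
}

// Toy model of the background side: only rows that pass the predicate are
// handed to `postToMainThread`.
function runSearch(rows, search, postToMainThread) {
  rows.forEach(function (row) {
    if (matchesSearch(search, row.url, row.title)) {
      postToMainThread(row);
    }
  });
}
```

With the first implementation, every row would have been handed to the main thread and checked there. Here the callback only ever sees rows that are going to be displayed.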

For more details on how the location bar search results are generated, see my explanation here.

If you weren’t having a problem before, chances are you won’t notice any difference at all.