Test Build: Asynchronous Location Bar Searches

A few months ago I decided to try to use the asynchronous storage API that was added in Firefox 3.1 to help reduce the pain of disk IO on the main thread. Sadly, it became quite apparent that this was going to be too big of a change and need too much work to make it into 3.1, so I put off doing any more work on it. However, this week I started working on the patch again, updating it to work with the changes to the location bar and the storage back-end. Today I finally got it passing all of our existing tests (although I know of at least one condition where it fails and is untested).

Now that it’s passing all tests, I feel comfortable posting a test build for folks to try and see if it helps or not. I should note that the current implementation is pretty dumb and doesn’t take many opportunities to speed up results. Additionally, there are some other performance wins on my mind that become a lot easier to do with this newer implementation.

Admittedly, I haven’t benchmarked this yet, so I don’t know how it compares to the existing code. During casual use, however, it feels no slower than the existing implementation, but I don’t usually have issues with it. The goal here is to help out those who do have performance issues with the location bar. In fact, that’s exactly the feedback I’m looking to get. So, if you are feeling ambitious and willing to live on the wild side for a bit, I’d like you to try this test build. After a little bit of use, let me know if you think the results are faster, slower, or about the same. Note: this is built off of mozilla-central, so it’s like a 3.2a1pre build.

Your feedback is greatly appreciated!


Artistic Blog Representation

After reading KaiRo’s post about his site, I wanted to see what the things I write about would look like. So, I jumped on over to Wordle (which sadly uses Java), and generated this:

Click to see full image

Clearly, I write a lot about Mozilla, and as of late, performance has dominated that topic. It’s funny, because some time this week I was going to write another blog post about performance too…

Side note: It’d be really cool if someone made a WordPress widget that generated this.


Performance Regressions are Painful

I think it is a well-known fact that performance regressions are really painful. They cause pain on more than one front too! You have users who have a less responsive application, drivers who have to figure out who and what caused the regression, and developers who have to back out or come up with a fix for the regression.

Until recently, we only had a heavy-handed tool (the graph server) that is slow and painful to use. Now Johnathan has revamped his performance dashboard, which is a very quick and easy way for people to see the current status of some of the more important performance graphs (it’s also easy to hack on!). This has made spotting a regression much easier and faster, which greatly increases the odds of the offending change(s) being backed out. The longer a performance bug is left in the tree, the harder it becomes to do a straight backout.

Today I decided I was going to spend the day eliminating the rest of our open performance bugs (or making sure they had blocking requested for the current release). However, I was amazed at how many old performance bugs we had open that hadn’t been touched in six months or more. I probably closed about 20 bugs as INCOMPLETE since there was virtually no way we were going to be able to fix those bugs anymore. One bug I was actually able to mark as FIXED, and there were a few more that were recent regressions that I posted comments on to make sure people were still working on them.

This made me realize that we have a serious problem though. We currently have no way for people who care about performance regressions to easily be aware of new bugs filed. To help with this, I went ahead and filed bug 467170, which will allow an e-mail address to be added to the cc list of all performance regression bugs, so the folks who care about this can watch that address and get mail about these issues. Once bug 464609 gets resolved, the sheriffs will also have a place to bring these issues up so the next sheriff is aware of what is going on as well.

I think we are starting to move in the right direction when it comes to performance monitoring, but I think we also have a long ways to go. Remember kids, only you can help stop performance regressions.


Determining a Ts Regression

For those who have been following the tree status of mozilla-central as of late, you probably noticed that I tried to land SQLite once again, but it was backed out due to a nasty Ts regression on Linux. When I had run this through the try server, it had shown no regression, so I had thought it was safe (just like the past three or four other times I’ve tried to land this). Luckily, Johnathan, who was the sheriff when I landed, found a Linux box that reproduced the problem and that we could use. With a lot of his help, we got standalone talos running just Ts so we could get strace logs during startup.

Once I had those logs, I needed some way to parse the files so I could use the data in a reasonable way. I wrote a Python script to parse the strace logs and insert the results into a SQLite database file (26.8 MB) so I could run some interesting queries on the data.
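For readers who want to try something similar, a parser along these lines might look like the sketch below. This is not the original script. It assumes strace was run with -T, so each completed call ends with its duration in angle brackets, and the table name `calls`, its columns, and the function names `parse_line` and `load` are made up for illustration.

```python
import re
import sqlite3

# Assumed line shape (strace -T), for example:
#   fsync(37) = 0 <0.012345>
# An optional pid (from -f) and timestamp (from -t/-tt) may come first.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"     # optional pid prefix
    r"(?:\d+:\d+:\d+(?:\.\d+)?\s+)?"      # optional wall-clock timestamp
    r"(?P<name>\w+)\("                    # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"         # time spent in the call
)


def parse_line(line):
    """Return (syscall_name, seconds), or None for lines we cannot use.

    "<unfinished ...>" / "resumed" fragments, signals, and exit notices
    do not match the pattern and are skipped.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("name"), float(m.group("secs"))


def load(conn, lines, build):
    """Insert every parsable line into a hypothetical `calls` table.

    `build` is a free-form label (e.g. "before" or "after") so that two
    sets of logs can be compared later. Returns the number of rows added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls "
        "(build TEXT, name TEXT, seconds REAL)"
    )
    rows = [(build,) + parsed
            for parsed in map(parse_line, lines)
            if parsed is not None]
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

To use it, open each log file and pass the file object to `load` along with a connection from `sqlite3.connect` and a label for the build that produced the log.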

With that data, I decided to generate some graphs to easily see what was going on. All of these graphs combine the data from the six runs of Firefox that talos ran, so the numbers are summed across runs. All the graphs have larger versions available if you click on them.

I figured that the most useful graph for investigating this Ts regression would be execution time:
Total execution time

Note that that is six runs of Firefox, which is why it is as long as it is. Next, I looked at the average execution time for each function call:
Average execution time

And finally, I looked at the number of calls of each of these functions:
Number of calls

We are clearly seeing an increase in the number of fsync calls, and we know that on Linux those can be more painful than they are on other operating systems. My next step is to see if we also see this increase on OS X. If we do, I’m going to assume we see it on Windows as well, and get backtraces of every single fsync call to determine why we’ve doubled the number of calls by upgrading.
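The three measurements graphed above (total time, average time per call, and number of calls) are simple aggregates once the data is in SQLite. The sketch below shows one way to compute them. The `calls` table, its columns, and the handful of rows are placeholders invented for the example, not the real strace data or the schema actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (build TEXT, name TEXT, seconds REAL)")

# Placeholder rows for illustration only; these are NOT real measurements.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("before", "fsync", 0.010),
        ("after", "fsync", 0.010),
        ("after", "fsync", 0.030),
        ("after", "read", 0.001),
    ],
)


def summarize(conn):
    """Per build and syscall: number of calls, total time, average time."""
    return conn.execute(
        "SELECT build, name, COUNT(*), SUM(seconds), AVG(seconds) "
        "FROM calls GROUP BY build, name ORDER BY build, name"
    ).fetchall()


for build, name, count, total, average in summarize(conn):
    print(f"{build:>6} {name:<6} calls={count} "
          f"total={total:.3f}s avg={average:.3f}s")
```

Grouping by both build and call name puts the before and after numbers for a call like fsync next to each other, which is the comparison that matters when hunting a regression.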

I’ll make a new post as more data comes in.