Back in December when I was working on the Places branch, I was using Xperf quite a bit to try and figure out why we were regressing certain performance tests. Xperf is an incredibly powerful tool, but it’s really hard to get it to do what you want sometimes. This is largely due to the fact that the documentation isn’t great, and there appears to be wrong information on the Internet (from blogs, which tend to be more useful than the documentation).
I cared about getting information about hard faults, file I/O, and disk I/O to see if patterns changed with my work. In order to accomplish this, I started Xpef like this (from an admin console):
xperf -start "NT Kernel Logger" -on PROC_THREAD+LOADER+HARD_FAULTS+FILE_IO+FILE_IO_INIT+DISK_IO -stackwalk FileWrite+FileRead+FileFlush -MaxBuffers 1024 -BufferSize 1024 -f output.etl
The first list options after
-on is a list of providers. To see the list of installed kernel providers on your system and what they do, open up a command prompt, and run
xperf -providers K.
The options after
-stackwalk tells Xperf to get call stacks for certain events. The full list of events supported can be found here. I found it very difficult to get stacks, and if you are on a 64-bit version of windows, you have an extra hoop to jump through. I was actually never able to get stacks from an optimized build with symbols, so I ended up just using debug builds when I needed stacks, and opt builds when I wanted good numbers for everything else.
-BufferSize arguments were useful to prevent events from being dropped (if Xperf doesn’t have enough memory set aside to record an event, it just drops it). You can tweak those values to your needs, but the values I used should be fine.
Once I ran that command in my console, I’d perform the test I wanted to get data on. Upon completion, Xperf needs to be told to stop, and then have it merge (merging may be unnecessary when you only use kernel providers, but I never tested this theory):
xperf -stop "NT Kernel Logger"
xperf -merge output.etl output_final.etl
You can now open
output_final.etl to examine the data you just gathered!
Last week I landed bug 485976 which moves the writing and subsequent fsync (or flush on windows) call to a background thread. This should benefit all of our users, especially those with slower hard drives. Paul O’Shannessey has filed another bug that will reduce the amount of disk activity substantially more that will benefit our users even more.
Session restore writes out to disk very frequently – every ten seconds, in fact. This behavior is controllable by the preference
browser.sessionstore.interval for those who want to reduce that, but then you run the risk of not having all your data saved if you crash. We really don’t want to reduce that time for our users.
The amount of data that is written out to disk by session restore scales linearly with the number of tabs and windows you have open. The more you have, the more data has to be written out to disk, and the longer it is going to take.
As we learned in the past with Places, writing to disk and calling fsync can be painfully slow. In session restore code, we are doing this very often and on the main thread. Clearly, this is a bad thing.
Process and Solution
This section is a bit technical, so feel free to skip it. The short answer is “do not block the main thread while writing and flushing data to the hard drive.”
We wanted to address this problem as much as we could for Firefox 3.6. In order to actually reduce the number of writes and fsync calls, we would have to heavily modify how session restore manages and writes its data. That is a big change that we were not comfortable doing this late in the 3.6 cycle. On top of that, we do not really have the manpower to do that change since the people who know that code well are working on other performance improvements for this release. The simple solution for now then is to move our write and fsync calls off of the main thread.
Luckily, Boris Zbarsky had recently written a new API for JS consumers to asynchronously copy an input stream to an output stream. This API would work great for session restore! We had to fix one minor issue with the underlying code not properly handling nsISafeOutputStreams (which make sure we fsync properly), but once that was done, the fix was incredibly simple.
As you may or may not be aware, my personal mission as of late is to reduce the number of writes and fsyncs that Firefox makes, and move the ones that we do have to make off of the main thread. The primary target here has been Places, and the work is still continuing.
The Firefox team has been focusing on code sprints to get some small well scoped things done for Firefox 3.1 since we’ve got a bit more time. My latest sprint can be found over in bug 480211, where I’ve removed a write and fsync that we used to do after every page visited. If we had enough pages in history that were old enough, we would remove them from history. We now do this off of the main thread, asynchronously at the same time we flush data from our temporary tables to our permanent ones. The net result is the same number of writes and one less fsync. Additionally, the write is no longer done on the main thread.
Sadly, I couldn’t measure any real-world performance gains with my DTrace scripts – in fact I saw no change during several different runs of Tp3 with various places.sqlite files. It’s quite possible I did not have the conditions setup correctly to have pages expiring, and I could have spent a few more hours generating just the right places.sqlite file to demonstrate wins in the real world, but the theory behind the patch is pretty simple. The gain is pretty obvious.
Just another drop in the bucket of performance wins for Firefox. Stay tuned, as there is more to come!
Yesterday dietrich was telling me he was seeing a lot of writes to
places.sqlite-journal. I wanted to get good, hard data to see what was writing, and how often we were doing it. I figured this was a good option for DTrace, but I've had mixed experiences with it in the past. I first turned to Instruments on OS X, but that can't give you stacks for calls, so I had to dump it.
I talked to dolske, who happens to be the resident DTrace expert around here. With his help, I was able to put together this little D script to track writes to the files in question, and give me user stack traces so we know who is writing and when. With this, we've figured out what was writing, and are working on how to make it write less as we speak.
DTrace really is an awesome tool, even if it can be a bit awkward to use from time to time.
This is part four in a continuing series about how we are working around the slow fsync issue in Mozilla. Part one can be found here, part two here, and part three here.
Boy, some of the initial planning to solve this problem has changed. I found several bugs in my triggers, which made my life not so fun. Now if you google the word fsync, my website is the first result returned (scary!). Never fear, however! I have good news. The places team has put together an experimental build that should help greatly. Before I link to the build though, I have a warning:
Do not use your normal profile – crate a copy of it to test this build. This build will modify the places database schema, and if you go back to a normal build, you will experience some strange behavior.
With that out of the way, the builds can be found here and will be available for the next seven days.
Feedback is not only welcomed, but wanted. You can track the overall progress of this task in bug 442967