If you’ve been experiencing crashes when trying to delete a large set of history, I have great news for you! We’ve identified the issue, and a fix will be coming to you shortly!
Background
This fix wouldn’t be in our hands if it weren’t for the recent CrashKill effort here at Mozilla. Sam Sidler gave me a link to a list of crashes in SQLite and mozStorage code that were reported to us during a 24-hour period. I immediately started adding links to crash reports and filing bugs for some of the more frequent crashes so we could start tracking these issues. I also told the SQLite team about this page in case they found the data useful, and they sure did. They started looking into this particular crash. It turns out that two other signatures are actually the same crash, but show up differently because compiler optimizations change the generated program.
Fix Details
SQLite has a virtual machine that processes queries. The virtual machine is register-based, which is different from the usual stack-based virtual machines you might encounter with Java. This particular crash would surface when a SQL query combined an IN operator that had more than roughly 32 thousand entries on its right-hand side with an EXISTS expression. If either of these conditions occurred on its own, SQLite could handle it just fine. The IN operator stores each entry in a register, and when processing the EXISTS expression, SQLite would store the number of the register holding an expression's result in a signed 16-bit integer on an Expr object. However, once you try to write a number greater than 32,767 to a signed 16-bit integer, it overflows and you will likely end up with a negative number. This negative number was then used to index into an array, which is all sorts of bad (and led to the crash).
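To make the overflow concrete, here is a minimal sketch in Python. It is not SQLite's code: the helper name `store_register_index_16` and the sample register numbers are made up for illustration, and `ctypes.c_int16` simply stands in for a signed 16-bit field like the one described above.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32,767, the largest value a signed 16-bit integer holds


def store_register_index_16(register_number: int) -> int:
    """Store a register number in a signed 16-bit field and read it back.

    ctypes.c_int16 does no overflow checking, so out-of-range values wrap
    around the way they typically do on two's-complement hardware.
    (Hypothetical helper for illustration; not part of SQLite.)
    """
    return ctypes.c_int16(register_number).value


# Hypothetical register numbers, chosen to straddle the 16-bit limit.
for n in (100, INT16_MAX, INT16_MAX + 1, 40_000):
    print(n, "->", store_register_index_16(n))
```

Values up to 32,767 come back unchanged, while anything larger comes back negative. In C, using such a value as an array index reads memory outside the array, which is consistent with the crash described above.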
The interim fix is simply to use a 32-bit integer for now. This means the issue cannot occur unless you have more than 2,147,483,647 entries on the right-hand side of the IN operator. You’d hit other limitations of SQLite long before that, so overflow is no longer a concern. The SQLite team is going to do a more complicated fix that stores the value separately from the Expr object it was previously stored in. The Expr object is used a lot, so they don’t want a long-term solution that increases memory usage.
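As a rough illustration of the trade-off, the sketch below stores the same kind of register number in a signed 32-bit field and compares the size of the two field types. Again, this is not SQLite's code: `store_register_index_32` is a hypothetical helper, and `ctypes.c_int32` only stands in for the widened field.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647


def store_register_index_32(register_number: int) -> int:
    """Store a register number in a signed 32-bit field and read it back.

    (Hypothetical helper for illustration; not part of SQLite.)
    """
    return ctypes.c_int32(register_number).value


# A register number that is too large for 16 bits now survives intact.
print(40_000, "->", store_register_index_32(40_000))
print(INT32_MAX, "->", store_register_index_32(INT32_MAX))

# The cost: the field takes twice as many bytes in every object that has one.
print("16-bit field:", ctypes.sizeof(ctypes.c_int16), "bytes")
print("32-bit field:", ctypes.sizeof(ctypes.c_int32), "bytes")
```

The extra two bytes per field look small, but they add up for a structure that is allocated as often as Expr, which is why the longer-term plan is to move the value elsewhere.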
You might be surprised that such a small, trivial-seeming bug got through. This type of bug is more common than you might think, and an overflow bug even caused the destruction of an Ariane 5 rocket. It’s an unfortunate reality of software that these edge cases sometimes get missed, but we must always try to remain vigilant.
Special thanks to the SQLite team for investigating this issue and providing information for this blog post.