On Unit Testing Requirements

My last post drew quite a few responses and got me thinking even more about our current unit testing policy. Right now the toolkit review requirements (and the browser review requirements) state the following:

  • All patches must include unit tests which are run during make check. Once unit tests are committed, the in-testsuite+ flag should be set on the bug.
  • Unit tests must be included in patches and reviewed just like any other code.
  • If running the test during make check is not possible, consult with Robert Sayre to determine whether another test system is available.
  • If a unit test framework is needed but is not yet available, the developer should write appropriate test code and commit it. A bug should be filed on the needed test framework. The in-testsuite? flag should be set on the bug until the framework has been completed and the test code is running automatically.
  • Certain build-config or tooling bugs which don’t affect the actual product cannot be usefully tested. These bugs should have the in-testsuite- flag set.

I’m quoting that because I don’t think many people (especially based on the comments I got) are even aware of this policy. I suppose that this is the case because nobody’s really enforced it before – it’s been more of a suggestion that people should try to follow. In fact, there are several modules in toolkit and browser that have little to no testing (especially of the UI) that are actively being worked on (I’m not trying to point fingers here). This brings up a good question: “Why isn’t this policy being enforced at least somewhat uniformly?” I think part of the problem is that people feel that it’s a waste of their time, and that it won’t actually bring any benefit (I’ll save the argument about how wrong this viewpoint is, especially when we are trying to ship a release for some other day). Another problem is that this policy, as written, is a bit unrealistic.

I think Rob has the right idea about not requiring tests from new contributors. However, the current policy doesn’t allow for that as an exception, so someone else needs to write the test. One commenter suggested that someone else write the test(s) for the patch submitter. From a patcher’s perspective, that seems like a great idea. However, who would end up writing those test cases? Most likely, that would fall onto the module owner, or if he/she is lucky, one of the peers (I should note that Edward and I have written many tests for various bugs that we didn’t write the patch for, so this does happen sometimes). Most module owners and their peers are often patchers as well. If they have to now write tests for other people’s patches, do reviews in their module, and do patches elsewhere (because nobody who’s a peer or module owner really works in just one module), they are highly likely to put something on the back-burner. Personally, any bug that wasn’t a serious issue that required me to write a test would probably become very low priority (and I suspect that to be the same for most other people). In the end, it’s just shifting the burden to people who are already very busy.

I think it’s fairly clear that we have a policy that isn’t in line with most individual’s expectations. Or maybe we, as submodule owners, aren’t supposed to be so strict about it and I just didn’t get the memo (hey, I’m new to this, so it’s possible). Regardless, I feel that something should be done about the situation.


Being a Responsible Owner

Back in January the Download Manager broke due to a change in the JS engine. However, since the UI had no tests on it, the issue was not caught upon checkin. This led to a patch franticly being created changing nearly all instances of let to var. I was not happy about this since most of the changes were not necessary, and valuable cvs blame data was then lost on the file. Upon brining this up, I got blasted for not having any tests on the UI when I’m clearly supposed to have them as per the toolkit test requirement.

I’m not saying that getting blasted was uncalled for. In fact, it wasn’t. It was more of a wake-up call that I was being an irresponsible owner of my submodule to me. Test driven development is really an awesome model, and it had saved me many times before with the back-end of the download manager (which did/does have fairly extensive unit tests). Having said that, you’d think I would have known better. Maybe it was laziness (writing a test takes a while), or maybe it was lack of knowledge (I only knew about xpcshell unit tests at the time), but really, it was inexcusable.

I’d like to say that I’ve learned my lesson now. While the download manager UI isn’t in a state that I’d consider “well tested” yet, it’s leaps and bounds better than it once was. After I wrote the first test case of the UI, things got a lot easier. Since then, Edward has landed several tests cases (as well as putting up a patch to improve the reliability of our tests), and a few other contributors have added some test cases.

It’s not all fun in the sun now that we have tests though. Based on conversations I’ve had on irc, or quotes that I’ve seen go by on qbo, it seems quite apparent to me that I’ve joined the ranks of those who are jerks about test cases. It’s not that I don’t mind that either – I think it’s the right approach.

However, I’ve noticed a steady decline in morale from contributors to my module (and community members attitudes toward me as well). Two bugs, in particular, come to mind. With the first case, I’ve gone so far as to explain exactly what should be tested, although maybe in not enough detail. I’ve seen comments go by on irc that what I’m asking for is “unreasonable”, especially because it was just a typo that caused the regression. Sadly, I fear that the people advocating this just don’t understand the test-driven development model. Clearly this was something that can be easily regressed if just a minor typo caused it, so it should have a test to make sure something as trivial as a typo doesn’t cause it again. I mean, it only took us a few days to track the issue down to this bug, and figure out what was wrong. Another issue brought up with that bug is that the test would be “unreliable”. To me, that’s just an excuse to not write the test though – I don’t see how it’s unreliable in any way. I’m sure that was a great use of everyone’s time, and that they’d all love to do it again in the future. Or, we can have it tested and never have to worry about it again. The second case isn’t nearly as controversial, but I fear that I’ve scared a long-time contributor away simple because he wanted to change the behavior of untested code that I want tests on before it’s changed.

You might notice that I sounded a bit angry in that last paragraph. I have to say that it’s quite frustrating to see so much resistance to the test driven model of development. It’s proven to work in lots of projects. Sure, writing tests can often take much longer than fixing the bug (it took me about twice as long to get the initial unit tests working for the download manager back-end than it did to do the rewrite), but the amount of time it saves you down the line is immeasurable. The sooner people realize this, the better off our entire codebase will be. Yeah, I’ve been making the problem worse by getting more and more strict about the test requirement. However, it’s the right thing to do.

Reflecting on this has brought up at least one issue. How do we require tests without scaring contributors away? I rather like how bsmedberg asked me to add a test for nsIMutableArray the other day on irc by suggesting that if I depended on a behavior, I add a test for that behavior. This doesn’t really work in my submodule though since the tests are required for the code to land. It’s an issue that should probably be looked into.

EDIT: I didn’t post this last night because in the back of my mind I knew my tone was too harsh, so I slept on it. After waking up, fixing it, and reading a particular post on planet, I see that I was in fact correct about there being frustrations with contributors. It’s unfortunate, and I don’t really know how to fix the problem yet.

I figure I should spruce this negative entry up by citing two cases where random outside contributors came through with great tests without hesitation – bug 333848 by Lev Serebryakov, and bug 403068 by John Zhang from Aptana.