Back in January the Download Manager broke due to a change in the JS engine. However, since the UI had no tests on it, the issue was not caught upon checkin. This led to a patch franticly being created changing nearly all instances of
var. I was not happy about this since most of the changes were not necessary, and valuable cvs blame data was then lost on the file. Upon brining this up, I got blasted for not having any tests on the UI when I’m clearly supposed to have them as per the toolkit test requirement.
I’m not saying that getting blasted was uncalled for. In fact, it wasn’t. It was more of a wake-up call that I was being an irresponsible owner of my submodule to me. Test driven development is really an awesome model, and it had saved me many times before with the back-end of the download manager (which did/does have fairly extensive unit tests). Having said that, you’d think I would have known better. Maybe it was laziness (writing a test takes a while), or maybe it was lack of knowledge (I only knew about xpcshell unit tests at the time), but really, it was inexcusable.
I’d like to say that I’ve learned my lesson now. While the download manager UI isn’t in a state that I’d consider “well tested” yet, it’s leaps and bounds better than it once was. After I wrote the first test case of the UI, things got a lot easier. Since then, Edward has landed several tests cases (as well as putting up a patch to improve the reliability of our tests), and a few other contributors have added some test cases.
It’s not all fun in the sun now that we have tests though. Based on conversations I’ve had on irc, or quotes that I’ve seen go by on qbo, it seems quite apparent to me that I’ve joined the ranks of those who are jerks about test cases. It’s not that I don’t mind that either – I think it’s the right approach.
However, I’ve noticed a steady decline in morale from contributors to my module (and community members attitudes toward me as well). Two bugs, in particular, come to mind. With the first case, I’ve gone so far as to explain exactly what should be tested, although maybe in not enough detail. I’ve seen comments go by on irc that what I’m asking for is “unreasonable”, especially because it was just a typo that caused the regression. Sadly, I fear that the people advocating this just don’t understand the test-driven development model. Clearly this was something that can be easily regressed if just a minor typo caused it, so it should have a test to make sure something as trivial as a typo doesn’t cause it again. I mean, it only took us a few days to track the issue down to this bug, and figure out what was wrong. Another issue brought up with that bug is that the test would be “unreliable”. To me, that’s just an excuse to not write the test though – I don’t see how it’s unreliable in any way. I’m sure that was a great use of everyone’s time, and that they’d all love to do it again in the future. Or, we can have it tested and never have to worry about it again. The second case isn’t nearly as controversial, but I fear that I’ve scared a long-time contributor away simple because he wanted to change the behavior of untested code that I want tests on before it’s changed.
You might notice that I sounded a bit angry in that last paragraph. I have to say that it’s quite frustrating to see so much resistance to the test driven model of development. It’s proven to work in lots of projects. Sure, writing tests can often take much longer than fixing the bug (it took me about twice as long to get the initial unit tests working for the download manager back-end than it did to do the rewrite), but the amount of time it saves you down the line is immeasurable. The sooner people realize this, the better off our entire codebase will be. Yeah, I’ve been making the problem worse by getting more and more strict about the test requirement. However, it’s the right thing to do.
Reflecting on this has brought up at least one issue. How do we require tests without scaring contributors away? I rather like how bsmedberg asked me to add a test for nsIMutableArray the other day on irc by suggesting that if I depended on a behavior, I add a test for that behavior. This doesn’t really work in my submodule though since the tests are required for the code to land. It’s an issue that should probably be looked into.
EDIT: I didn’t post this last night because in the back of my mind I knew my tone was too harsh, so I slept on it. After waking up, fixing it, and reading a particular post on planet, I see that I was in fact correct about there being frustrations with contributors. It’s unfortunate, and I don’t really know how to fix the problem yet.
I figure I should spruce this negative entry up by citing two cases where random outside contributors came through with great tests without hesitation – bug 333848 by Lev Serebryakov, and bug 403068 by John Zhang from Aptana.
16 replies on “Being a Responsible Owner”
Not that I’m actually contributing code myself, but… I guess the painful part here is the catching up, and the release crunch.
“the amount of time it saves you down the line is immeasurable”
Maybe, but if it takes more time now, that’s less stuff that will happen before Firefox 3, and that is a measurable. Saving an unknown amount of time that will benefit Firefox 4 in 18 months or so doesn’t sound so appealing. (I’m not saying that isn’t actually better, just that a immediate and measurable gain right now is always going to make people feel better than a less clear gain later…)
I understand where your argument is coming from, but I have to disagree. Those download manager UI tests – even though they were only added in January, have stopped at least two regressions in my own code from landing (and they probably would have sneaked past review because they turned out to be subtle issues). This all happened in the “release crunch”.
A good example of this is drag and drop in the Bookmark Organizer. That’s been regressed several times in the past several months. All of those regressions could have been avoided if tests had been created the first time the problem arose.
You brought up the fundamental issue here – people think it won’t save them any time in the immediate future, but that’s really a horrible assumption to make.
I’m not involved in coding mozilla, so I could be way off here, but it sounds to me like the problem is not the fact that test cases need to be written, but that so many need to be written and (reading Edward Lee’s post) that the new tests required may not even relate to the changes being checked in (because you’re asking for tests that should already exist).
Perhaps its worth easing up on the test requirements – require tests of any new or changed functionality, but only ask for a couple for the functionality that is already there. You’ll still be increasing the number of tests, but hopefully not pissing off contributors.
I actually am only requiring tests for code that is changing. It just so happens that his code extends the existing code path, so both the current functionality and the new functionality need to be tested.
As an owner, I’m only requiring the minimum amount of tests that make me feel more confident when reviewing the code. Anything extra is my responsibility as far as I’m concerned, and I have been going through and adding tests to bugs marked
in-testsuite=?. If there are no tests landing with it, that means I not only have to apply the patch and test it extensively myself when reviewing, I also have to make sure nobody regresses that behavior down the line. That’s a hard thing to do, which is why the tests are so useful.
In that case it sounds like you don’t have many options – the tests are important but writing them isn’t much fun and doesn’t feel productive.
I was going to suggest faster reviews might help (I’ve seen too many patches hang around for weeks or months without review). But I see from bug 425753 that you reviewed it after 5 hours, which is pretty fast.
You’ve brought up another issue, which is only somewhat related, but a good point. There are modules that have a very small number of people who can do the reviews. On top of that, those people often have their own work, and many times other modules to worry about as well.
The problem in my case is that I don’t (or didn’t) know how to write tests and what to test for; so I was completely stuck for some time when you asked for a test. Add on top of that my own crunch time because I am in the middle of the written tests for an hiring contest, and you get a non written test which blocks a patch, even if I swear I didn’t think the test was too stupid for me to write it.
I realize my post didn’t make this clear – I’m not upset with you about the patch and lack of tests. In fact, I didn’t hear you complain – you just asked for clarification on what needed to be tested. I’m upset about other people who were indicated that I was being unreasonable for asking for a test because they clearly don’t get it.
Shawn: Requiring tests raises the bar of entry pretty high in many cases. I don’t think you’ll get much argument from anyone that tests are good, but I think it is important to balance adding tests with progress. I would suggest that you as a module owner attempt to get people to build tests separately from their patches to get the additional coverage you want. We also have an amazing testing community who while isn’t quite as fast as our automated tests, is pretty close. I don’t see a lot of harm in letting them help to find regressions. Certainly if you keep hitting the same problem over and over again automated tests will be valuable, but it isn’t clear from your post or Edward’s that this is the case.
separately from their patches in the sense of get tests that cover what is there currently, not that all tests for new code/changes/areas should be done separately from the bug they’re fixing
I think putting pressure on volunteers to write tests when their good will doesn’t extend that far is very unwise – at least in the long run. Right now, during the run up to Firefox 3, it makes some sense. We need to do everything possible to make sure fixes don’t introduce worse bugs. In the long run we should be willing to accept what people are willing to contribute though. Writing tests should be strongly encouraged, and we should explain to people why they’re so important. This shouldn’t come at the expense of taking the fun away from volunteer contributors though. Patches and tests don’t necessarily need to come from the same people.
Do keywords like testwanted and testneeded exist? How about landing certain patches with those keywords added?
Also, you don’t want to get into a situation where the choices are 1) fix regression by fixing an obvious typo, but now I have to write a test and *insert exc^H^H^Hreason I can’t right now*, or 2) back out the change (and with that some other feature or fix) that caused regression in the first place (with the added benefit of putting the test-writing burden on the regressor). When (2) becomes more attractive than (1), guess where people will look.
I know that’s probably exaggerating things just a tad, but I am trying to make a point ;-)
There should be a separation between contributors who have cvs write permission and volunteers who do not. There should be no excuses for the first group.
As much as I’d like to say “we can get to this test later,” I’m pretty much bound by toolkit review policy , which states “All patches must include unit tests which are run during
make check.” I’m trying to be responsible and actually follow this policy, even though I know many submodules in toolkit don’t.
While it’s true that we have a testing community, there are (last I knew) over one hundred tests for the download manager alone. In the past, it’s taken them several weeks to find regressions to things (many of which would have been easily tested with a unit test). On top of that, there are a lot of things that simply have no test coverage in litmus or in the automated test suite. I personally found out about a lot of download manager features because people filed a bug saying that we broke it a few weeks ago.
I don’t exactly agree with you. When I first started hacking, I had a lot of fun just learning all the new techniques to do things. Writing tests is just another good technique to writing better code. It helps to prove that your fix is correct, it makes sure you don’t break what you just fixed in the future, and it makes sure you have a more sensible API (usually) because you have to test it.
Doesn’t look like it, but we do have the in-testsuite flags, which there are currently 28 download manager bugs marked as such.
I get your point, but I think we need to work to make (2) the defacto choice for the good of the codebase.
But who would write the tests for the second group? If you leave that to the module owner, he or she could very easily get swamped just writing tests and doing reviews.
[…] policy doesn’t allow for that as an exception, so someone else needs to write the test. One commenter suggested that someone else write the test(s) for the patch submitter. From a patcher’s […]
Shawn: Sure, those are all reasons for writing tests, and all these reasons should be presented to our contributors. That’s orthogonal to my point though. I was saying that if a volunteer isn’t happy to write tests (for whatever reason) after being given the full explanation, we shouldn’t chase them away by also demanding tests or making them feel bad. We’ve managed to ship good software in the past without good test coverage (not sure how :-) ), but we’ve never managed to ship good software without volunteers.