Confirmation Bias in Unit Testing
Confirmation bias is the tendency to search for or interpret new information in a way that confirms one’s preconceptions, and to avoid information and interpretations that contradict prior beliefs. In unit testing, it is a byproduct of test-after-development and shows up as the tendency for engineers to write only tests that they know will pass. Take the following oversimplified example…
Assume that our imaginary engineer, Mark Badcoder, has been tasked with developing code that identifies whether a given year is a leap year (thanks to Kevlin Henney for the idea). Now Mark knows a thing or two about leap years. He knows that they were created to make up for the difference between the solar year and the 365-day calendar year, and that they add one day to the end of February every four years (a little-known fact: the difference is actually slightly less than 0.25 days, and to compensate, years divisible by 100 are leap years only if they are also divisible by 400). He also knows that 2000 was a leap year, and from there he writes the following code:
public static boolean isLeapYear(int year) {
    // Count in steps of four from 2000, a year Mark knows was a leap year
    int offset = 2000 - year;
    return (offset % 4 == 0);
}
Happy with his code, Mark turns to writing his unit tests. He tests the last four known leap years, as well as the next two. He also tests some other years from the same period to make sure they are correctly identified as non-leap years. In doing so, he creates the following tests:
- testThat2008IsALeapYear()
- testThat2004IsALeapYear()
- testThat2000IsALeapYear()
- testThat1996IsALeapYear()
- testThat2012IsALeapYear()
- testThat2016IsALeapYear()
- testThat1997IsNotALeapYear()
- testThat2001IsNotALeapYear()
- testThat2013IsNotALeapYear()
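Here is a minimal sketch of Mark's test list in plain Java. It does not assume a test framework: a hypothetical `check` helper stands in for JUnit-style assertions, and the class name `MarksLeapYearChecks` is invented for illustration. None of these years is a century year, so the suite never gets a chance to disagree with Mark's implementation.

```java
// Sketch of Mark's tests without a test framework (an assumption made for
// illustration). check() throws if an expectation fails.
class MarksLeapYearChecks {

    // Mark's implementation (same logic as the listing above).
    public static boolean isLeapYear(int year) {
        int offset = 2000 - year;
        return (offset % 4 == 0);
    }

    // Hypothetical stand-in for an assertion library.
    static void check(boolean condition, String testName) {
        if (!condition) {
            throw new AssertionError(testName + " failed");
        }
        System.out.println(testName + " passed");
    }

    public static void main(String[] args) {
        // The years Mark chose: recent leap years and a few non-leap years.
        int[] leapYears = {1996, 2000, 2004, 2008, 2012, 2016};
        int[] commonYears = {1997, 2001, 2013};

        for (int year : leapYears) {
            check(isLeapYear(year), "testThat" + year + "IsALeapYear");
        }
        for (int year : commonYears) {
            check(!isLeapYear(year), "testThat" + year + "IsNotALeapYear");
        }
    }
}
```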
Not surprisingly, all of the tests pass. Satisfied that he has produced working code, Mark commits it to the repository. Unfortunately, there is a problem.
Go back to the little-known fact mentioned in the second paragraph. Because the solar year is slightly shorter than 365.25 days, the every-four-years rule breaks at the century mark: a year that is divisible by 100 is a leap year only if it is also divisible by 400. That means that while 1600 and 2000 were leap years, 1900 and 2100 are not. With this knowledge, it is easy to see that Mark’s code would fail if we tested it with 1900 or 2100, because it reports both as leap years.
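Placing the full rule next to Mark's version makes the gap easy to see. The following is a sketch; the names `LeapYearComparison`, `marksIsLeapYear` and `isLeapYear` are hypothetical. The full rule yields 97 leap years in every 400, so it averages 365 + 97/400 = 365.2425 days per year.

```java
// Sketch comparing Mark's every-four-years logic with the full Gregorian rule.
// Class and method names are hypothetical, chosen for illustration.
class LeapYearComparison {

    // Mark's version: steps of four from 2000, with no century exception.
    static boolean marksIsLeapYear(int year) {
        int offset = 2000 - year;
        return offset % 4 == 0;
    }

    // Full Gregorian rule: divisible by 4, except century years,
    // which are leap years only when also divisible by 400.
    static boolean isLeapYear(int year) {
        if (year % 400 == 0) {
            return true;
        }
        if (year % 100 == 0) {
            return false;
        }
        return year % 4 == 0;
    }

    public static void main(String[] args) {
        int[] years = {1600, 1900, 1996, 2000, 2001, 2100};
        for (int year : years) {
            System.out.printf("%d  Mark: %-5b  Gregorian: %-5b%n",
                    year, marksIsLeapYear(year), isLeapYear(year));
        }
    }
}
```

Among these sample years, the two versions disagree only on 1900 and 2100. These are the century years that are not divisible by 400.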
The problem is that Mark knew how the code worked before he wrote the tests. His assumption that the code was correct led him to inadvertently test it only in ways that he knew would pass. He never considered the 1900 or 2100 exceptions because his code did not account for them. One way to avoid this problem is to practice Test-Driven Development. In TDD, the code does not exist yet when you write the test, so there is less opportunity for confirmation bias to rear its ugly head…
Don't miss any posts! Subscribe to our blog feed or only posts by Paul Bourdeaux.
Short URL: http://sundoginteractive.com/e/3373


Comments
Yet with TDD you can still write exactly the same tests as Mark Badcoder (btw: I love the character - would you mind if I borrow him from time to time?) The problem here, as I see it, lies in properly understanding what a function should be doing.
If you don’t consider the “a year that is divisible by 100 is only a leap year if it is also divisible by 400” requirement, you’ll come up with bad tests and a bad algorithm.
But of course if Mark writes unit tests he has two chances to catch his bug instead of one.
Feel free to use and abuse Mark Badcoder any time you like. He tends to make it into some of my presentations as well. :) Plus there is an inside joke regarding his namesake, so it would make me laugh every time I saw it used elsewhere!
And yes, you can still fall victim to the same kinds of mistakes with TDD, but they are less likely. In this example, failure to consider all of the requirements led to the poor code. Then Mark inadvertently wrote tests that made his poor code appear functional. When practicing TDD, we usually rely on the requirements, not the code, to create the tests, so there is less chance that the tests we write will mask mistakes in the code. In fact, the tests themselves would look completely different.
If I were going to use TDD to create the unit tests, my test suite would look something like this (again, credit to Kevlin Henney and his GUTs talk):
* testThatYearsNotDivisibleBy4AreNotLeapYears()
* testThatYearsDivisibleBy4ButNot100AreLeapYears()
* testThatYearsDivisibleBy4And100ButNot400AreNotLeapYears()
* testThatYearsDivisibleBy400AreLeapYears()
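Below is one way to express those four requirement-driven tests in plain Java, again without assuming JUnit. Each test receives the implementation under test as a `java.util.function.IntPredicate`, so the same suite can run against Mark's code and against the full rule. The sample years, the class name `RequirementDrivenLeapYearTests`, and the helpers `allMatch` and `countFailures` are illustrative choices. They are not taken from Kevlin Henney's talk.

```java
import java.util.function.IntPredicate;

// Sketch of the four requirement-driven tests as plain-Java checks.
// All names and sample years here are hypothetical, chosen for illustration.
class RequirementDrivenLeapYearTests {

    // Mark's every-four-years implementation.
    static boolean marksIsLeapYear(int year) {
        return (2000 - year) % 4 == 0;
    }

    // Full rule: divisible by 4, and not a century unless divisible by 400.
    static boolean gregorianIsLeapYear(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    // True if every sample year gets the expected answer.
    static boolean allMatch(IntPredicate isLeap, int[] years, boolean expected) {
        for (int year : years) {
            if (isLeap.test(year) != expected) {
                return false;
            }
        }
        return true;
    }

    static boolean testThatYearsNotDivisibleBy4AreNotLeapYears(IntPredicate isLeap) {
        return allMatch(isLeap, new int[] {1999, 2001, 2002, 2003}, false);
    }

    static boolean testThatYearsDivisibleBy4ButNot100AreLeapYears(IntPredicate isLeap) {
        return allMatch(isLeap, new int[] {1996, 2004, 2008, 2012}, true);
    }

    static boolean testThatYearsDivisibleBy4And100ButNot400AreNotLeapYears(IntPredicate isLeap) {
        return allMatch(isLeap, new int[] {1700, 1800, 1900, 2100}, false);
    }

    static boolean testThatYearsDivisibleBy400AreLeapYears(IntPredicate isLeap) {
        return allMatch(isLeap, new int[] {1600, 2000, 2400}, true);
    }

    // Runs all four tests and returns the number that failed.
    static int countFailures(IntPredicate isLeap) {
        int failures = 0;
        if (!testThatYearsNotDivisibleBy4AreNotLeapYears(isLeap)) failures++;
        if (!testThatYearsDivisibleBy4ButNot100AreLeapYears(isLeap)) failures++;
        if (!testThatYearsDivisibleBy4And100ButNot400AreNotLeapYears(isLeap)) failures++;
        if (!testThatYearsDivisibleBy400AreLeapYears(isLeap)) failures++;
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("Mark's code failures: "
                + countFailures(RequirementDrivenLeapYearTests::marksIsLeapYear));
        System.out.println("Gregorian rule failures: "
                + countFailures(RequirementDrivenLeapYearTests::gregorianIsLeapYear));
    }
}
```

Running `main` reports one failure for Mark's code and none for the full rule. Mark's code fails the divisible-by-4-and-100-but-not-400 test, which his own suite never included.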