Generated Tests and TDD

Posted by Uncle Bob on 01/10/2008

TDD has become quite popular, and many companies are attempting to adopt it. However, some folks worry that it takes a long time to write all those unit tests and are looking to test-generation tools as a way to decrease that burden.

The burden is not insignificant. FitNesse, an application created using TDD, consists of 45,000 lines of Java code, 15,000 of which are unit tests. Simple math suggests that a full third of the code written under TDD is test code!

Of course this is a naive analysis. The benefits of using TDD are significant, and far outweigh the burden of writing the extra code. But that 33% still feels “extra” and tempts people to find ways to shrink it without losing any of the benefits.

Test Generators.

Some folks have put their hope in tools that automatically generate tests by inspecting code. These tools are very clever. They generate random calls to methods and remember the results. They can automatically build mocks and stubs to break the dependencies between modules. They use remarkably clever algorithms to choose their random test data. They even provide ways for programmers to write plugins that adjust those algorithms to be a better fit for their applications.

The end result of running such a tool is a set of observations. The tool observes how the instance variables of a class change when calls are made to its methods with certain arguments. It notes the return values, changes to instance variables, and outgoing calls to stubs and mocks. And it presents these observations to the user.

The user must look through these observations and determine which are correct, which are irrelevant, and which are bugs. Once the bugs are fixed, these observations can be checked over and over again by re-running the tests. This is very similar to the record-playback model used by GUI testers. Once you have registered all the correct observations, you can play the tests back and make sure those observations are still being observed.
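To make the record-playback idea concrete, here is a minimal sketch in Java. It is not how any particular tool works: the names ObservationRecorder, Observation, recordObservations, and replayObservations are all hypothetical, and the sketch only observes return values of a single int-to-int function, whereas the tools described above also watch instance variables and outgoing calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of record-playback; not the design of any real tool.
final class ObservationRecorder {

    // One observation: "given this input, the code returned this output".
    static final class Observation {
        final int input;
        final int output;

        Observation(int input, int output) {
            this.input = input;
            this.output = output;
        }
    }

    // Record: call the code with seeded random inputs and remember what it returned.
    static List<Observation> recordObservations(IntUnaryOperator code, int samples, long seed) {
        Random random = new Random(seed);
        List<Observation> observations = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            int input = random.nextInt(201) - 100; // inputs in [-100, 100]
            observations.add(new Observation(input, code.applyAsInt(input)));
        }
        return observations;
    }

    // Playback: re-run each recorded input and return the observations that no longer hold.
    static List<Observation> replayObservations(IntUnaryOperator code, List<Observation> recorded) {
        List<Observation> broken = new ArrayList<>();
        for (Observation o : recorded) {
            if (code.applyAsInt(o.input) != o.output) {
                broken.add(o);
            }
        }
        return broken;
    }
}
```

Notice that the record step captures whatever the code currently does, bugs included. Nothing in the sketch knows what the code was supposed to do; deciding which observations are correct is still left to the human.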

Some of the tools will even write the observations as JUnit tests, so that you can run them as a standard test suite. Just like TDD, right? Well, not so fast…

Make no mistake, tools like this can be very useful. If you have a wad of untested legacy code, then generating a suite of JUnit tests that verifies some portion of the behavior of that code can be a great boon!

The Periphery Problem

On the other hand, no matter how clever the test generator is, the tests it generates will always be more naive than the tests that a human can write. As a simple example of this, I have tried to generate tests for the bowling game program using two of the better-known test-generation tools. The interface to the Bowling Game looks like this:

 public class BowlingGame {
   public void roll(int pins) {...}
   public int score() {...}
 }

The idea is that you call roll each time the ball gets rolled, and you call score at the end of the game to get the score for that game.

The test generators could not randomly generate valid games. It’s not hard to see why. A valid game is a sequence of between 11 and 21 rolls, all of which must be integers between 0 and 10. What’s more, within a given frame, the sum of rolls cannot exceed 10. These constraints are just too tight for a random generator to achieve within the current age of the universe.

I could have written a plugin that guided the generator to create valid games, but such an algorithm would embody much of the logic of the BowlingGame itself, so it’s not clear that the economics are advantageous.
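To give a feel for how much game logic such guidance needs, here is a sketch of just the validity check for a sequence of rolls. The class BowlingRolls and the method isValidGame are hypothetical names, and the handling of bonus rolls in the tenth frame follows my reading of standard ten-pin rules rather than anything spelled out in this post.

```java
// Hypothetical sketch: decides whether a sequence of rolls forms one complete, legal game.
final class BowlingRolls {

    static boolean isValidGame(int[] rolls) {
        // Every roll must knock down between 0 and 10 pins.
        for (int pins : rolls) {
            if (pins < 0 || pins > 10) {
                return false;
            }
        }

        int i = 0; // index of the first roll of the current frame

        // Frames 1-9: a strike uses one roll; otherwise two rolls totalling at most 10.
        for (int frame = 1; frame <= 9; frame++) {
            if (i >= rolls.length) {
                return false; // game ended too early
            }
            if (rolls[i] == 10) {
                i += 1;
                continue;
            }
            if (i + 1 >= rolls.length || rolls[i] + rolls[i + 1] > 10) {
                return false;
            }
            i += 2;
        }

        // Tenth frame: always at least two rolls.
        if (i + 1 >= rolls.length) {
            return false;
        }
        int first = rolls[i];
        int second = rolls[i + 1];

        if (first == 10) {
            // Strike: exactly two bonus rolls; the pins are reset only after another strike.
            if (rolls.length != i + 3) {
                return false;
            }
            int third = rolls[i + 2];
            return second == 10 || second + third <= 10;
        }
        if (first + second > 10) {
            return false;
        }
        // Spare: exactly one bonus roll. Open frame: no bonus roll.
        boolean spare = first + second == 10;
        return rolls.length == i + (spare ? 3 : 2);
    }
}
```

Even this checker has to walk the game frame by frame and treat strikes, spares, and the tenth frame specially, which is most of what the scoring algorithm does too. A generator plugin that produces valid games would need at least this much knowledge.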

To generalize this, test generators have trouble getting inside algorithms that have any kind of protocol, calling sequence, or state semantics. They can generate tests around the periphery of the classes, but they can’t get into the guts without help.

TDD?

The real question is whether or not such generated tests help you with Test Driven Development. TDD is the act of using tests as a way to drive the development of the system. You write unit test code first, and then you write the application code that makes those tests pass. Clearly, generating tests from existing code violates that simple rule. So in some philosophical sense, using test generators is not TDD. But who cares so long as the tests get written, right? Well, hang on…

One of the reasons that TDD works so well is that it is similar to the accounting practice of double-entry bookkeeping. Accountants make every entry twice: once on the credit side, and once on the debit side. These two entries follow separate mathematical pathways. In the end a magical subtraction yields a zero if all the entries were made correctly.

In TDD, programmers state their intent twice: once in the test code, and again in the production code. These two statements of intent verify each other. The tests test the intent of the code, and the code tests the intent of the tests. This works because it is a human who makes both entries! The human must state the intent twice, but in two complementary forms. This vastly reduces many kinds of errors, and it provides significant insight into improved design.
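As an illustration of stating intent twice, here is a sketch built on the bowling game interface shown earlier. The checks in BowlingGameIntentCheck say what the score should be for a few hand-picked games; BowlingGame says how to compute it. The implementation is one possible version written for this example, not necessarily the code I ran the generators against, and plain Java checks stand in for JUnit so that the sketch has no dependencies. The names BowlingGameIntentCheck, rollMany, and expect are hypothetical.

```java
// Second statement of intent: how a score is computed (one possible implementation).
final class BowlingGame {
    private final int[] rolls = new int[21]; // a game has at most 21 rolls
    private int currentRoll = 0;

    public void roll(int pins) {
        rolls[currentRoll++] = pins;
    }

    public int score() {
        int score = 0;
        int first = 0; // index of the first roll of the current frame
        for (int frame = 0; frame < 10; frame++) {
            if (rolls[first] == 10) {                          // strike: 10 plus next two rolls
                score += 10 + rolls[first + 1] + rolls[first + 2];
                first += 1;
            } else if (rolls[first] + rolls[first + 1] == 10) { // spare: 10 plus next roll
                score += 10 + rolls[first + 2];
                first += 2;
            } else {                                           // open frame: pins knocked down
                score += rolls[first] + rolls[first + 1];
                first += 2;
            }
        }
        return score;
    }
}

// First statement of intent: what the score must be for games a human chose on purpose.
final class BowlingGameIntentCheck {

    static void rollMany(BowlingGame game, int count, int pins) {
        for (int i = 0; i < count; i++) {
            game.roll(pins);
        }
    }

    static void expect(int expected, int actual, String intent) {
        if (expected != actual) {
            throw new AssertionError(intent + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        BowlingGame gutter = new BowlingGame();
        rollMany(gutter, 20, 0);
        expect(0, gutter.score(), "a gutter game scores nothing");

        BowlingGame spare = new BowlingGame();
        rollMany(spare, 2, 5); // spare in the first frame
        spare.roll(3);
        rollMany(spare, 17, 0);
        expect(16, spare.score(), "a spare earns the next roll as a bonus");

        BowlingGame strike = new BowlingGame();
        strike.roll(10); // strike in the first frame
        strike.roll(3);
        strike.roll(4);
        rollMany(strike, 16, 0);
        expect(24, strike.score(), "a strike earns the next two rolls as a bonus");

        BowlingGame perfect = new BowlingGame();
        rollMany(perfect, 12, 10);
        expect(300, perfect.score(), "a perfect game scores 300");

        System.out.println("Both statements of intent agree.");
    }
}
```

The expected values 0, 16, 24, and 300 come from a human who knows the rules of bowling, not from running the code. If either side is wrong, the two statements disagree and a check fails, which is exactly the cross-check a generated test cannot provide.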

Using a test generator breaks this concept because the generator writes the test using the production code as input. The generated test is not a human restatement; it is an automatic translation. The human states intent only once, and therefore does not gain insights from restatement, nor does the generated test check that the intent of the code was achieved. It is true that the human must verify the observations, but compared to TDD that is a far more passive action, providing far less insight into defects, design, and intent.

I conclude from this that automated test generation is neither equivalent to TDD, nor is it a way to make TDD more efficient. What you gain by trying to generate the 33% test code, you lose in defect elimination, restatement of intent, and design insight. You also sacrifice depth of test coverage, because of the periphery problem.

This does not mean that test generators aren’t useful. As I said earlier, I think they can help to partially characterize a large base of legacy code. But these tools are not TDD tools. The tests they generate are not equivalent to tests written using TDD. And many of the benefits of TDD are not achieved through test generation.

Posted by Uncle Bob on 12/13/2007

I was at a client recently. They are a successful startup who have gone through a huge growth spurt. Their software grew rapidly, through a significant hack-and-slash program. Now they have a mess, and it is slowing them way down. Defects are high. Unintended consequences of change are high. Productivity is low.

I spent two days advising them how to adopt TDD and Clean Code techniques to improve their code base and their situation. We discussed strategies for gradual cleanup, and the notion that big refactoring projects and big redesign projects have a high risk of failure. We talked about ways to clean things up over time, while incrementally insinuating tests into the existing code base.

During the sessions they told me of a software manager who is famed for having said:

“There’s a clean way to do this, and a quick-and-dirty way to do this. I want you to do it the quick-and-dirty way.”

The attitude engendered by this statement has spread throughout the company and has become a significant part of their culture. If hack-and-slash is what management wants, then that’s what they get! I spent a long time with these folks countering that attitude and trying to engender an attitude of craftsmanship and professionalism.

The developers responded to my message with enthusiasm. They want to do a good job (of course!). They just didn’t know they were authorized to do good work. They thought they had to make messes. But I told them that the only way to get things done quickly, and keep getting things done quickly, is to create the cleanest code they can, to work as well as possible, and keep the quality very high. I told them that quick-and-dirty is an oxymoron. Dirty always means slow.

On the last day of my visit the infamous manager (now the CTO) stopped into our conference room. We talked over the issues. He was constantly trying to find a quick way out. He was manipulative and cajoling. “What if we did this?” or “What if we did that?” He’d set up straw man after straw man, trying to convince his folks that there was a time and place for good code, but this was not it.

I wanted to hit him.

Then he made the dumbest, most profoundly irresponsible statement I’ve (all too often) heard come out of a CTO’s mouth. He said:

“Business software is messy and ugly.”

No, it’s not! The rules can be complicated, arbitrary, and ad-hoc; but the code does not need to be messy and ugly. Indeed, the more arbitrary, complex, and ad-hoc the business rules are, the cleaner the code needs to be. You cannot manage the mess of the rules if they are contained by another mess! The only way to get a handle on the messy rules is to express them in the cleanest and clearest code you can.

In the end, he backed down. At least while I was there. But I have no doubt he’ll continue his manipulations. I hope the developers have the will to resist.

One of the developers asked the question point blank:
