Specs vs. Tests

There’s something to this BDD kool-aid that people have been drinking lately…

As part of the Rails project I’ve been working on for the last few weeks, I’ve been using RSpec. RSpec is a unit testing tool similar in spirit to JUnit or Test/Unit. However, RSpec uses an alternative syntax that reads more like a specification than like a test. Let me show you what I mean.

In Java, using JUnit, we might write the following unit test:

import junit.framework.TestCase;

public class BowlingGameTest extends TestCase {
  private Game g;

  protected void setUp() throws Exception {
    g = new Game();
  }

  private void rollMany(int n, int pins) {
    for (int i = 0; i < n; i++) {
      g.roll(pins);
    }
  }

  public void testGutterGame() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
    assertTrue(g.isComplete());
  }

  public void testAllOnes() throws Exception {
    rollMany(20, 1);
    assertEquals(20, g.score());
    assertTrue(g.isComplete());
  }
}

This is pretty typical for a Java unit test. The setUp function builds the Game object, and then the various test functions make sure that it works in each different scenario. In Ruby, however, this might be expressed using RSpec as:

require 'rubygems'
require_gem "rspec"
require 'game'

context "When a gutter game is rolled" do
  setup do
    @g = Game.new
    20.times { @g.roll 0 }
  end

  specify "score should be zero" do
    @g.score.should == 0
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end

context "When all ones are rolled" do
  setup do
    @g = Game.new
    20.times { @g.roll 1 }
  end

  specify "score should be 20" do
    @g.score.should == 20
  end

  specify "game should be complete" do
    @g.complete?.should_be true
  end
end

At first blush the difference seems small. Indeed, the RSpec code might seem too verbose and fine-grained. At least that was my impression when I first saw RSpec. However, having used it for several months now, I have a different reaction.

First, let’s look at the semantic differences. In JUnit you have TestCase derivatives and test functions. Each TestCase derivative has setUp and tearDown functions, and a suite of test functions. In RSpec you have what appears to be an extra layer. You have the test script, which is composed of context blocks. The contexts have setup, teardown, and specify blocks.

At first you might think that the RSpec context block corresponds to the Java TestCase derivative, since they are semantically equivalent. However, Java throws something of a curve at us by only allowing one public class per file. So from an organizational point of view there is a stronger equivalence between the TestCase derivative and the whole RSpec test script.

This might seem petty. After all, I can write Java code that is semantically equivalent to the RSpec code simply by creating two TestCase derivatives in two different files. But separating those two test cases into two different files makes a big difference to me. It breaks apart things that otherwise want to stay together.

Now it’s true that I could keep the TestCase derivatives in the same file by making them package scope and manually putting them into a public TestSuite class. But who wants to do that? After all, my IDE is nice enough to find and execute all the public TestCase derivatives, which completely eliminates the need for me to build suites, at least at first.

Note: The JDave tool provides BDD syntax for Java.

Again, this might seem petty; and if that were the only benefit to the RSpec syntax I would agree. But it’s not the only benefit.

Strange though it may seem, the next benefit is the strings that describe the context and specify blocks. At first I thought these strings were just noise, like the strings in the JUnit assert functions. I seldom, if ever, use the JUnit assert strings, so why would I use the context and specify strings? But over the last few weeks I have come to find that, unlike the JUnit assert strings, the RSpec strings put a subtle force on me to create better test designs.

Stable State: An Emergent Rule.

When a spec fails, the message that gets printed is the concatenation of the context string and the specify string. For example: 'When a gutter game is rolled game should be complete' FAILED. If you word the context and specify strings properly, these error messages make nice sentences. Since, in TDD, we almost always start out with our tests failing, I see these error messages a lot. So there is a pressure on me to word them well.
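To make the concatenation concrete, here is a toy runner in plain Ruby. It is only a sketch of the idea and not how RSpec itself is implemented; the ToyRunner class and its methods are hypothetical names invented for this illustration.

```ruby
# Toy spec runner: NOT RSpec's implementation, only an illustration of how
# a failure message can be built from the context and specify strings.
class ToyRunner
  attr_reader :failures

  def initialize
    @failures = []
  end

  # Remembers the context description, then runs the block that holds the specs.
  def context(description)
    @context = description
    yield
  end

  # Runs the block; a false/nil result or an exception counts as a failure.
  def specify(description)
    passed = begin
      yield
    rescue StandardError
      false
    end
    @failures << "'#{@context} #{description}' FAILED" unless passed
  end
end

runner = ToyRunner.new
runner.context "When a gutter game is rolled" do
  runner.specify("score should be zero") { true }      # passes
  runner.specify("game should be complete") { false }  # deliberately fails
end
puts runner.failures
```

Only the second spec is recorded, and its message is the two strings joined by a space, which is why the wording of both strings matters.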

But by wording them well, I am constrained to obey a rule that JUnit never put pressure on me to obey. Indeed, I didn’t know it was a rule until I started using RSpec. I call this rule Stable State. It is:

Tests don’t change the state.

In other words, the functions that make assertions about the state of the system do not also change the state of the system. The state of the system is set up once in the setUp function, and then only interrogated by the test functions.

If you look carefully at the specification of the Bowling Game you will see that the state of the Game is changed only by the setup block within the context blocks. The specify blocks simply interrogate and verify state. This is in stark contrast to the JUnit tests in which the test methods both change and verify the state of the Game.

If you don’t follow this rule it is hard to get the strings on the context and specify blocks to create error messages that read well. On the other hand, if you make sure that the specify blocks don’t change the state, then you can find simple sentences that describe each context and specify block. And so the subtle pressure of the strings has a significant impact on the structure of the tests.
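One way to see the rule in action is to freeze the object once setup is done, so that any later attempt to change it is rejected. The sketch below is plain Ruby, not RSpec. The Game class is a hypothetical stand-in that covers only what the two contexts above need (no spare or strike scoring), and the gutter_game helper is likewise an assumption made for the illustration.

```ruby
# Hypothetical Game, reduced to what the gutter-game and all-ones
# contexts need (no spare or strike scoring).
class Game
  def initialize
    @rolls = []
  end

  def roll(pins)
    # Reassigning the instance variable means a frozen Game rejects rolls.
    @rolls += [pins]
  end

  def score
    @rolls.sum
  end

  def complete?
    @rolls.size == 20
  end
end

# "setup": the only place where state changes.
def gutter_game
  g = Game.new
  20.times { g.roll 0 }
  g.freeze # from here on the game can only be interrogated
end

# "specify" bodies: each one reads from a freshly set-up, frozen game.
checks = {
  "score should be zero"                => ->(g) { g.score == 0 },
  "game should be complete"             => ->(g) { g.complete? },
  "rolls again (violates Stable State)" => ->(g) { g.roll(5); g.score == 5 }
}

checks.each do |name, check|
  begin
    result = check.call(gutter_game) ? "ok" : "FAILED"
  rescue FrozenError
    result = "changed state"
  end
  puts "When a gutter game is rolled #{name}: #{result}"
end
```

The first two checks only interrogate the game. The third tries to roll again, and the frozen object raises FrozenError, which flags it as a check that changes state.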

I can’t claim to have discovered the pressure of these strings. Indeed, Dan North’s original article on the topic is captivating. However, I felt the pressure and came to the same conclusion he did well before I read his article, simply by using a tool inspired by his work.

The benefit of Stable State is that for each set of assertions there is one, and only one, place where the state of the system is changed. Moreover, the three-level structure provides natural places for groups of states, states, and asserts.

The demise of the One Assert rule.

There have been other rules like this before. One that circulated a few years back was:

One assert per test.

I never bought into this rule, and I still don’t. It seems arbitrary and inefficient. Why should I put each assert statement into its own test method when I can just as well put all the assert statements into a single test method?

In other words, why prefer this:

  public void testGutterGameScoreIsZero() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
  }

  public void testGutterGameIsComplete() throws Exception {
    rollMany(20, 0);
    assertTrue(g.isComplete());
  }

over this:

  public void testGutterGame() throws Exception {
    rollMany(20, 0);
    assertEquals(0, g.score());
    assertTrue(g.isComplete());
  }

I think the authors of the One Assert rule were trying to achieve the benefits of Stable State, but missed the mark. It’s as though they could smell the rule out there, but couldn’t quite pinpoint it.

The State Machine metaphor

When you follow the Stable State rule, your specifications (tests) become a description of a Finite State Machine. Each context block describes how to drive the system under test (SUT) to a given state, and then the specify blocks describe the attributes of that state.

Dan North calls this the Given-When-Then metaphor. Consider the following triplet:

Given a Bowling Game: When 20 gutter balls are rolled, Then the score should be zero and the game should be complete.

This triplet corresponds nicely to a row in a state transition table. Consider, for example, the subway turnstile state machine:

We can read this as follows:

GIVEN we are in the Locked state, WHEN we get a coin event, THEN we should be in the Unlocked state. GIVEN we are in the Unlocked state, WHEN we get a pass event, THEN we should be in the Locked state. etc.
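The same reading can be sketched as a transition table in plain Ruby. Only the two transitions spelled out in the sentences above are included; the rows covered by "etc." are left out rather than guessed. The TRANSITIONS constant and the next_state helper are hypothetical names chosen for this illustration.

```ruby
# Transition table keyed by [current state, event].
# Only the two transitions spelled out in the text are listed; the rows
# covered by "etc." are intentionally left out.
TRANSITIONS = {
  [:locked,   :coin] => :unlocked,
  [:unlocked, :pass] => :locked
}.freeze

# Returns the next state, or raises KeyError for a pair not in the table.
def next_state(state, event)
  TRANSITIONS.fetch([state, event])
end

# Each row reads as a Given-When-Then sentence.
TRANSITIONS.each do |(given, event), expected|
  puts "GIVEN #{given}, WHEN #{event}, THEN #{expected}"
end
```

Each key plays the role of the Given and When parts, and each value is the Then part, so a spec for one row only has to set up the given state, fire the event, and interrogate the result.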

Describing a system as a finite state machine has certain benefits.

The point is that organizing the system description in terms of a finite state machine can have a profound impact on the system design and implementation.

The Butterfly Effect.

I find it remarkable that two dumb, annoying little strings put a subtle pressure on me to adjust the style of my tests. That change in style eventually caused me to see the design and implementation of the system I was writing in a very new and interesting light.