Category Archives: Development

Integration and Simulation Tests in Python

One of my (many) tasks lately has been to rework unit and integration tests for Review Bot, our automated code review add-on for Review Board.

The challenge was providing a test suite that could test against real-world tools, but not require them. An ever-increasing list of compatible tools has threatened to become an ever-increasing burden on contributors. We wanted to solve that.

So here’s how we’re doing it.

First off, unit test tooling

First off, this is all Python code, which you can find on the Review Bot repository on GitHub.

We make heavy use of kgb, a package we’ve written to add function spies to Python unit tests. This goes far beyond Mock, allowing nearly any function to be spied on without having to be replaced. This module is a key component to our solution, given our codebase and our needs, but it’s an implementation detail — it isn’t a requirement for the overall approach.

Still, if you’re writing complex Python test suites, check out kgb.

Deciding on the test strategy

Review Bot can talk to many command line tools, which are used to perform checks and audits on code. Some are harder than others to install, or at least annoying to install.

We decided there’s two types of tests we need:

  1. Integration tests — ran against real command line tools
  2. Simulation tests — ran against simulated output/results that would normally come from a command line tool

Being that our goal is to ease contribution, we have to keep in mind that we can’t err too far on that side at the expense of a reliable test suite.

We decided to make these the same tests.

The strategy, therefore, would be this:

  1. Each test would contain common logic for integration and simulation tests. A test would set up state, perform the tool run, and then check results.
  2. Integration tests would build upon this by checking dependencies and applying configuration before the test run.
  3. Simulation tests would be passed fake output or setup data needed to simulate that tool.

This would be done without any code duplication between integration or simulation tests. There would be only one test function per expectation (e.g., a successful result or the handling of an error). We don’t want to worry about tests getting out of sync.

Regression in our code? Both types of tests should catch it.

Regression or change in behavior in an integrated tool? Any fixes we apply would update or build upon the simulation.

Regression in the simulation? Something went wrong, and we caught it early without having to run the integration test.

Making this all happen

We introduced three core testing components:

  1. @integration_test() — a decorator that defines and provides dependencies and input for an integration test
  2. @simulation_test() — a decorator that defines and provides output and results for a simulation test
  3. ToolTestCaseMetaClass — a metaclass that ties it all together

Any test class that needs to run integration and simulation tests will use ToolTestCaseMetaClass and then apply either or both @integration_test/@simulation_test decorators to the necessary test functions.

When a decorator is applied, the test function is opted into that type of test. Data can be passed into the decorator, which is then passed into the parent test class’s setup_integration_test() or setup_simulation_test().

These can do whatever they need to set up that particular type of test. For example:

  • Integration test setup defaults to checking dependencies, skipping a test if not met.
  • Simulation test setup may write some files or spy on a subprocess.Popen() call to fake output.


For example:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=(
        b'MyTool 1.2.3\n'
        b'Scanning code...\n'
        b'0 errors, 0 warnings, 1 file(s) checked\n'
    ))
    def test_execute(self):
        """Testing MyTool.execute"""
        ...

When applied, ToolTestCaseMetaClass will loop through each of the test_*() functions with these decorators applied and split them up:

  • Test functions with @integration_test will be split out into a test_integration_<name>() function, with a [integration test] suffix appended to the docstring.
  • Test functions with @simulation_test will be split out into test_simulation_<name>(), with a [simulation test] suffix appended.

The above code ends up being equivalent to:

class MyTests(kgb.SpyAgency, TestCase):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    def test_integration_execute(self):
        """Testing MyTool.execute [integration test]"""
        self.setup_integration_test(exe_deps=['mytool'])
        self._test_common_execute()

    def test_simulation_execute(self):
        """Testing MyTool.execute [simulation test]"""
        self.setup_simulation_test(output=(
            b'MyTool 1.2.3\n'
            b'Scanning code...\n'
            b'0 errors, 0 warnings, 1 file(s) checked\n'
        ))
        self._test_common_execute()

    def _test_common_execute(self):
        ...

Pretty similar, but less to maintain in the end, especially as tests pile up.

And when we run it, we get something like:

Testing MyTool.execute [integration test] ... ok
Testing MyTool.execute [simulation test] ... ok

...

Or, you know, with a horrible, messy error.

Iterating on tests

It’s become really easy to maintain and run these tests.

We can now start by writing the integration test, modify the code to log any data that might be produced by the command line tool, and then fake-fail the test to see that output.

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])

        # XXX
        print(repr(payload))

        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

        # XXX Fake-fail the test
        assert False

I can run that and get the results I’ve printed:

======================================================================
ERROR: Testing MyTool.process_results [integration test]
----------------------------------------------------------------------
Traceback (most recent call last):
    ...
-------------------- >> begin captured stdout << ---------------------
{"errors": [{"code": 123, "column": 13, "filename": "test.c", "line': 1, "message": "Expected return type: int"}]}

Now that I have that, and I know it’s all working right, I can feed that output into the simulation test and clean things up:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=json.dumps(
        'errors': [
            {
                'filename': 'test.c',
                'code': 123,
                'line': 1,
                'column': 13,
                'message': 'Expected return type: int',
            },
        ]
    ).encode('utf-8'))
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])
        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

Once it’s running correctly in both tests, our job is done.

From then on, anyone working on this code can just simply run the test suite and make sure their change hasn’t broken any simulation tests. If it has, and it wasn’t intentional, they’ll have a great starting point in diagnosing their issue, without having to install anything.

Anything that passes simulation tests can be considered a valid contribution. We can then test against the real tools ourselves before landing a change.

Development is made simpler, and there’s no worry about regressions.

Going forward

We’re planning to apply this same approach to both Review Board and RBTools. Both currently require contributors to install a handful of command line tools or optional Python modules to make sure they haven’t broken anything, and that’s a bottleneck.

In the future, we’re looking at making use of python-nose‘s attrib plugin, tagging integration and simulation tests and making it trivially easy to run just the suites you want.

We’re also considering pulling out the metaclass and decorators into a small, reusable Python packaging, making it easy for others to make use of this pattern.

Weird bugs: Django, timezones, and importing from eggs

Every so often you hit a bug that makes you question your sanity. The past several days have been spent chasing one of the more confusing ones I’ve seen in a long time.

Review Board 1.7 added the ability to set the server-wide timezone. During development, we found problems using SSH with a non-default timezone. This only happened when updating os.environ[‘TZ’] to something other than our default of UTC. We’d see the SSH process (rbssh, our wrapper for SSH communication) break due to an EOF on stdin and stdout, and then we’d see the development server reload itself.

Odd.

Since this originated with a Subversion repository, I first suspected libsvn. I spent some time going through their code to see if a timezone update would break something. Perhaps timeout logic. That didn’t turn up anything interesting, but I couldn’t rule it out.

Other candidates for suspicion were rbssh itself, paramiko (the SSH library), Django, and the trickster god Loki. We just had too many moving pieces to know for sure.

So I wrote a little script to get in-between a calling process and another process and log all communication between them. I tested this with rbssh and with plain ol’ ssh. rbssh was the only one that broke. Strange, since it wasn’t doing anything obviously wrong, and it worked with the default timezone. Unless it was Paramiko somehow…

For the heck of it, I tried copying some of rbssh’s imports into this new script. Ah-ha! It dropped its streams when importing Paramiko, same as rbssh. Interesting. Time to dig into that code.

The base paramiko module imports a couple dozen other modules, so I started by narrowing it down and reducing imports until I found the common one that breaks things. Well that turned out to be a module that imported Crypto.Random. Replacing the paramiko import in my wrapper with Crypto.Random verified that that was the culprit.

Getting closer…

I rinsed and repeated with Crypto.Random, digging through the code and seeing what could have broken. Hmm, that code’s pretty straight-forward, but there are some native libraries in there. Well, all this is in a .egg file (not an extracted .egg directory), making it hard to look through, so I extracted it and replaced it with a .egg directory.

Woah! The problem went away!

I glance at the clock. 3AM. I’m not sure I can trust what I’m seeing anymore. Crypto.Random breaks rbssh, but only when installed as a .egg file and not a .egg directory. That made no sense, but I figured I’d deal with it in the morning.

My dreams that night were filled with people wearing “stdin” and “stdout” labels on their foreheads, not at all getting along.

Today, I considered just ripping out timezone support. I didn’t know what else to do. Though, since I’m apparently a bit of a masochist, I decided to look into this just a little bit more. And finally struck gold.

With my Django development server running, I opened up a separate, plain Python shell. In it, I typed “import Crypto.Random”. And suddenly saw my development server reload.

How could that happen, I wondered. I tried it again. Same result. And then… lightbulb!

Django reloads the dev server when modules change. Crypto is a self-contained .egg file with native files that must be extracted and added to the module path. Causing Django to reload. Causing it to drop the spawned rbssh process. Causing the streams to disconnect. Ah-ha. This had to be it.

One last piece of the puzzle. The timezone change.

I quickly located their autoreload code and pulled it up. Yep, it’s comparing modified timestamps. We have two processes with two different ideas of what the current timezone is (one UTC, one US/Pacific, in my case), meaning when rbssh launched and imported Crypto, we’d get a bunch of files extracted with US/Pacific-based timestamps and not UTC, triggering the autoreload.

Now that the world makes sense again, I can finally fix the problem!

All told, that was about 4 or 5 days of debugging. Certainly not the longest debugging session I’ve had, but easily one of the more confusing ones in a while. Yet in the end, it’s almost obvious.