
Integration and Simulation Tests in Python

One of my (many) tasks lately has been to rework unit and integration tests for Review Bot, our automated code review add-on for Review Board.

The challenge was providing a test suite that could test against real-world tools, but not require them. An ever-increasing list of compatible tools has threatened to become an ever-increasing burden on contributors. We wanted to solve that.

So here’s how we’re doing it.

First off, unit test tooling

To start with, this is all Python code, which you can find in the Review Bot repository on GitHub.

We make heavy use of kgb, a package we’ve written to add function spies to Python unit tests. This goes far beyond Mock, allowing nearly any function to be spied on without having to be replaced. This module is a key component of our solution, given our codebase and our needs, but it’s an implementation detail — it isn’t a requirement for the overall approach.

Still, if you’re writing complex Python test suites, check out kgb.

Deciding on the test strategy

Review Bot can talk to many command line tools, which are used to perform checks and audits on code. Some are harder than others to install, or at least annoying to install.

We decided there are two types of tests we need:

  1. Integration tests — run against real command line tools
  2. Simulation tests — run against simulated output/results that would normally come from a command line tool

Since our goal is to ease contribution, we have to keep in mind that we can’t err so far in that direction that we end up with an unreliable test suite.

We decided to make these the same tests.

The strategy, therefore, would be this:

  1. Each test would contain common logic for integration and simulation tests. A test would set up state, perform the tool run, and then check results.
  2. Integration tests would build upon this by checking dependencies and applying configuration before the test run.
  3. Simulation tests would be passed fake output or setup data needed to simulate that tool.

This would be done without any code duplication between integration and simulation tests. There would be only one test function per expectation (e.g., a successful result or the handling of an error). We don’t want to worry about tests getting out of sync.

Regression in our code? Both types of tests should catch it.

Regression or change in behavior in an integrated tool? Any fixes we apply would update or build upon the simulation.

Regression in the simulation? Something went wrong, and we caught it early without having to run the integration test.

Making this all happen

We introduced three core testing components:

  1. @integration_test() — a decorator that defines and provides dependencies and input for an integration test
  2. @simulation_test() — a decorator that defines and provides output and results for a simulation test
  3. ToolTestCaseMetaClass — a metaclass that ties it all together

Any test class that needs to run integration and simulation tests will use ToolTestCaseMetaClass and then apply either or both @integration_test/@simulation_test decorators to the necessary test functions.

When a decorator is applied, the test function is opted into that type of test. Data can be passed into the decorator, which is then passed into the parent test class’s setup_integration_test() or setup_simulation_test().

These can do whatever they need to set up that particular type of test. For example:

  • Integration test setup defaults to checking dependencies, skipping the test if they’re not met.
  • Simulation test setup may write some files or spy on a subprocess.Popen() call to fake output.


For example:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=(
        b'MyTool 1.2.3\n'
        b'Scanning code...\n'
        b'0 errors, 0 warnings, 1 file(s) checked\n'
    ))
    def test_execute(self):
        """Testing MyTool.execute"""
        ...

When applied, ToolTestCaseMetaClass will loop through each of the test_*() functions with these decorators applied and split them up:

  • Test functions with @integration_test will be split out into a test_integration_<name>() function, with an [integration test] suffix appended to the docstring.
  • Test functions with @simulation_test will be split out into test_simulation_<name>(), with a [simulation test] suffix appended.

The above code ends up being equivalent to:

class MyTests(kgb.SpyAgency, TestCase):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    def test_integration_execute(self):
        """Testing MyTool.execute [integration test]"""
        self.setup_integration_test(exe_deps=['mytool'])
        self._test_common_execute()

    def test_simulation_execute(self):
        """Testing MyTool.execute [simulation test]"""
        self.setup_simulation_test(output=(
            b'MyTool 1.2.3\n'
            b'Scanning code...\n'
            b'0 errors, 0 warnings, 1 file(s) checked\n'
        ))
        self._test_common_execute()

    def _test_common_execute(self):
        ...

Pretty similar, but less to maintain in the end, especially as tests pile up.

And when we run it, we get something like:

Testing MyTool.execute [integration test] ... ok
Testing MyTool.execute [simulation test] ... ok

...

Or, you know, with a horrible, messy error.
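Under the hood, the splitting isn’t magic. Here’s a rough sketch of how decorators and a metaclass like these could be wired together: the decorators stash their keyword arguments on the test function, and the metaclass generates one wrapper per test type. This is a simplified illustration, not Review Bot’s actual implementation.

def integration_test(**kwargs):
    """Mark a test as having an integration variant with this setup data."""
    def _dec(func):
        func.integration_setup_kwargs = kwargs
        return func

    return _dec


def simulation_test(**kwargs):
    """Mark a test as having a simulation variant with this setup data."""
    def _dec(func):
        func.simulation_setup_kwargs = kwargs
        return func

    return _dec


class ToolTestCaseMetaClass(type):
    """Split decorated test functions into integration/simulation tests."""

    def __new__(meta, name, bases, d):
        for attr_name, func in list(d.items()):
            if not attr_name.startswith('test_') or not callable(func):
                continue

            int_kwargs = getattr(func, 'integration_setup_kwargs', None)
            sim_kwargs = getattr(func, 'simulation_setup_kwargs', None)

            if int_kwargs is None and sim_kwargs is None:
                continue

            # Replace the original test with one generated test per type.
            base_name = attr_name[len('test_'):]
            del d[attr_name]

            if int_kwargs is not None:
                d['test_integration_%s' % base_name] = meta._make_test(
                    func, 'setup_integration_test', int_kwargs,
                    '[integration test]')

            if sim_kwargs is not None:
                d['test_simulation_%s' % base_name] = meta._make_test(
                    func, 'setup_simulation_test', sim_kwargs,
                    '[simulation test]')

        return super().__new__(meta, name, bases, d)

    @staticmethod
    def _make_test(func, setup_name, setup_kwargs, suffix):
        def _test(self):
            # Run the type-specific setup, then the shared test body.
            getattr(self, setup_name)(**setup_kwargs)
            func(self)

        _test.__doc__ = '%s %s' % (func.__doc__, suffix)

        return _test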

Iterating on tests

It’s become really easy to maintain and run these tests.

We can now start by writing the integration test, modifying the code to log any data that might be produced by the command line tool, and then fake-failing the test to see that output.

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])

        # XXX
        print(repr(payload))

        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

        # XXX Fake-fail the test
        assert False

I can run that and get the results I’ve printed:

======================================================================
ERROR: Testing MyTool.process_results [integration test]
----------------------------------------------------------------------
Traceback (most recent call last):
    ...
-------------------- >> begin captured stdout << ---------------------
{"errors": [{"code": 123, "column": 13, "filename": "test.c", "line': 1, "message": "Expected return type: int"}]}

Now that I have that, and I know it’s all working right, I can feed that output into the simulation test and clean things up:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=json.dumps({
        'errors': [
            {
                'filename': 'test.c',
                'code': 123,
                'line': 1,
                'column': 13,
                'message': 'Expected return type: int',
            },
        ],
    }).encode('utf-8'))
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])
        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

Once it’s running correctly in both tests, our job is done.

From then on, anyone working on this code can simply run the test suite and make sure their change hasn’t broken any simulation tests. If it has, and it wasn’t intentional, they’ll have a great starting point for diagnosing the issue, without having to install anything.

Anything that passes simulation tests can be considered a valid contribution. We can then test against the real tools ourselves before landing a change.

Development is made simpler, and there’s no worry about regressions.

Going forward

We’re planning to apply this same approach to both Review Board and RBTools. Both currently require contributors to install a handful of command line tools or optional Python modules to make sure they haven’t broken anything, and that’s a bottleneck.

In the future, we’re looking at making use of python-nose’s attrib plugin, tagging integration and simulation tests and making it trivially easy to run just the suites you want.

We’re also considering pulling the metaclass and decorators out into a small, reusable Python package, making it easy for others to make use of this pattern.

Using Freshdesk with PagerDuty for Better Customer Support

At Beanbag, we’ve been using Freshdesk to handle customer support for Review Board, Power Pack, and RBCommons.

We’ve also been using PagerDuty to inform us on any critical events, such as servers going down, memory/CPU load, or security updates we need to apply to our servers.

Our customers’ problems are just as important to us as our servers’ problems, but we were lacking a great way to really get our attention when our customers needed it most. After we started using PagerDuty, the solution became obvious: Integrate PagerDuty with Freshdesk! But how? Neither side had any native integration with the other.

Enter Freshdesk webhooks

Freshdesk’s webhooks support is pretty awesome. Not only can you set them up for any custom condition you like, but the payload they send is completely customizable, allowing you to easily construct an API request to another service – like PagerDuty!

This is super useful. It means you won’t need any sort of proxy service or custom script to be set up on your server. All you need are your Freshdesk and PagerDuty accounts.

Deciding on your setup

There are probably many ways you can configure these two to talk, and we played with a couple configurations. Here are the general rules we settled on:

  • Only integrate PagerDuty for paying customers (with whom we have support contracts). We don’t want alerts from random people e-mailing us.
  • When a paying customer opens a new ticket or replies to an existing ticket, assign it to a “Premium Support” group, and create an alert in PagerDuty.
  • When an agent replies to a ticket in “Premium Support” or marks it as resolved, resolve the alert in PagerDuty.

We always resolve the alert instead of acknowledging it, in order to prevent PagerDuty from automatically un-acknowledging it a period of time after the agent replies. When the customer replies again, the same incident key is reused, instead of creating a new alert.
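If you want to sanity-check this de-duplication behavior before wiring up Freshdesk, you can post events to PagerDuty’s generic events API by hand. Here’s a rough sketch using Python and the requests library; the service key and ticket ID are placeholders:

import requests

PAGERDUTY_URL = \
    'https://events.pagerduty.com/generic/2010-04-15/create_event.json'
SERVICE_KEY = 'YOUR SERVICE KEY GOES HERE'
INCIDENT_KEY = 'freshdesk_ticket_12345'  # Placeholder ticket ID.

# Triggering twice with the same incident_key updates a single incident
# instead of opening a second one.
for description in ('Customer opened ticket 12345',
                    'Customer replied to ticket 12345'):
    requests.post(PAGERDUTY_URL, json={
        'service_key': SERVICE_KEY,
        'event_type': 'trigger',
        'incident_key': INCIDENT_KEY,
        'description': description,
    })

# Resolving with the same incident_key closes that incident.
requests.post(PAGERDUTY_URL, json={
    'service_key': SERVICE_KEY,
    'event_type': 'resolve',
    'incident_key': INCIDENT_KEY,
    'description': 'Agent resolved ticket 12345',
})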

Also note that we’re setting things up to alert all of our support staff (all two of us founders) on any important tickets. You may wish to adjust these rules to do something a bit more fine-grained.

Okay, let’s get this set up.

Setting up PagerDuty

We’re going to create a custom Service in PagerDuty. First, log into PagerDuty and click “Escalation Policies” at the top. Then click the New Escalation Policy button.

Name this policy something like “Premium Support Tickets,” and assign your agents.

Next, click “Services” at the top. Then click the Add New Service button. Give the service a name, select the escalation policy you just created, and choose the generic API integration type.


Click Add Service. Make a note of the Service API Key. You’ll need this for later.

Edit your service and uncheck the Incident Ack Timeout and Incident Auto-Resolution checkboxes. Then click Save.

Optionally, configure some webhooks to point to other services you want to notify. For instance, we added Slack, so that we’ll instantly see any support requests in-chat.

Okay, you’re done here. Let’s move on to Freshdesk.

Setting up Freshdesk

Freshdesk is going to require four rules: one Dispatch’r rule and three Observer rules.

I’ll walk through each rule below, along with the payloads and URLs you can copy/paste.

Start by logging in and going to the Administration page.

Dispatch’r Rule: Alert PagerDuty for important customer tickets

Click “Dispatch’r” and add a new rule.


Set the Rule name and Description to whatever you like. We added a little reminder in the description saying that this must be updated as we add customers.

For the conditions, we’re matching based on company names we’ve created in Freshdesk for our customers. You may instead want to base this on Product, From E-mail, Contact Name, or whatever you like.

For the Callback URL, use https://events.pagerduty.com/generic/2010-04-15/create_event.json. Keep note of this, because you’ll use it for all the payloads you’ll set.

Now set the Content to be the following.

{
    "service_key": "YOUR SECRET KEY GOES HERE",
    "event_type": "trigger",
    "description": "Ticket ID {{ticket.id}} from {{ticket.requester.company_name}}: {{ticket.subject}}",
    "incident_key": "freshdesk_ticket_{{ticket.id}}",
    "client": "Freshdesk",
    "client_url": "{{ticket.url}}",
    "details": {
        "ticket ID": "{{ticket.id}}",
        "status": "{{ticket.status}}",
        "priority": "{{ticket.priority}}",
        "type": "{{ticket.ticket_type}}",
        "due by": "{{ticket.due_by_time}}",
        "requester": "{{ticket.requester.name}}",
        "requester e-mail": "{{ticket.from_email}}"
    }
}

Make note of the whole YOUR SERVICE KEY GOES HERE part in line 2. Remember the service key in PagerDuty? You’ll set that here. You’ll also need to do this for all the webhook payloads I show you from here on out.

Go ahead and add any other actions you may want (such as adding watchers, or setting the priority), and click Save.

Click Reorder and place that rule at the top.

Setting up Freshdesk Observer

Now we need to set up a few observers. Go back to the Administration page and click “Observer.” We’ll be adding three new rules.

Observer Rule #1: Resolve PagerDuty alerts on close

Add a new rule. This one will resolve the PagerDuty alert when an agent marks a ticket in the “Premium Support” group as resolved.

Use the same Callback URL as earlier, and set the Content to:

{
    "service_key": "YOUR SERVICE KEY GOES HERE",
    "event_type": "resolve",
    "description": "{{ticket.agent.name}} resolved ticket {{ticket.id}}: {{ticket.subject}}",
    "incident_key": "freshdesk_ticket_{{ticket.id}}",
    "details": {
        "ticket ID": "{{ticket.id}}",
        "status": "{{ticket.status}}",
        "priority": "{{ticket.priority}}",
        "type": "{{ticket.ticket_type}}",
        "due by": "{{ticket.due_by_time}}",
        "requester": "{{ticket.requester.name}}",
        "requester e-mail": "{{ticket.from_email}}"
    }
}

Don’t forget that service key!

Observer Rule #2: Resolve PagerDuty alerts on agent reply

Let’s add another rule. This one will resolve your PagerDuty alert when an agent replies to the ticket.

Again, same Callback URL, with this Content (and your service key):

{
    "service_key": "YOUR SERVICE KEY GOES HERE",
    "event_type": "resolve",
    "description": "{{ticket.agent.name}} acknowledged ticket {{ticket.id}}: {{ticket.subject}}",
    "incident_key": "freshdesk_ticket_{{ticket.id}}",
    "details": {
        "ticket ID": "{{ticket.id}}",
        "status": "{{ticket.status}}",
        "priority": "{{ticket.priority}}",
        "type": "{{ticket.ticket_type}}",
        "due by": "{{ticket.due_by_time}}",
        "requester": "{{ticket.requester.name}}",
        "requester e-mail": "{{ticket.from_email}}"
    }
}

Observer Rule #3: Alert PagerDuty on customer reply

Add one last rule. This one creates a new PagerDuty alert when a customer replies to a ticket in the “Premium Support” group.

Here’s your Content:

{
    "service_key": "YOUR SERVICE KEY GOES HERE",
    "event_type": "trigger",
    "description": "{{ticket.agent.name}} acknowledged ticket {{ticket.id}}: {{ticket.subject}}",
    "incident_key": "freshdesk_ticket_{{ticket.id}}",
    "details": {
        "ticket ID": "{{ticket.id}}",
        "status": "{{ticket.status}}",
        "priority": "{{ticket.priority}}",
        "type": "{{ticket.ticket_type}}",
        "due by": "{{ticket.due_by_time}}",
        "requester": "{{ticket.requester.name}}",
        "requester e-mail": "{{ticket.from_email}}"
    }
}

Done!

You should now be set. Any incoming tickets that match the conditions you set in the Dispatch’r rule will be tracked by PagerDuty.

Now you have no excuse for missing those important support tickets! And your customers will thank you for it.

A new adventure begins

Act 1, Scene 1

August 23rd, 2004. A young kid, not even 21, freshly dropped out of college, passionate about open source and programming. He walks into his new office at his new job at VMware, his first job, ready to start the day, eager to impress and meet his new co-workers.

Nobody was there. Thumbs twiddled.

10AM starts to roll around, and finally, the first sign of life. Over the next couple hours, more people show up.

Over the next week, he’s set up and learning the ropes. Working on his first bug, soon his first feature. Attending his first team get-togethers. Making his first Bay Area friends.

Over the next few months, his first birthday celebration at work. His first glass of champagne. His first real responsibilities.

Over the next few years, bigger roles, leadership roles. He begins to get a feel for where he’s truly going in this silly little world.

This, of course, was me, on my first adventure in the tech industry.

I was lucky to be placed in a fantastic team full of smart, hard-working, dedicated, and fun software engineers and managers. We’d discuss architecture, brainstorm ideas, joke around, watch YouTube videos, play poker, watch movies, go to events. The web of awesome people extended throughout the company as well.

Over the past nine years, I worked on a great many things.

  • Eight releases of VMware Workstation, including a three-year effort to build Workstation 8.0 (a major undertaking).
  • VMware Server 1.0. I was the primary Linux developer, pulling caffeine-fueled all-nighters to meet insane deadlines.
  • Player and VMRC, which powers the VM console for our enterprise products.
  • The core foundation used in Fusion and other products.
  • Icons and artwork for the Linux products.
  • I introduced Unity to Workstation. (Sorry, guys…)
  • Helped in the creation of the current generation of the View client for Linux.
  • More recently, I developed WSX, an experiment in developing a pure web client and console for accessing remote VMs anywhere, from desktops and tablets.

Not a bad run.

This Thursday, August 1st, 2013, I’ll be leaving VMware.

Revision 1: “Add the reviewboard”

Several years ago, I began working with my good friend David Trowbridge on an open source project for keeping track of patches and easing the review process. We spent many years in the open source world looking at raw diffs on bug trackers and in e-mails, and things weren’t that much better at VMware. As Mr. Wonderful says, “There has to be a better way!”

So we slaved away in the late nights and weekends, iterating and iterating until we had something we could use. We named this product “Review Board” (or “the reviewboard,” as our first commit says). We put it out there for people to play with, if anyone was interested.

There was interest. Review Board is now used around the world at companies big and small. We’ve continued to improve and grow the product and turn it into something that developers actually want to use.

We later built a startup around this. Beanbag.

It’s dangerous to go alone. Take this.

Earlier this year, we met a local entrepreneur as part of a program we participate in. We quickly developed a rapport, and he offered to help and advise us in our efforts to grow our business. It wasn’t long after that we started discussing funding, and where that could get us.

We started pitching, and he reached out to his contacts. Before long, we had what we needed to give this a try for a couple years.

Step 3: Profit?

There’s a lot of hard work ahead of us, but we’re up to the challenge. It’s both exciting and terrifying.

Leaving my team behind at VMware is hard, but everyone has been so supportive.


In the coming months, Review Board’s going to grow in exciting new ways. We’ll be gearing up for a new 1.8 release, releasing our first commercial extension to Review Board, and improving our SaaS, RBCommons. We have a pretty good idea where we want to go from here, and now we can better focus on making it happen.

It’s going to be an awesome adventure.

Weird bugs: Django, timezones, and importing from eggs

Every so often you hit a bug that makes you question your sanity. The past several days have been spent chasing one of the more confusing ones I’ve seen in a long time.

Review Board 1.7 added the ability to set the server-wide timezone. During development, we found problems using SSH with a non-default timezone. This only happened when updating os.environ['TZ'] to something other than our default of UTC. We’d see the SSH process (rbssh, our wrapper for SSH communication) break due to an EOF on stdin and stdout, and then we’d see the development server reload itself.

Odd.

Since this originated with a Subversion repository, I first suspected libsvn. I spent some time going through their code to see if a timezone update would break something. Perhaps timeout logic. That didn’t turn up anything interesting, but I couldn’t rule it out.

Other candidates for suspicion were rbssh itself, paramiko (the SSH library), Django, and the trickster god Loki. We just had too many moving pieces to know for sure.

So I wrote a little script to get in-between a calling process and another process and log all communication between them. I tested this with rbssh and with plain ol’ ssh. rbssh was the only one that broke. Strange, since it wasn’t doing anything obviously wrong, and it worked with the default timezone. Unless it was Paramiko somehow…
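The in-between script is essentially a pass-through pipe that logs everything crossing it. A sketch of that idea (not the original script, and with a hypothetical name and log path) might look like this:

#!/usr/bin/env python
# proxy-log.py: run as `proxy-log.py <command> [args...]`. It relays stdin
# and stdout between the caller and the wrapped command, logging every chunk
# (and any EOFs) to a file.
import subprocess
import sys
import threading


def pump(src, dest, log, label, close_dest=False):
    """Relay bytes from src to dest, logging each chunk as it passes."""
    while True:
        data = src.read(1)

        if not data:
            log.write('%s: EOF\n' % label)
            log.flush()

            if close_dest:
                dest.close()

            break

        log.write('%s: %r\n' % (label, data))
        log.flush()
        dest.write(data)
        dest.flush()


def main():
    log = open('/tmp/proxy.log', 'a')
    proc = subprocess.Popen(sys.argv[1:],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)

    threads = [
        # Caller -> child: close the child's stdin when the caller's ends.
        threading.Thread(target=pump,
                         args=(sys.stdin.buffer, proc.stdin, log, 'stdin'),
                         kwargs={'close_dest': True}),
        # Child -> caller.
        threading.Thread(target=pump,
                         args=(proc.stdout, sys.stdout.buffer, log, 'stdout')),
    ]

    for thread in threads:
        thread.daemon = True
        thread.start()

    sys.exit(proc.wait())


if __name__ == '__main__':
    main()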

For the heck of it, I tried copying some of rbssh’s imports into this new script. Ah-ha! It dropped its streams when importing Paramiko, same as rbssh. Interesting. Time to dig into that code.

The base paramiko module imports a couple dozen other modules, so I started by narrowing it down and reducing imports until I found the common one that broke things. Well, that turned out to be a module that imported Crypto.Random. Replacing the paramiko import in my wrapper with Crypto.Random verified that it was the culprit.

Getting closer…

I rinsed and repeated with Crypto.Random, digging through the code and seeing what could have broken. Hmm, that code’s pretty straightforward, but there are some native libraries in there. Well, all of this is in a .egg file (not an extracted .egg directory), making it hard to look through, so I extracted it and replaced it with a .egg directory.

Woah! The problem went away!

I glance at the clock. 3AM. I’m not sure I can trust what I’m seeing anymore. Crypto.Random breaks rbssh, but only when installed as a .egg file and not a .egg directory. That made no sense, but I figured I’d deal with it in the morning.

My dreams that night were filled with people wearing “stdin” and “stdout” labels on their foreheads, not at all getting along.

Today, I considered just ripping out timezone support. I didn’t know what else to do. Though, since I’m apparently a bit of a masochist, I decided to look into this just a little bit more. And finally struck gold.

With my Django development server running, I opened up a separate, plain Python shell. In it, I typed “import Crypto.Random”. And suddenly saw my development server reload.

How could that happen, I wondered. I tried it again. Same result. And then… lightbulb!

Django reloads the dev server when modules change. Crypto is a self-contained .egg file with native files that must be extracted and added to the module path. Causing Django to reload. Causing it to drop the spawned rbssh process. Causing the streams to disconnect. Ah-ha. This had to be it.

One last piece of the puzzle. The timezone change.

I quickly located their autoreload code and pulled it up. Yep, it’s comparing modified timestamps. We have two processes with two different ideas of what the current timezone is (one UTC, one US/Pacific, in my case), meaning when rbssh launched and imported Crypto, we’d get a bunch of files extracted with US/Pacific-based timestamps and not UTC, triggering the autoreload.

Now that the world makes sense again, I can finally fix the problem!

All told, that was about 4 or 5 days of debugging. Certainly not the longest debugging session I’ve had, but easily one of the more confusing ones in a while. Yet in the end, it’s almost obvious.

Looking Back on Review Board

Just over 3.5 years ago, David Trowbridge and I spent some time discussing the annoyances of the typical patch submission and code review processes in the open source projects we participated in and at companies, and decided to play with some ideas for improving this. At the time, we knew very little about what we intended to do. We had a name for it pretty early on, but that was about all we had. We didn’t even know whether we’d get past an early prototyping stage. But here it is, over 3 years later, and we have the leading open source code review tool with an active support and development community, hundreds of companies using it, and exciting new innovations for aiding in the code review process.

I was thinking a few days ago about how far we’ve come and some of the decisions we made along the way. I went digging through our commit history in order to relive some of the past of our little project. Since so few people were even aware of Review Board’s existence at the time, I thought I’d share some of our history with you. Particularly the interesting and funny bits.

“Add the reviewboard.”

Commit #1. The very first thing we put in our Subversion tree on September 27, 2006. I don’t even remember what was in this change now. We transitioned to Git last year and this commit is now just plain empty. Maybe it was just the directory structure? Who can say.

Early on, we didn’t refer to “Review Board” as a proper name. It was generally “the reviewboard” or something similar. The codebase was young. We didn’t actually do code review on the project at this point (and it shows!). The first few months are littered with odd or nonsensical commit messages, small breakages, and bad decisions.

A few of my favorite commit messages are:

  • “I suck. Make submitting of reviews.”
  • “Don’t stuff the list of files in the bug list. It’s impolite.”
  • “Avoid failing out with Christian’s wacko form”
  • “Gum.”
  • “Holy apple pancakes. It worked!”
  • “I suck… The array was empty… The tests never had a chance to fail. :(”
  • “‘This is a summary’ sucks. Now we use fortune for the summary, description, and testing done. ‘You’re ugly and your mother dresses you funny.’”
  • “Unbreak things before ChipX86 notices”
  • “I’m just… garhgh”

Nowadays, our commit messages look nothing like that, but that’s the fun of a new project. You get to go commit-crazy while you try to figure out what you’re building.

Dashboard, quips and fortunes

The UI of old looked quite different than the UI of today.

We had a dashboard from the very beginning (before the review request pages, even), but it wasn’t anything like the dashboard we have today. It was a simple page with a table containing all outgoing review requests and a table containing incoming review requests. But it also had one more thing: quips.

We were just beginning to build out the quips functionality. Quips are little random quotes that are inserted into the UI. I think the plan was to put quips on certain pages, making Review Board a little more fun. We were using them in the dashboard for empty lists, with variations all saying something about the dashboard being empty. Quips are a neat feature that just never survived the early days of development.

Fortunes are similar. On Linux/Unix systems, there’s a little program called “fortune” that just displays a random quote. Since we at first had to test review request functionality without actually having a repository backend of any sort, and we didn’t want to input all the information each time, we just used fortune to generate the summary, description and testing done text. This made for some really funny review requests early on, but this is of course something that had no reason to survive initial development.

Sometimes we would create a bunch of review requests just to see what kind of quotes we’d get. 🙂

Multiple repositories? Almost didn’t happen.

One of the really critical parts of Review Board today is the ability to talk to a variety of different types of repositories in one instance. But, it turns out, this almost didn’t happen.

The initial goals were not that ambitious. Review Board talked to one repository per instance. Everything was basically hard-coded with one repository in mind. That type of repository, as well as its information, was customizable. You just couldn’t have more than one. At the time, this wasn’t a problem, but it didn’t take long until we had a need to talk to two repositories.

We discussed this and at first decided that if we needed to talk to two repositories, we could just set up two instances. It would have been a lot of work to update it for multiple repositories, after all. And really, this was a small project. Who would really need more than a couple repositories? This started to nag at me, though, and so I spent a couple nights rewriting all of the code as an experiment. It ended up working pretty nicely, and we were able to ditch the multi-instance model.

The importance of rewards

It’s always nice to have a little reward for milestones. Developers sometimes compete over cool bug numbers, revisions, etc. Initially, we were going to use quips to add some fun to the site, but we ended up settling on our current trophy system.

One of our first Review Board instances started to approach review request #1000, which was a huge milestone for us. I decided to commemorate the event by staying up and quickly hacking in a hidden feature for showing a trophy for review request #1000. The way we implemented it, you’d see the first ever trophy at 1,000, and from there you’d see it at every milestone number (1,000, 2,000, 3,000, 10,000, etc.). I didn’t want to stop there, though, so I added support for a second type of trophy, one that has confused people with its appearance to this day. Mission complete.

Of course, when we updated the server and someone finally hit 1000, it triggered a bug in the new trophy code and broke his review request. Oh well, I tried.

Diff viewers are hard

If I could pick one point during the whole history of Review Board where I was ready to completely give up, it would be during the creation of our diff viewer. All three diff viewers.

See, the first diff viewer was a complete and total hack. We generated a side-by-side diff using the diff tools and just parsed the output, basically generating a table of that. It was ugly, though, and limiting. It also caused problems where text on a row would either be truncated or would break the parser. I spent a long time working on this before I totally gave up and went on to try a new approach.

My second approach was closer to what we have today, but also limiting and very, very buggy. We were using Python’s built-in diff generation module, which implements a basic diff algorithm. It gave us insert and delete information, but not replace information. We had to hack that in ourselves, and it was really a hack. Try taking a bunch of inserts and deletes and find out which of those are really changed lines. No, really, try it. It’s harder than you think, and it’ll often be wrong.

Still, we stuck with this for a long time. It was slow, buggy, and didn’t generate the sort of output people expected from diff tools. Most people see diffs from GNU Diff, which implements the Myers diff algorithm (with a few additions and tweaks). These Myers diffs are much nicer to view than what Python gave us. Another problem we hit was that we didn’t have real line number information, so we had to output fake line numbers. They weren’t really line numbers so much as row numbers in the table. Ugh. Even getting this far was really hard and frustrating, and the result still wasn’t good.

Attempt #3. I decided to build our own diff parser and generator from scratch. What a project. I knew nothing about diff generation and hardly knew where to start. I spent probably a good month or so just trying to work on this new diff code, and was so close to giving up so many times. It ended up being completely worth it, though, as we ended up with a very nice, extensible diff parser.

Without that third attempt, we’d be in the stone age. Review Board would not be as nice to use. We wouldn’t have inter-line diffs (where we highlight what changed in a replace line), syntax highlighting, move detection (coming in 1.5), or function/class headers (where we show which function/class the part of the diff is in — also coming in 1.5).

What else…

Well, there’s a lot more I could talk about. Our initial attempts at JavaScript code for the UI, our trials and challenges with database migration, or our early problems storing diffs with different encodings in databases. This is getting long, though, so I’ll cover these in another post on lessons learned.

Review Board 1.0 Released!


Tonight, we hit a milestone in the Review Board project that we’ve been working toward for over two years. We finally pushed out our 1.0 release. A lot of blood, sweat and tears went into this release (okay, so not literally, but it was A LOT OF WORK!). The last few months in particular have been challenging, as we’ve had to solve some tricky bugs and scalability problems, but the end result is pretty great.

Just a short while ago, we announced the release and put up an overview of the entire release and product. We’ve already had some nice congratulatory e-mails and tweets, which is really nice 🙂

Some stats for this release:

  • 2 years, 9 months, 25 days have passed since our first commit.
  • 120 contributors have contributed to Review Board so far (in terms of code contributions).
  • 2,019 commits were made.
  • 899 review requests have been posted to our project’s actual Review Board server. 1,650 users are registered on there.
  • Our demo server, in comparison, has 2,082 review requests filed and 10,154 users.
  • 938 bugs were filed. 812 were fixed.
  • 232 feature requests were filed. 101 were implemented. Most remaining ones are scheduled for future releases.
  • An estimated 200+ companies are now using Review Board. 26 have let us list them publicly.
  • The largest known Review Board install has over 83,000 filed review requests and over 2,000 users, doing upwards of 10GB of traffic per day.
  • 5 presentations on Review Board are known to have been given, 3 by us, 2 by others.
  • 552 users have joined our main mailing list, and 3,674 e-mails have been sent.

Now that Review Board 1.0 is out, we can get started on some awesome new features we’ve had planned. I have a little notebook full of ideas for our 1.1 and 1.5 releases (which may become 1.5 and 2.0, respectively, as this list grows). Some of the new features are actually ready to be committed within the next couple of days, so those of you using nightlies will start to see them soon.

We were accepted into this year’s Summer of Code, and have three students working on exciting projects for us, so hopefully we’ll start to see these trickle into the upcoming nightlies as well. These projects include diff viewer improvements (moved region detection, better whitespace-only change detection), IDE integration with Eclipse, and improved notification hooks and e-mail support.

We’re also working on providing support for third-party extensions, which will allow developers to extend Review Board in new, exciting ways without having to modify Review Board itself. This is especially handy for companies who wish to integrate better with their sandboxes, bug trackers or unit testing services. This will likely land in 1.5 (2.0?) at the earliest, as it’s a large change, but the code for this mostly works today. It’s just a matter of getting the codebase ready and figuring out what APIs we want to stabilize and expose.

As I mentioned in the release announcement, we’re planning a release party, tentatively on July 11th, 2009, in the Bay Area (somewhere around Palo Alto, CA). If any Review Board users want to join us, please RSVP!

Review Board: Summer of Code, Roadmap and Future Plans

Summer of Code

This year, we (the Review Board project) were given the opportunity to participate in Google’s Summer of Code. We’ve received some great student proposals so far, and I think we’ll see exciting work done on Review Board this summer.

The deadline for Summer of Code is coming up fast (April 3rd, 19:00 UTC). If you’re interested in working on Review Board and haven’t yet applied, it’s not too late, but you’ll want to hurry. Skim through our ideas page and, if you find something interesting or have a great idea not listed here, then apply and tell us your plans. I can say we’ve received several proposals so far for the installer and admin UI, so unless you feel strongly about either of those, you’ll increase your chances with other proposals.

We’re also offering free Review Board hosting for open source projects participating in Summer of Code. If you’re a mentoring organization and would like to give Review Board a try for reviewing and managing student code, go ahead and contact us and we’ll get you set up.

Roadmap

We’re finally nearing 1.0. We recently put out our 1.0 beta 2 release and are now in a feature freeze. We’re working to get some bug, performance and usability fixes in for beta 3, which I’m shooting for in a few weeks. Then we’ll branch for 1.0, put out a Release Candidate or two, and then finally release 1.0!

There’s a lot of really cool features planned after 1.0, namely extensions and policy customization.

Extensions

Our bug tracker is filled with feature requests for all kinds of things, ranging from bug tracker integration and instant messaging to a method for offering bribes for code reviews. We clearly can’t put all the requested features in the codebase, so we’ve decided instead to add support for third-party extensions. Coming soon, developers will be able to write extensions in the form of Python modules that extend or alter the functionality of Review Board. The extension framework will allow them to do the following:

  • Access the database using the existing Review Board database models.
  • Add new database models for storing data.
  • Listen for signals (new review request published, review request submitted, etc.) and act on them.
  • Add custom URLs.
  • Replace existing URLs, for advanced capabilities such as replacing the diff viewer.
  • Add new API handlers.
  • Add “action” links to existing review requests and reviews.
  • Add columns and sidebar entries to the dashboard.
  • Add pages to the administration UI.
  • Communicate with other extensions.
  • Provide a settings page, which stores data in Review Board-provided models (we even auto-generate the settings page for the extension by default).
  • And more!

A lot of this already exists in a private development branch, and it will be one of our primary focuses as soon as 1.0 goes out.

In time, we’ll add a new section to the Review Board website where developers can list their extensions for download and for sale. Administrators will be able to browse and search for extensions directly from the administration UI and install them without having to even open a terminal (in most cases).

We’re hoping this will solve a lot of in-house integration issues. For example, many companies have custom sandbox architectures, bug trackers, and statistics software which they’ll now be able to tie in with Review Board.

Policy Customization

We’ve found that a lot of companies have very specific ways they want to handle policy and access restrictions. For example, many companies want to limit who can see certain parts of a repository (and therefore certain diffs), or want to allow anybody to create review groups, or want to disallow people from joining review groups. Some also want to dictate what constitutes approval for submitting a change.

We’re looking into the various requests and attempting to come up with a policy model that is flexible enough to handle these needs. One of the ideas is to provide some basic level of access control on a per-repository, per-path, and per-group basis. We’d then piggy-back on the extension framework to allow for more specific policy control. The advantage is that developers could write their own policy rules that interface with some part of their company’s infrastructure.

If people have any input on this, we’d love to hear it.

Improving browser performance in Review Board

This past Sunday, I landed a set of changes into Review Board that provide improved performance, such as aggressive browser-side caching of media and pages. It’s just a start, but it has already significantly reduced page load times in all of my tests, in some cases by several seconds. We implemented these methods for Review Board, but they can be applied to any Django project out there.

There are several key things that Review Board now does to improve performance:

  • Tells browsers to cache all media for one year.
  • Only sends page data if new data is available.
  • Compresses all media files to reduce transfer time.
  • Parallelizes media downloads.
  • Loads CSS files as early as possible.
  • Loads JavaScript files as late as possible.
  • Progressively loads expensive data.

A lot of the performance improvements come straight from Yahoo!’s Best Practices for Speeding Up Your Site. We’re not doing everything there yet, but we’re working toward it. That’s a great resource, by the way, and I recommend that everyone who has ever made a website go and read it.

So what do the above techniques buy us, and how are we doing them? Let me go into more details…

Caching all media for a year

The average site has one or more CSS files, JavaScript files, and several images. This translates to a lot of requests to the server, which may leave the site feeling slow. On top of this, a browser only makes a few requests to a server at a time, in order to avoid swamping the server, which will further hinder load times. This happens every time a user visits a page on your site.

Aggressive caching makes a huge difference and can greatly reduce load times for users. Review Board now tells the browser to cache media files for a year. Once a user downloads a JavaScript or CSS file, they won’t have to download it again, meaning that, in general, the only requests the browser needs to make are page requests and AJAX requests.

The big pitfall with long-term caching is that the cached resources can go stale. For example, if a new version of an image was uploaded, the browser wouldn’t even know about it, since it was told it should keep its old version for a year before checking again.

We solve this by introducing “media serials,” timestamps that are appended to all media paths. Instead of caching /js/myscript.js, the browser would cache /js/myscript.js?1273618736.

These media serials are computed on the first page request by our djblets.util.context_processors.ajaxSerial context processor. This quickly scans all media files known to the program, finding out the latest modification timestamp. It then provides a {{MEDIA_SERIAL}} variable for templates to append to media URLs as part of the query string.

The benefit of this method is that we can cache media files for a year without worrying about users having stale cached resources the next time we upgrade a copy of Review Board. The requested filenames will be different, the browser will see that the new file is not in its cache, make a request, and cache the new file for a year.
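Conceptually, the serial boils down to something like this simplified context processor (a sketch of the idea, not the actual djblets code; it assumes all served media lives under MEDIA_ROOT):

import os

from django.conf import settings

_media_serial = None


def media_serial(request):
    """Provide a MEDIA_SERIAL template variable for cache-busting media URLs.

    This scans the media tree once, remembers the newest modification time,
    and hands it to templates to append to media URLs as a query string.
    """
    global _media_serial

    if _media_serial is None:
        latest = 0

        for root, dirs, files in os.walk(settings.MEDIA_ROOT):
            for name in files:
                mtime = int(os.stat(os.path.join(root, name)).st_mtime)
                latest = max(latest, mtime)

        _media_serial = latest

    return {'MEDIA_SERIAL': _media_serial}

A template can then reference /js/myscript.js?{{MEDIA_SERIAL}}, producing the versioned URLs described above.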

Only send page data if new data is available

Aggressive caching of media files is great and saves a lot of time, but it doesn’t help for dynamically generated content. For this, we need a new strategy.

When a browser makes a request, it can send an If-Modified-Since header to the server containing the Last-Modified value it received the last time it downloaded that page. This is a very valuable header, and there are some things we can do with it to save both the server and the browser a lot of trouble.

If the browser sends If-Modified-Since, and we know that no new data has been generated since the timestamp provided, we can send an HttpResponseNotModified (HTTP response code 304). This will tell the browser it already has the newest version of the page. The sooner we do this, the better, as it means we don’t have to waste time building templates or doing any expensive database queries.

Djblets, once again, provides some functions to help out here: djblets.util.http.set_last_modified and djblets.util.http.get_modified_since.

The general usage pattern is that we first build a timestamp representing the latest version of the page. This could be the timestamp for a particular object that the page represents. We then check if we can bail early by calling:

if get_modified_since(request, timestamp):
    return HttpResponseNotModified()

Further down, after building the page, we must set the Last-Modified timestamp, using the same timestamp as above, like so:

set_last_modified(response, timestamp)

We’re using this in only a few places right now, such as the review request details page, but it drastically improves load times. If the review request hasn’t changed and nobody’s commented on it since the browser last loaded the page, a reload of the page will be almost instant.
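Putting the two helpers together, a view using this pattern looks roughly like the following sketch. The ReviewRequest model and last_updated field here are stand-ins for whatever object and timestamp the page actually depends on:

from django.http import HttpResponseNotModified
from django.shortcuts import get_object_or_404, render

from djblets.util.http import get_modified_since, set_last_modified

from myapp.models import ReviewRequest  # Hypothetical model for the sketch.


def review_request_detail(request, review_request_id):
    review_request = get_object_or_404(ReviewRequest, pk=review_request_id)

    # The newest timestamp this page depends on.
    timestamp = review_request.last_updated

    # Bail out before doing any expensive rendering or queries if the
    # browser's cached copy is still current.
    if get_modified_since(request, timestamp):
        return HttpResponseNotModified()

    response = render(request, 'reviews/review_request_detail.html', {
        'review_request': review_request,
    })

    # Tell the browser which version of the page it now has.
    set_last_modified(response, timestamp)

    return response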

Compress all media files

Our Apache and lighttpd config files now enable compression by default. By compressing these files, we can turn a relatively large JavaScript file (such as the jquery and jquery-ui files) into a very small file before sending it over to the browser. This reduces transfer times at the expense of compression/decompression time (which is small enough not to worry about for deployments of this size, and can be offset by caching compressed files server-side).

Parallelize media downloads

It’s important to not mix loads of media files of different types. The browser parallelizes media downloads of the same type, in page load order, but if you load one CSS file, one JavaScript file, another CSS file, and then another JavaScript file, the browser will only attempt one load at a time. If you load all the CSS files before all JavaScript files, it will parallelize the CSS file download and then the JavaScript downloads. By enforcing the separation of loads, we can achieve faster page download/render times.

Load CSS files as soon as possible

Loading CSS files before the browser starts to display the page can make the page appear to load smoother. The browser will already know how things should look and will lay the page out accordingly, instead of laying the page out once and then updating that once the CSS files have loaded.

Load JavaScript files as late as possible

JavaScript loads block the browser, as the browser must parse and interpret the JavaScript before it can continue. Sometimes it’s necessary to load a JavaScript file early, but in many cases the files can be loaded late. When possible, we load JavaScript files at the very end of the document body so that they won’t even begin downloading until the page has rendered. This provides a noticeable performance improvement for script-heavy pages.

Progressively load expensive data

There are types of data that are just too expensive to load along with the rest of the page. For a long time, Review Board would parse and render fragments of a diff for display in the review request page, but that meant that before the page could load, Review Board would need to do the following:

  1. Query the list of all comments.
  2. Fetch every file commented on.
  3. Apply the stored patch to each file.
  4. Diff between the original and patched files.
  5. Render the portion of the diff commented on into the page.

This became very time-consuming, and if a server was down, the page wasn’t available until everything timed out. The solution to this was to lazily load each of these diff fragments in order.

We now display a placeholder table for each diff fragment, at roughly the same size as the rendered fragment (to avoid excessive page scrolling on loads). The table contains a spinner showing that something is happening, and, one by one (to avoid dogpiling), we load each diff fragment.

The code to render the diff fragment, by the way, takes advantage of the If-Modified-Since header and is also cached for a year. We use an AJAX_SERIAL (same principle as the MEDIA_SERIAL above) to allow for changes in new deployments.

With these caching mechanisms in place, the review request page now loads in roughly a second in many cases (or less once cached), with diff fragments coming in lazily (and then almost immediately on future loads).

More to come…

This was a great first step, but there’s more we can do. Before we hit our 1.0 release, we’re going to batch together all our CSS files and JavaScript files into a couple of combined files and then “minify” them (basically compressing them in such a way as to allow for smaller files and faster load times).

Again, these are techniques we’re now making use of in Review Board, but they’re not in any way specific to Review Board. Anyone out there developing websites or web applications should seriously look into ways to improve performance. I hope this was a good starting point, but seriously, do read Yahoo!’s article as well.

Review Board 1.0 alpha 1 released

Roughly two years ago, David Trowbridge and I began development of Review Board for use in our open source projects and our team at VMware. During that time, we’ve turned Review Board into a powerful code review tool that works with a variety of version control systems. Most of VMware has moved over to it, as have an estimated 50-100 companies world-wide. We’ve had over 100 contributors to the project, people providing volunteer support on the mailing list, and people have developed third party tools for integrating with Review Board.

After all this time in development, with this many people contributing, we decided it’s probably time to get a release out there. Sure, we could have done this a long time ago, but there were a number of large things we were hoping to get in first (a recently-committed UI rewrite, for instance). Now that we have most of the major features we want for our 1.0 release, we decided it was time for an alpha.

Over the coming months, we’ll be working on stabilizing the codebase, fixing a few large remaining usability quirks, enhancing performance, and writing some proper documentation (which is coming along nicely).

We’re eager to get a quality product out there and to begin development on the next release. There’s a lot of neat things planned:

  • Support for writing extensions to Review Board.
  • A fully-featured API covering every operation you’ll need to perform.
  • Some degree of policy support (specifying which users/groups can see which parts of a repository, for instance).
  • Reviews with statuses other than “Ship It”. This will probably be customizable to some degree.
  • Possibly some theme customization to allow Review Board to blend in better with corporate sites, Trac installs, etc.

Along with this, I plan to roll out a new website for the project that will have a browseable list of third party extensions, apps, Greasemonkey scripts, and more.

There’s more information in our release announcement.

Twittering as Review Board Approaches 1.0

On the road to 1.0

We’re getting very close to feature freeze for Review Board 1.0. The last couple of major features are up for review. These consist largely of a UI rewrite that simplifies a lot of Review Board’s operations and moves us over to using jQuery. This will go in once it’s been reviewed and tested in Firefox 3, IE 6/7, Opera and Safari.

There are some preliminary screenshots up of the UI rewrite. Some things will be changing before this goes in, but it should give a good idea as to the major changes (if you’re already a Review Board user).

In the meantime, we’re working to get some other fixes and small features in, and I’m beginning work on a user manual. I’m not sure how much will get done for 1.0, but with any luck I’ll have a decent chunk done.

Twittering the night away

I’ve just set up a @reviewboard user on Twitter that I’m going to try to keep up-to-date as progress is made. This should give people a decent way of passively keeping track of updates if they’re Twitter users.

Barter system?

Britt Selvitelle of Twitter fame just sent me a great screenshot of a barter system for Review Board. Can’t get someone to review your code? Offer them something in exchange!

As some people know, we’re planning to have extensions in the next major release (1.5 or 2.0). This would be a fun little extension to have 🙂 Maybe I’ll write it as part of a tutorial.
