
Using pprint.saferepr() for string comparisons in Python

Python offers a handy module called pprint, which has helpers for formatting and printing data in a nicely readable way. If you haven’t used it, you should definitely explore it!

Most people will reach for this module when using pprint.pprint() (aka pprint.pp()) or pprint.pformat(), but one under-appreciated function is pprint.saferepr(), which advertises itself as returning a string representation of an object that is safe from some recursion issues. The documentation demonstrates it like this:

>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=4370731904>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

But actually repr() handles this specific case just fine:

>>> repr(stuff)
"[[...], 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"

Pretty minimal difference there. Is there another reason we might care about saferepr()?

Dictionary string comparisons!

saferepr() first sets up the pprint pretty-printing machinery, meaning it takes care of things like sorting keys in dictionaries.

This is great, and really handy when writing unit tests that need to compare, say, logged data.

See, modern versions of CPython 3 preserve insertion order of keys in dictionaries, which can be nice but isn’t always great for comparison purposes. If you’re writing a unit test, you probably want to feel confident about the string you’re comparing against.

Let’s take a look at repr() vs. saferepr() with a basic dictionary.

>>> d = {}
>>> d['z'] = 1
>>> d['a'] = 2
>>> d['g'] = 3
>>>
>>> repr(d)
"{'z': 1, 'a': 2, 'g': 3}"
>>>
>>> from pprint import saferepr
>>> saferepr(d)
"{'a': 2, 'g': 3, 'z': 1}"

A nice, stable order. Now we don’t have to worry about a unit test breaking on different versions or implementations or with different logic.
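
For instance, here’s a minimal sketch of a unit test leaning on that stable ordering (the test itself is made up for illustration):

from pprint import saferepr
from unittest import TestCase


class LoggingTests(TestCase):
    def test_log_payload(self):
        # However the code under test built this dictionary, the
        # saferepr() output is sorted and stable.
        d = {'z': 1, 'a': 2, 'g': 3}

        self.assertEqual(saferepr(d), "{'a': 2, 'g': 3, 'z': 1}")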

saferepr() will disable some of pprint()’s typical output limitations. There’s no max dictionary depth. There’s no max line width. You just get a nice, normalized string of data for comparison purposes.

But not for sets…

Okay, it’s not perfect. Sets will still use their standard representation (which seems to be iteration order), and this might be surprising given how dictionaries are handled.

Interestingly, pprint() and pformat() will sort a set’s elements if they need to break them across multiple lines, but otherwise, nope.

For example:

>>> from random import choices
>>> import pprint
>>> import string
>>> s = set(choices(string.ascii_letters, k=25))
>>> repr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.saferepr(s)
"{'A', 'D', 'b', 'J', 'T', 'X', 'C', 'V', 'e', 'v', 'I', 'w', 'H', 'k', 'M', 't', 'U', 'o', 'W'}"
>>> pprint.pp(s)
{'A',
 'C',
 'D',
 'H',
 'I',
 'J',
 'M',
 'T',
 'U',
 'V',
 'W',
 'X',
 'b',
 'e',
 'k',
 'o',
 't',
 'v',
 'w'}

Ah well, can’t have everything we want.

Still, depending on what you’re comparing, this can be a very handy tool in your Python Standard Library toolset.


A crash course on Python function signatures and typing

[Screenshot: Python code defining some example Python types for functions]

I’ve been doing some work on Review Board and our utility library Typelets, and thought I’d share some of the intricacies of function signatures and their typing in Python.

We have some pretty neat Python typing utilities in the works to help a function inherit another function’s types in their own *args and **kwargs without rewriting a TypedDict. Useful for functions that need to forward arguments to another function. I’ll talk more about that later, but understanding how it works first requires understanding a bit about how Python sees functions.

Function signatures

Python’s inspect module is full of goodies for analyzing objects and code, and today we’ll explore the inspect.signature() function.

inspect.signature() is used to introspect the signature of another function, showing its parameters, default values, type annotations, and more. It can also aid IDEs in understanding a function signature. If a function has __signature__ defined, it will reference this (at least in CPython), and this gives highly-dynamic code the ability to patch signatures at runtime. (If you’re curious, here’s exactly how it works).

There are a few places where knowing the signature can be useful, such as:

  • Automatically crafting documentation (Sphinx does this)
  • Checking if a callback handler accepts the right arguments
  • Checking if an implementation of an interface is using deprecated function signatures

Let’s set up a function and take a look at its signature.

>>> def my_func(
...     a: int,
...     b: str,
...     /,
...     c: dict[str, str] | None,
...     *,
...     d: bool = False,
...     **kwargs,
... ) -> str:
...     ...
... 
>>> import inspect
>>> sig = inspect.signature(my_func)

>>> sig
<Signature (a: int, b: str, /, c: dict[str, str] | None, *,
d: bool = False, **kwargs) -> str>

>>> sig.parameters
mappingproxy(OrderedDict({
    'a': <Parameter "a: int">,
    'b': <Parameter "b: str">,
    'c': <Parameter "c: dict[str, str] | None">,
    'd': <Parameter "d: bool = False">,
    'kwargs': <Parameter "**kwargs">
}))

>>> sig.return_annotation
<class 'str'>

>>> sig.parameters.get('c')
<Parameter "c: dict[str, str] | None">

>>> 'kwargs' in sig.parameters
True

>>> 'foo' in sig.parameters
False

Pretty neat. Pretty useful, when you need to know what a function takes and returns.
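
As a quick sketch of that callback use case from earlier (the handler and argument names here are hypothetical), you could filter keyword arguments down to the ones a handler actually accepts:

import inspect


def call_handler(handler, **kwargs):
    sig = inspect.signature(handler)

    # Note: this simple filter ignores handlers that declare their own
    # **kwargs, which would accept everything we give them.
    accepted = {
        name: value
        for name, value in kwargs.items()
        if name in sig.parameters
    }

    return handler(**accepted)


def on_saved(model_name, user=None):
    ...


# request_id isn't in on_saved()'s signature, so it gets filtered out.
call_handler(on_saved, model_name='MyModel', request_id=123)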

Let’s see what happens when we work with methods on classes.

>>> class MyClass:
...     def my_method(
...         self,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass.my_method)
<Signature (self, *args, **kwargs) -> None>

Seems reasonable. But…

>>> obj = MyClass()
>>> inspect.signature(obj.my_method)
<Signature (*args, **kwargs) -> None>

self disappeared!

What happens if we do this on a classmethod? Place your bets…

>>> class MyClass2:
...     @classmethod
...     def my_method(
...         cls,
...         *args,
...         **kwargs,
...     ) -> None:
...         ...
...
>>> inspect.signature(MyClass2.my_method)
<Signature (*args, **kwargs) -> None>

If you guessed it wouldn’t have cls, you’d be right.

Only unbound methods (definitions of methods on a class) will have a self parameter in the signature. Bound methods (callable methods bound to an instance of a class) and classmethods (callable methods bound to a class) don’t. And this makes sense, if you think about it, because this signature represents what the call accepts, not what the code defining the method looks like.

You don’t pass a self when calling a method on an object, or a cls when calling a classmethod, so it doesn’t appear in the function signature. But did you know that you can call an unbound method if you provide an object as the self parameter? Watch this:

>>> class MyClass:
...     def my_method(self) -> None:
...         self.x = 42
...
>>> obj = MyClass()
>>> MyClass.my_method(obj)
>>> obj.x
42

In this case, the unbound method MyClass.my_method has a self argument in its signature, meaning it accepts one in a call. So, we can just pass in an instance. There aren’t a lot of cases where you’d want to go this route, but it’s helpful to know how this works.

What are bound and unbound methods?

I briefly touched upon this, but:

  • Unbound methods are just functions. Functions that happen to be defined on a class.
  • Bound methods are functions where the very first argument (self or cls) is bound to a value.

Binding normally happens when you access a method through an instance of a class, but you can do it yourself through any function’s __get__():

>>> def my_func(self):
...     print('I am', self)
... 
>>> class MyClass:
...     ...
...
>>> obj = MyClass()
>>> method = my_func.__get__(obj)
>>> method
<bound method my_func of <__main__.MyClass object at 0x100ea20d0>>

>>> method.__self__
<__main__.MyClass object at 0x100ea20d0>

>>> inspect.ismethod(method)
True

>>> method()
I am <__main__.MyClass object at 0x100ea20d0>

my_func wasn’t even defined on a class, and yet we could still make it a bound method tied to an instance of MyClass.

You can think of a bound method as a convenience over having to pass in an object as the first parameter every time you want to call the function. As we saw above, we can do exactly that, if we pass it to the unbound method, but binding saves us from doing this every time.

You’ll probably never need to do this trick yourself, but it’s helpful to know how it all ties together.

By the way, @staticmethod is a way of telling Python to never turn the function into a bound method when it’s accessed through the class or an instance (it stays a plain function), and @classmethod is a way of telling Python to always bind it to the class it’s accessed through (and not to an instance).
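
You can see this in the interpreter (addresses will vary):

>>> class Demo:
...     @staticmethod
...     def static_m(): ...
...     @classmethod
...     def class_m(cls): ...
...
>>> Demo.static_m
<function Demo.static_m at 0x100e9fd80>
>>> Demo.class_m
<bound method Demo.class_m of <class '__main__.Demo'>>
>>> Demo.class_m.__self__
<class '__main__.Demo'>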

How do you tell them apart?

If you have a function, and you don’t know if it’s a standard function, a classmethod, a bound method, or an unbound method, how can you tell?

  1. Bound methods have a __self__ attribute pointing to the parent object (and inspect.ismethod() will be True).
  2. Classmethods have a __self__ attribute pointing to the parent class (and inspect.ismethod() will be True).
  3. Unbound methods are tricky:
    • They do not have a __self__ attribute.
    • They might have a self or cls parameter in the signature, but they might not have those names (and other functions may define them).
    • They should have a . in their __qualname__ attribute. This is a full .-based path to the method, relative to the module.
    • Splitting __qualname__, the last component would be the name. The previous component won’t be <locals> (but if <locals> is found, you’re going to have trouble getting to the method).
    • If the full path is resolvable, the parent component should be a class (but it might not be).
    • You could… resolve the parent module to a file and walk its AST and find the class and method based on __qualname__. But this is expensive and probably a bad idea for most cases.
  4. Standard functions are the fallback.

Since unbound methods are standard functions that are just defined on a class, it’s difficult to really tell the difference. You’d have to rely on heuristics and know you won’t always get a definitive answer.

(Interesting note: In Python 2, unbound methods were special kinds of functions with a __self__ that pointed to the class they were defined on, so you could easily tell!)
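
If you wanted to turn those heuristics into code, a rough (and intentionally imperfect) sketch might look like:

import inspect


def guess_callable_kind(func):
    """Best-effort guess at what kind of callable this is."""
    if inspect.ismethod(func):
        # Bound methods and classmethods both have __self__.
        if inspect.isclass(func.__self__):
            return 'classmethod'

        return 'bound method'

    qualname = getattr(func, '__qualname__', '')

    if '.' in qualname and '<locals>' not in qualname:
        # Heuristic only: we can't be sure without resolving the
        # parent, which might not even be a class.
        return 'possibly an unbound method'

    return 'plain function'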

The challenges of typing

Functions can be typed using Callable[[<param_types>], <return_type>].

This is very simplistic, and can’t be used to represent positional-only arguments, keyword-only arguments, default arguments, *args, or **kwargs. For that, you can define a Protocol with __call__:

from typing import Any, Protocol


class MyCallback(Protocol):
    def __call__(
        self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

Type checkers will then treat this as a Callable, effectively. If we take my_func from the top of this post, we can assign to it:

cb: MyCallback = my_func  # This works

What if we want to assign a method from a class? Let’s try bound and unbound.

class MyClass:
    def my_func(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'

cb2: MyCallback = MyClass.my_func  # This fails
cb3: MyCallback = MyClass().my_func  # This works

What happened? It’s that self again. Remember, the unbound method has self in the signature, and the bound method does not.

Let’s add self and try again.

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

cb2: MyCallback = MyClass.my_func    # This works
cb3: MyCallback = MyClass().my_func  # This fails

What happened this time?! Well, now we’ve matched the unbound signature with self, but not the bound signature without it.

Solving this gets… verbose. We can create two versions of this: Unbound, and Bound (or plain function, or classmethod):

class MyUnboundCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        self: Any,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

class MyCallback(Protocol):
    def __call__(
        _proto_self,  # This won't be part of the function signature
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        ...

# These work
cb4: MyCallback = my_func
cb5: MyCallback = MyClass().my_func


# These fail correctly
cb7: MyUnboundCallback = my_func
cb8: MyUnboundCallback = MyClass().my_func
cb9: MyCallback = MyClass.my_func

This means we can use union types (MyUnboundCallback | MyCallback) to cover our bases.
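
For example, with Python 3.10+ union syntax (building on the classes above):

MyAnyCallback = MyUnboundCallback | MyCallback

cb_a: MyAnyCallback = my_func            # Plain function: works
cb_b: MyAnyCallback = MyClass.my_func    # Unbound method: works
cb_c: MyAnyCallback = MyClass().my_func  # Bound method: works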

It’s not flawless. Depending on how you’ve typed your signature, and the signature of the function you’re setting, you might not get the behavior you want or expect. As an example, any method with a leading self-like parameter (basically any parameter coming before your defined signature) will type as MyUnboundCallback, because it might be! Remember, we can turn any function into a bound method for an arbitrary class using __get__. That may or may not matter, depending on what you need to do.

What do I mean by that?

def my_bindable_func(
    x,
    a: int,
    b: str,
    /,
    c: dict[str, str] | None,
    *,
    d: bool = False,
    **kwargs,
) -> str:
    return ''


x1: MyCallback = my_bindable_func         # This fails
x2: MyUnboundCallback = my_bindable_func  # This works

x may not be named self, but it’ll get treated as one, because if we do my_bindable_func.__get__(some_obj), then some_obj will be bound to x and callers won’t have to pass anything to x.

Okay, what if you want to return a function that can behave as an unbound method (with self) and that can become a bound method (with __get__)? We can mostly do it with:

from typing import (Callable, Concatenate, ParamSpec, Protocol, TypeVar,
                    cast, overload)


_C = TypeVar('_C')
_R_co = TypeVar('_R_co', covariant=True)
_P = ParamSpec('_P')


class MyMethod(Protocol[_C, _P, _R_co]):
    __self__: _C

    @overload
    def __get__(
        self,
        instance: None,
        owner: type[_C],
    ) -> Callable[Concatenate[_C, _P], _R_co]:
        ...

    @overload
    def __get__(
        self,
        instance: _C,
        owner: type[_C],
    ) -> Callable[_P, _R_co]:
        ...

    def __get__(
        self,
        instance: _C | None,
        owner: type[_C],
    ) -> (
        Callable[Concatenate[_C, _P], _R_co] |
        Callable[_P, _R_co]
    ):
        ...

Putting it into practice:

def make_method(
    source_method: Callable[Concatenate[_C, _P], _R_co],
) -> MyMethod[_C, _P, _R_co]:
    return cast(MyMethod, source_method)


class MyClass2:
    @make_method
    def my_method(
        self,
        a: int,
        b: str,
        /,
        c: dict[str, str] | None,
        *,
        d: bool = False,
        **kwargs,
    ) -> str:
        return '42'


# These work!
MyClass2().my_method(1, 'x', {}, d=True)
MyClass2.my_method(MyClass2(), 1, 'x', {}, d=True)

That’s a fair bit of work, but it satisfies the bound vs. unbound methods signature differences. If we inspect these:

>>> reveal_type(MyClass2.my_method)
Type of "MyClass2.my_method" is "(MyClass2, a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

>>> reveal_type(MyClass2().my_method)
Type of "MyClass2().my_method" is "(a: int, b: str, /,
c: dict[str, str] | None, *, d: bool = False, **kwargs: Unknown) -> str"

And those are type-compatible with the MyCallback and MyUnboundCallback we built earlier, since the signatures match:

# These work
cb10: MyUnboundCallback = MyClass2.my_method
cb11: MyCallback = MyClass2().my_method

# These fail correctly
cb12: MyUnboundCallback = MyClass2().my_method
cb13: MyCallback = MyClass2.my_method

And if we wanted, we could modify that ParamSpec going into the MyMethod from make_method() and that’ll impact what the type checkers expect during the call.

Hopefully you can see how this can get complex fast, and involve some tradeoffs.

I personally believe Python needs a lot more love in this area. Types for the different kinds of functions/methods, better specialization for Callable, and some of the useful capabilities from TypeScript would be nice (such as Parameters<T>, ReturnType<T>, OmitThisParameter<T>, etc.). But this is what we have to work with today.

My teachers said to always write a conclusion

What have we learned?

  • Python’s method signatures are different when bound vs. unbound, and this can affect typing.
  • Unbound methods aren’t really their own thing, and this can lead to some challenges.
  • Any function can become a bound method with a call to __get__().
  • Callable only gets you so far. If you want to type complex functions, write a Protocol with a __call__() signature.
  • If you want to simulate a bound/unbound-aware type, you’ll need Protocol with __get__().

I feel like I just barely scratched the surface here. There’s a lot more to functions, working with signatures, and challenges around typing than I covered here. We haven’t talked about how you can rewrite signatures on the fly, how annotations are represented, what functions look like under the hood, or how the bytecode behind functions is mutable at runtime.

I’ll leave some of that for future posts. And I’ll have more to talk about when we expand Typelets with the new parameter inheritance capabilities. It builds upon a lot of what I covered today to perform some very neat and practical tricks for library authors and larger codebases.

What do you think? Did you learn something? Did I get something wrong? Have you done anything interesting or unexpected with signatures or typing around functions you’d like to share? I want to hear about it!


Tip: Use keyword-only arguments in Python dataclasses

Python dataclasses are a really nice feature for constructing classes that primarily hold or work with data. They can be a good alternative to using dictionaries, since they allow you to add methods, dynamic properties, and subclasses. They can also be a good alternative to building your own class by hand, since they don’t need a custom __init__() that reassigns attributes and provide methods like __eq__() out of the box.

One small tip for keeping dataclasses maintainable is to always define them with kw_only=True, like so:

from dataclasses import dataclass


@dataclass(kw_only=True)
class MyDataClass:
    x: int
    y: str
    z: bool = True

This will construct an __init__() that looks like this:

class MyDataClass:
    def __init__(
        self,
        *,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

Instead of:

class MyDataClass:
    def __init__(
        self,
        x: int,
        y: str,
        z: bool = True,
    ) -> None:
        self.x = x
        self.y = y
        self.z = z

That * in the argument list means everything that follows must be passed as a keyword argument, instead of a positional argument.

There are two reasons you probably want to do this:

  1. It allows you to reorder the fields on the dataclass without breaking callers. Positional arguments mean a caller can use MyDataClass(1, 'foo', False), and if you remove/reorder any of these arguments, you’ll break those callers unexpectedly. By forcing callers to use MyDataClass(x=1, y='foo', z=False), you remove this risk.
  2. It allows subclasses to add required fields. Normally, any field with a default value (like z above) will force any fields following it to also have a default. And that includes all fields defined by subclasses. Using kw_only=True gives subclasses the flexibility to decide for themselves which fields must be provided by the caller and which have a default (see the sketch after this list).
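
Here’s a minimal sketch of that second point (the event classes are made up for illustration):

from dataclasses import dataclass


@dataclass(kw_only=True)
class BaseEvent:
    timestamp: float = 0.0  # A field with a default...


@dataclass(kw_only=True)
class UserEvent(BaseEvent):
    username: str  # ...and the subclass can still add a required field.

Without kw_only=True, defining UserEvent this way would raise a TypeError, since a non-default field would follow a field with a default.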

These reasons are more important for library authors than anything. We spend a lot of time trying to ensure backwards-compatibility and forwards-extensibility in Review Board, so this is an important topic for us. And if you’re developing something reusable with dataclasses, it might be for you, too.

Update: One important point I left out is Python compatibility. This flag was introduced in Python 3.10, so if you’re supporting older versions, you won’t be able to use this just yet. If you want to optimistically enable this just on 3.10+, one approach would be:

import sys
from dataclasses import dataclass


if sys.version_info[:2] >= (3, 10):
    dataclass_kwargs = {
        'kw_only': True,
    }
else:
    dataclass_kwargs = {}

...

@dataclass(**dataclass_kwargs)
class MyDataClass:
    ...

But this won’t solve the subclassing issue, so you’d still need to ensure any subclasses use default arguments if you want to support versions prior to 3.10.


Peer-Programming a Buggy World with ChatGPT AI

AI has been all the rage lately, with solutions like Stable Diffusion for image generation, GPT-3 for text generation, and CoPilot for code development becoming publicly available to the masses.

That excitement ramped up this week with the release of ChatGPT, an extremely impressive chat-based AI system leveraging the best GPT has to offer.

I decided last night to take ChatGPT for a spin, to test its code-generation capabilities. And I was astonished by the experience.

Together, we built a simulation of bugs foraging for food in a 100×100 grid world, tracking essentials like hunger and life, reproducing, and dealing with hardships involving seasonal changes, natural disasters, and predators. All graphically represented.

We’re going to explore this in detail, but I want to start off by showing you what we built:

Also, you can find out more on my GitHub repository.

A Recap of my Experience

Before we dive into the collaborative sessions that resulted in a working simulation, let me share a few thoughts and tidbits about my experience:


Integration and Simulation Tests in Python

One of my (many) tasks lately has been to rework unit and integration tests for Review Bot, our automated code review add-on for Review Board.

The challenge was providing a test suite that could test against real-world tools, but not require them. An ever-increasing list of compatible tools has threatened to become an ever-increasing burden on contributors. We wanted to solve that.

So here’s how we’re doing it.

First off, unit test tooling

First off, this is all Python code, which you can find on the Review Bot repository on GitHub.

We make heavy use of kgb, a package we’ve written to add function spies to Python unit tests. This goes far beyond Mock, allowing nearly any function to be spied on without having to be replaced. This module is a key component of our solution, given our codebase and our needs, but it’s an implementation detail — it isn’t a requirement for the overall approach.

Still, if you’re writing complex Python test suites, check out kgb.

Deciding on the test strategy

Review Bot can talk to many command line tools, which are used to perform checks and audits on code. Some are harder than others to install, or at least annoying to install.

We decided there’s two types of tests we need:

  1. Integration tests — run against real command line tools
  2. Simulation tests — run against simulated output/results that would normally come from a command line tool

Since our goal is to ease contribution, we have to keep in mind that we can’t err too far on that side at the expense of a reliable test suite.

We decided to make these the same tests.

The strategy, therefore, would be this:

  1. Each test would contain common logic for integration and simulation tests. A test would set up state, perform the tool run, and then check results.
  2. Integration tests would build upon this by checking dependencies and applying configuration before the test run.
  3. Simulation tests would be passed fake output or setup data needed to simulate that tool.

This would be done without any code duplication between integration or simulation tests. There would be only one test function per expectation (e.g., a successful result or the handling of an error). We don’t want to worry about tests getting out of sync.

Regression in our code? Both types of tests should catch it.

Regression or change in behavior in an integrated tool? Any fixes we apply would update or build upon the simulation.

Regression in the simulation? Something went wrong, and we caught it early without having to run the integration test.

Making this all happen

We introduced three core testing components:

  1. @integration_test() — a decorator that defines and provides dependencies and input for an integration test
  2. @simulation_test() — a decorator that defines and provides output and results for a simulation test
  3. ToolTestCaseMetaClass — a metaclass that ties it all together

Any test class that needs to run integration and simulation tests will use ToolTestCaseMetaClass and then apply either or both @integration_test/@simulation_test decorators to the necessary test functions.

When a decorator is applied, the test function is opted into that type of test. Data can be passed into the decorator, which is then passed into the parent test class’s setup_integration_test() or setup_simulation_test().

These can do whatever they need to set up that particular type of test:

  • Integration test setup defaults to checking dependencies, skipping a test if not met.
  • Simulation test setup may write some files or spy on a subprocess.Popen() call to fake output.


For example:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=(
        b'MyTool 1.2.3\n'
        b'Scanning code...\n'
        b'0 errors, 0 warnings, 1 file(s) checked\n'
    ))
    def test_execute(self):
        """Testing MyTool.execute"""
        ...

When applied, ToolTestCaseMetaClass will loop through each of the test_*() functions with these decorators applied and split them up:

  • Test functions with @integration_test will be split out into a test_integration_<name>() function, with a [integration test] suffix appended to the docstring.
  • Test functions with @simulation_test will be split out into test_simulation_<name>(), with a [simulation test] suffix appended.

The above code ends up being equivalent to:

class MyTests(kgb.SpyAgency, TestCase):
    def setup_simulation_test(self, output):
        self.spy_on(execute, op=kgb.SpyOpReturn(output))

    def setup_integration_test(self, exe_deps):
        if not are_deps_found(exe_deps):
            raise SkipTest('Missing one or more dependencies')

    def test_integration_execute(self):
        """Testing MyTool.execute [integration test]"""
        self.setup_integration_test(exe_deps=['mytool'])
        self._test_common_execute()

    def test_simulation_execute(self):
        """Testing MyTool.execute [simulation test]"""
        self.setup_simulation_test(output=(
            b'MyTool 1.2.3\n'
            b'Scanning code...\n'
            b'0 errors, 0 warnings, 1 file(s) checked\n'
        ))
        self._test_common_execute()

    def _test_common_execute(self):
        ...

Pretty similar, but less to maintain in the end, especially as tests pile up.

And when we run it, we get something like:

Testing MyTool.execute [integration test] ... ok
Testing MyTool.execute [simulation test] ... ok

...

Or, you know, with a horrible, messy error.
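
If you’re curious how that splitting could work, here’s a rough sketch of the approach (this is illustrative, not Review Bot’s actual implementation):

def integration_test(**setup_kwargs):
    def _dec(func):
        func.integration_setup_kwargs = setup_kwargs
        return func

    return _dec


def simulation_test(**setup_kwargs):
    def _dec(func):
        func.simulation_setup_kwargs = setup_kwargs
        return func

    return _dec


class ToolTestCaseMetaClass(type):
    def __new__(mcs, name, bases, d):
        for attr_name, func in list(d.items()):
            if not (callable(func) and attr_name.startswith('test_')):
                continue

            test_name = attr_name[len('test_'):]
            split = False

            for kind in ('integration', 'simulation'):
                setup_kwargs = getattr(func, '%s_setup_kwargs' % kind,
                                       None)

                if setup_kwargs is not None:
                    d['test_%s_%s' % (kind, test_name)] = \
                        mcs._make_test(func, kind, setup_kwargs)
                    split = True

            if split:
                # The original function is replaced by the split tests.
                del d[attr_name]

        return super().__new__(mcs, name, bases, d)

    @staticmethod
    def _make_test(func, kind, setup_kwargs):
        def _test(self):
            getattr(self, 'setup_%s_test' % kind)(**setup_kwargs)
            func(self)

        _test.__doc__ = '%s [%s test]' % (func.__doc__, kind)

        return _test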

Iterating on tests

It’s become really easy to maintain and run these tests.

We can now start by writing the integration test, modify the code to log any data that might be produced by the command line tool, and then fake-fail the test to see that output.

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])

        # XXX
        print(repr(payload))

        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

        # XXX Fake-fail the test
        assert False

I can run that and get the results I’ve printed:

======================================================================
ERROR: Testing MyTool.process_results [integration test]
----------------------------------------------------------------------
Traceback (most recent call last):
    ...
-------------------- >> begin captured stdout << ---------------------
{"errors": [{"code": 123, "column": 13, "filename": "test.c", "line': 1, "message": "Expected return type: int"}]}

Now that I have that, and I know it’s all working right, I can feed that output into the simulation test and clean things up:

class MyTests(kgb.SpyAgency, TestCase,
              metaclass=ToolTestCaseMetaClass):
    ...

    @integration_test(exe_deps=['mytool'])
    @simulation_test(output=json.dumps({
        'errors': [
            {
                'filename': 'test.c',
                'code': 123,
                'line': 1,
                'column': 13,
                'message': 'Expected return type: int',
            },
        ],
    }).encode('utf-8'))
    def test_process_results(self):
        """Testing MyTool.process_results"""
        self.setup_files({
            'filename': 'test.c',
            'content': b'int main() {return "test";}\n',
        })

        tool = MyTool()
        payload = tool.run(files=['test.c'])
        results = MyTool().process_results(payload)

        self.assertEqual(results, {
            ...
        })

Once it’s running correctly in both tests, our job is done.

From then on, anyone working on this code can simply run the test suite and make sure their change hasn’t broken any simulation tests. If it has, and it wasn’t intentional, they’ll have a great starting point in diagnosing their issue, without having to install anything.

Anything that passes simulation tests can be considered a valid contribution. We can then test against the real tools ourselves before landing a change.

Development is made simpler, and there’s no worry about regressions.

Going forward

We’re planning to apply this same approach to both Review Board and RBTools. Both currently require contributors to install a handful of command line tools or optional Python modules to make sure they haven’t broken anything, and that’s a bottleneck.

In the future, we’re looking at making use of python-nose‘s attrib plugin, tagging integration and simulation tests and making it trivially easy to run just the suites you want.

We’re also considering pulling out the metaclass and decorators into a small, reusable Python packaging, making it easy for others to make use of this pattern.


Django Development with Djblets: Dynamic Site Configuration

Django’s a powerful toolkit and offers many things to ease the creation of websites and web applications. Features such as the automatically built Administration UI are just awesome and save developers from having to reinvent the wheel on every project. It does fall short in a few areas, however, and one of them is site configuration.

Now, Django’s settings support isn’t bad. You get a nice little settings.py file to put settings in for your site/webapp, and since it’s just a Python module, your settings can really hold whatever data you want and can be well documented with comments. That’s all great, but when you want to actually change a setting, you must SSH in, edit the file, save it, and restart the server.

This is fine for small websites with a few visitors, or sites with almost no custom settings, but for large sites it can be a problem. Bringing down the site, however briefly, could interrupt users and generate errors, causing worries during a checkout process or when editing a post. Doing anything more complex while preventing downtime really means rolling your own thing.

So we rolled our own thing. Review Board has operated with the standard Django settings module since day 1, but it’s become obvious over time that it wasn’t good enough for us. So we wrote the dynamic Site Configuration app for Djblets.

djblets.siteconfig

siteconfig is a relatively small Django app that stores setting data in the database, along with project versioning data in case you need it to handle migrations of some sort down the road. There’s a lot of useful things in this app, so I’ll summarize what it provides before I jump into details:

  • Saving/Retrieving settings.

    Setting values can be simple strings, integers, booleans, or something more complex like an array of dictionaries of key/value pairs.

  • Default values for settings.

    These default values could be provided by your application directly or could be based on the contents of your settings.py file.

  • Syncing to/from settings.py.

    Just because you’re using djblets.siteconfig doesn’t mean you can’t still support settings.py. Settings can be pulled from your old settings and saved back there when modified.

  • Compatibility with standard Django settings.

    Django offers a lot of useful settings that you may want to customize. We do the hard work of mapping these and keeping them in sync so you can customize them dynamically.

  • Auto-generated settings pages.

    Much like the existing admin UI, we offer an easy way to provide settings pages for your application. Go ahead and stick these in the admin UI if you like.

Getting Started

To start out, you’ll want to add some code in your app that creates a SiteConfiguration model, perhaps as a post_syncdb management hook. This is project-specific, since you’ll be storing version information in here and linking it with your existing Site. Here’s one way to do it. We’ll use this file as a template in later examples:

# myapp/siteconfig.py
# NOTE: Import this file in your urls.py or some place before
#       any code relying on settings is imported.
from django.contrib.sites.models import Site

from djblets.siteconfig.models import SiteConfiguration


def load_site_config():
    """Sets up the SiteConfiguration, provides defaults and syncs settings."""
    try:
        siteconfig = SiteConfiguration.objects.get_current()
    except SiteConfiguration.DoesNotExist:
        # Either warn or just create the thing. Depends on your app
        siteconfig = SiteConfiguration(site=Site.objects.get_current(),
                                       version="1.0")

    # Code will go here for settings work in later examples.


load_site_config()

Saving/Retrieving Settings

Settings exist in a SiteConfiguration model inside a JSONField. Anything that Django’s native JSON serializer/deserializer code can handle, you can store. Usually you’ll want to store primitive values (strings, booleans, integers, etc.), but you’re free to do something more complex.

from djblets.siteconfig.models import SiteConfiguration


siteconfig = SiteConfiguration.objects.get_current()
siteconfig.set("mystring", "foobar")
siteconfig.set("mybool", True)
siteconfig.set("myarray", [1, 2, 3, 4])

mybool = siteconfig.get("mybool")

It’s pretty straightforward. The set/get functions are just simple accessors for the SiteConfiguration.settings JSON field, but you’ll want to use them because they’ll handle the registered defaults for settings, which I will get to in a moment.

Since this is just a Djblets JSONField, which is a glorified dictionary, you can do a lot of things, such as iterate through the keys and values:

for key, value in siteconfig.settings.items():
    print "%s: %s" % (key, value)

And so on. Let’s go a step further.

Default Values

Say you’re starting fresh and just created your SiteConfiguration instance, or you’ve introduced a new setting into an existing site. The setting won’t be in the saved settings, so what value is returned when you do a get call? Well, that depends on your setup a bit.

In the first case, with a fresh new setting:

>>> print siteconfig.get("mynewsetting")
None

Any setting without a default value and not in the saved settings will return None. Good ol’ reliable None. If you want to make sure you return something sane at this specific call point, you can specify a default value in the call to get, like so:

>>> print siteconfig.get("mynewsetting", default=123)
123

This can be really useful at times, but it’s a pain to have to do this in every call and keep the values in sync. So we provide a third option: Registered default values.

# myapp/siteconfig.py


defaults = {
    'mystring':     'Foobar',
    'mybool':       False,
    'mynewsetting': 123,
}


def load_site_config():
    ...

    if not siteconfig.get_defaults():
        siteconfig.add_defaults(defaults)

That’s all it takes. Now calling get with just the setting name will return the default value, if the setting hasn’t been saved in the database.

Down the road we’re going to have support for automatic querying of apps to grab their settings, giving third party apps an easier way to integrate into codebases using siteconfig.

Syncing Settings with settings.py

Django’s settings.py is and will continue to be an important place for some settings to go. If you’re developing a reusable application for other projects, you may not want to fully ditch settings.py. Or maybe you’re using a third party application that uses settings.py and want to make the settings dynamic.

siteconfig was designed from the beginning to handle syncing settings in both places. Now, when I say syncing, I don’t mean we write out to settings.py, since that would require enforcing certain permissions on the file. What we do instead is to load in the values from settings.py if not set in the siteconfig settings, and to write out to the settings object, an in-memory version of settings.py.

Let’s revisit our example above, but in this case we want to be able to dynamically control Django’s EMAIL_HOST setting.

# myapp/siteconfig.py
from django.conf import settings

from djblets.siteconfig.django_settings import (apply_django_settings,
                                                generate_defaults)


settings_map = {
    # siteconfig key    settings.py key
    'mail_host':        'EMAIL_HOST',
}

defaults = {
    ...
}
defaults.update(generate_defaults(settings_map))


def load_site_config():
    ...

    apply_django_settings(siteconfig, settings_map)

What we did here was to generate a mapping table of our custom settings to Django settings.py settings. We can then generate a set of defaults using the generate_defaults function. This goes through the Django settings keys, pulls out the values if set in settings.py, and returns a dictionary similar to the one we wrote above. This guarantees that our default values will be what you have configured in settings.py, handling the first part of our synchronization.

The second part is handled through use of the apply_django_settings function. This will take a siteconfig and a settings map and write out all values that have actually changed. So if “mail_host” above is never modified, apply_django_settings won’t write it out to the database, allowing the default value to change without having to do anything too fancy.

Compatibility with Django’s Settings

The above example dealt with the settings.EMAIL_HOST setting, but in reality we won’t have to cover this at all. There’s a few other goodies in the django_settings module that handle all the Django settings for you.

First off is get_django_settings_map. You’ll rarely need to call this directly, but it returns a settings map for all the Django settings sites are likely to want to change.

Then there’s get_django_defaults, which you’ll want to either merge into your existing defaults table or add directly. We don’t do it for you because you may very well not care about these settings.

The third is one you just saw, apply_django_settings. In the above example, we passed in our settings map, but if you don’t specify one it will use get_django_settings_map.

Let’s take a look at how this works.

# myapp/siteconfig.py

from djblets.siteconfig.django_settings import (apply_django_settings,
                                                get_django_defaults)


def load_site_config():
    ...

    if not siteconfig.get_defaults():
        siteconfig.add_defaults(defaults)
        siteconfig.add_defaults(get_django_defaults())

    apply_django_settings(siteconfig, settings_map)
    apply_django_settings(siteconfig)

This is all it takes.

If you take a look at django_settings.py, you’ll notice that we don’t reuse the existing Django settings names and instead provide our own. This is an attempt to cleanly namespace the settings used. You’ll also notice that there’s several settings maps and defaults functions. This is so you can be more picky if you really want to.

Auto-generated Settings Pages

Oh, this is where it gets fun.

Dynamic settings are all well and good, but if you can’t set them through your browser, why bother?

We wanted to make sure that not only could these be modified through a browser, but that it was dead simple to put up pages for changing settings. All you have to do is add a URL mapping and one form per page.

The form doesn’t even need much more than fields.

Let’s start out with some code. It should be pretty self-explanatory.

# urls.py
from myapp.forms import GeneralSettingsForm

urlpatterns = patterns('',
    (r'^admin/settings/general/$', 'djblets.siteconfig.views.site_settings',
     {'form_class': GeneralSettingsForm}),
    (r'^admin/(.*)', admin.site.root),
)
# myapp/forms.py
from django import forms
from djblets.siteconfig.forms import SiteSettingsForm


class GeneralSettingsForm(SiteSettingsForm):
    mystring = forms.CharField(
        label="My String",
        required=True)

    mybool = forms.BooleanField(
        label="My Boolean",
        required=False)

    mail_host = forms.CharField(
        label="Mail Server",
        required=False)


    class Meta:
        title = "General Settings"

That’s it. Now just go to /admin/settings/general/ and you’ll see your brand new settings form. It will auto-load existing or default settings and save modified settings. Like any standard form, it will handle help text, field validation and error generation.

[Screenshot: Example settings page]

Proxy Fields

For a lot of sites, this will be sufficient. For more complicated settings, we have a few more things we can do.

Let’s take the array above. Django’s forms doesn’t have any concept of array values, but we can fake it with custom load and save methods and a proxy field.

# myapp/forms.py
import re

class GeneralSettingsForm(SiteSettingsForm):
    ...

    my_array = forms.CharField(
        label="My Array",
        required=False,
        help_text="A comma-separated list of anything!")

    def load(self):
        self.fields['my_array'].initial = \
            ', '.join(self.siteconfig.get('myarray'))

        super(GeneralSettingsForm, self).load()

    def save(self):
        self.siteconfig.set('myarray',
            re.split(r',\s*', self.cleaned_data['my_array']))

        super(GeneralSettingsForm, self).save()

    class Meta:
        save_blacklist = ('my_array',)

What we’re essentially doing here is to come up with a new field not in the settings database (notice “my_array” versus “myarray”) and to use this as a proxy for the real setting. We then handle the serialization/deserialization in the save and load methods.

When doing this, it’s very important to add the proxy field to the save_blacklist tuple in the Meta class so that the form doesn’t save your proxy field in the database!

Fieldsets

Much like Django’s own administration UI, the SiteSettingsForm allows for placing fields into custom fieldsets, allowing settings to be grouped together under a title and optional description. This is done by filling out the fieldsets variable in the Meta class, like so:

class GeneralSettingsForm(SiteSettingsForm):
    class Meta:
        fieldsets = (
            {
                'title':   "General",
                'classes': ('wide',),
                'fields':  ('mystring', 'mybool', 'my_array',),
            },
            {
                'title':       "E-Mail",
                'description': "Some description of e-mail settings.",
                'classes':     ('wide',),
                'fields':      ('mail_host',),
            },
        )

This will result in two groups of settings, each with a title, and one with a description. The description can span multiple paragraphs by inserting newlines (“\n”). All fields in the dictionary except for 'fields' are optional.

Re-applying Django settings

When you modify a field that corresponds to a setting in settings.py, you want to make sure you re-apply the setting or you could break something. This is pretty simple. Again override save as follows:

from myapp.siteconfig import load_site_config


class GeneralSettingsForm(SiteSettingsForm):
    def save(self):
        super(GeneralSettingsForm, self).save()
        load_site_config()

This will re-synchronize the settings, making sure everything is correct. Note that down the road, this will likely be automatic.

There’s one last thing to show you…

Disabled Fields

Sometimes it’s handy to disable fields. Maybe a particular setting requires a third party module that isn’t installed. The SiteSettingsForm has two special dictionaries, disabled_fields and disabled_reasons, that handle this. Just override your load function again.

class GeneralSettingsForm(SiteSettingsForm):
    def load(self):
        if not myarray_supported():
            self.disabled_fields['my_array'] = True
            self.disabled_reasons['my_array'] = \
                "This cannot be modified because we don't support arrays today."

        super(GeneralSettingsForm, self).load()

The resulting fields in the HTML will be disabled, and the error message will be displayed below the field.

Depending on whether or not you have a custom administration UI, you may want to tweak the CSS file or completely override it. Note that this all assumes that the djblets/media/ directory exists in your MEDIA_URL as the “djblets” directory. If you have something custom, you can always override siteconfig/settings.html and pass the new template to the view along with your settings form.

More Coming Soon!

There’s some new stuff in the works for the siteconfig app. Auto-detection of defaults and settings maps is the big one. I also have a few more useful tricks for doing advanced things with settings pages that I’ll demonstrate.

If you’re a user of Djblets, let us know, and feel free to blog about it! I’ll link periodically to other blogs talking about Djblets and projects using Djblets.

And as always, suggestions, constructive criticism and patches are always welcome 🙂


Django Development with Djblets: Data Grids

It’s been a while since my last post in this series, but this one’s a good one.

A common task in many web applications is to display a grid of data, such as rows from a database. Think the Inbox from GMail, the model lists from the Django administration interface, or the Dashboard from Review Board. It’s not too hard to write something that displays a grid by doing something like:


<table>
 <tr>
  <th>Username</th>
  <th>First Name</th>
  <th>Last Name</th>
 </tr>
{% for user in users %}
 <tr>
  <td>{{user.username}}</td>
  <td>{{user.first_name}}</td>
  <td>{{user.last_name}}</td>
 </tr>
{% endfor %}
</table>

This works fine, so long as you don’t want anything fancy, like sortable columns, reorderable columns, or the ability to let users specify which columns they want to see. This requires something a bit more complex.

djblets.datagrids

We wrote a nifty little set of classes for making data grids easy. Let’s take the above example and convert it over.

# myapp/datagrids.py
from django.contrib.auth.models import User
from djblets.datagrid.grids import Column, DataGrid

class UserDataGrid(DataGrid):
    username = Column("Username", sortable=True)
    first_name = Column("First Name", sortable=True)
    last_name = Column("Last Name", sortable=True)

    def __init__(self, request):
        DataGrid.__init__(self, request, User.objects.filter(is_active=True), "Users")
        self.default_sort = ['username']
        self.default_columns = ['username', 'first_name', 'last_name']
# myapp/views.py
from myapp.datagrids import UserDataGrid

def user_list(request, template_name='myapp/datagrid.html'):
    return UserDataGrid(request).render_to_response(template_name)

Now while this may look a bit more verbose, it offers many benefits in return. First off, users will be able to reorder the columns how they like and choose which columns to see. You could extend the above DataGrid to add some new column but leave it out of default_columns so that only users who really care about it would see it.

Custom columns

Let’s take this a step further. Say we want our users datagrid to optionally show staff members with a special badge. This might be useful to some users, but not all, so we’ll leave it out by default. We can implement this by creating a custom column with image data:

class StaffBadgeColumn(Column):
    def __init__(self, *args, **kwargs):
        Column.__init__(self, *args, **kwargs)

        # These define what will appear in the column list menu.
        self.image_url = settings.MEDIA_URL + "images/staff_badge.png"
        self.image_width = 16
        self.image_height = 16
        self.image_alt = "Staff"

        # Give the entry in the columns list menu a name.
        self.detailed_label = "Staff"

        # Take up as little space as possible.
        self.shrink = True

    def render_data(self, user):
        if user.is_staff:
            return ('<img src="%s" width="%s" height="%s" alt="%s" />'
                    % (self.image_url, self.image_width,
                       self.image_height, self.image_alt))

        return ""

class UserDataGrid(DataGrid):
    ...
    staff_badge = StaffBadgeColumn()
    ...

This will add a new entry to the datagrid’s column customization menu showing the staff badge with a label saying “Staff.” If users enable this column, their datagrid will update to show a staff icon for any users who are marked as staff.

Custom columns can render data in a couple of different ways.

The first is to bind the column to a field in the model. By default, a Column instance added to a DataGrid will use its own name (such as “first_name” above) as a lookup in the object. If you want to use a custom field, set the db_field attribute on the column. This field can span relationships as well. For example:

class UserDataGrid(DataGrid):
    ...
    # Uses the field name as the lookup.
    username = Column("Username")

    # Spans relationships, looking up the age in the profile.
    age = Column("Age", db_field="profile__age")

The second is the way shown above, by overriding the render_data function. This takes an object, which is the object the datagrid represents, and outputs HTML. The logic in this can be quite complicated, depending on what’s needed.

Linking to objects

A datagrid is pretty worthless if you can’t link entries for the object to a URL. We have a couple of ways to do this.

To link a column on a datagrid to a URL, set the link parameter on the DataGrid to True. By default, the URL used will be the result of calling get_absolute_url() on the object represented by the datagrid, but you can override this:

# myapp/datagrids.py

class UserDataGrid(DataGrid):
    username = Column("Username", sortable=True, link=True)
    ...
    @staticmethod
    def link_to_object(user, value):
        return "/users/%s/" % user


link_to_object takes an object and a value. The object is the object being represented by the datagrid, and the value is the rendered data in the cell. The result is a valid path or URL. In the above example, we’re explicitly defining a path, but we could use Django’s reverse function.
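
For example, here’s a sketch of the same link function using reverse (the view path is hypothetical):

# myapp/datagrids.py
from django.core.urlresolvers import reverse


class UserDataGrid(DataGrid):
    ...
    @staticmethod
    def link_to_object(user, value):
        return reverse('myapp.views.user_detail', args=[user.username])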

Usually the value is ignored, but it can be useful. Imagine that your Column represents a field that is a ForeignKey of an object. The contents of the cell is the string version of that object (implemented by the __str__ or __unicode__ function), but it’s just an object and you want to link to it. This is where value comes in handy, as instead of being HTML output, it’s actually an object you can link to.

If you intend to link a particular column to value.get_absolute_url(), we provide a handy utility function called link_to_value which exists as a static method on the DataGrid. You can pass it (or any function) to the Column’s constructor using the link_func parameter.

class UserDataGrid(DataGrid):
    ...
    myobj = Column(link=True, link_func=link_to_value)

Custom columns can hard-code their own link function by explicitly setting these values in the constructor.

Saving column settings

If a user customizes his column settings by adding/removing columns or rearranging them, he probably expects to see his customizations the next time he visits the page. DataGrids can automatically save these settings in the user’s profile, if the application supports it. Handling this is as simple as adding some fields to the app’s Profile model and setting a couple options in the DataGrid. For example:

# myapp/models.py

class Profile(models.Model):
    user = models.ForeignKey(User, unique=True)

    # Sort settings. One per datagrid type.
    sort_user_columns = models.CharField(max_length=256, blank=True)

    # Column settings. One per datagrid type.
    user_columns = models.CharField(max_length=256, blank=True)
# myapp/datagrids.py

class UserDataGrid(DataGrid):
    ...
    def __init__(self, request):
        self.profile_sort_field = "sort_user_columns"
        self.profile_columns_field = "user_columns"

The DataGrid code will handle all the loading and saving of customizations in the Profile. Easy, isn’t it?

Requirements and compatibility

Right now, datagrids require both the Yahoo! UI Library and either yui-ext (which is hard to find now) or its replacement, ExtJS.

Review Board, which Djblets was mainly developed for, still relies on yui-ext instead of ExtJS, so datagrids today by default are built with that in mind. However, the JavaScript portion of datagrids should work with ExtJS today. The default template for rendering the datagrids (located in djblets/datagrids/templates/datagrids/) assumes both YUI and yui-ext are present in the media path, but sites using datagrids are encouraged to provide their own template to reflect the actual locations and libraries used.

We would like to work with more JavaScript libraries, so if anyone wants to provide an implementation for their favorite library or help to dumb down our implementation to work with any and all, we’d welcome patches.

See it in action

We make extensive use of datagrids on Review Board. You can get a sense of it and play around by looking at the users list and the dashboard (which requires an account).

That’s it for now!

There’s more to datagrids than what I’ve covered here, but this should give you a good understanding of how they work. Later on I’d like to do a more in-depth article on how datagrids work and some more advanced use cases for them.

For now, take a look at the source code and Review Board’s datagrids for some real-world examples.


Django Tips: PIL, ImageField and Unit Tests

I recently spent a few days trying to track down a bug in Review Board with our unit tests. Basically, unit tests involving uploading a file to a models.ImageField would fail with a validation error specifying that the file wasn’t an image. This worked fine when actually using the application, just not during unit tests.

The following code would cause this problem. (Note that this isn’t actual code, just a demonstration.)

# myapp/models.py
from django.db import models

class MyModel(models.Model):
    image = models.ImageField()
# myapp/forms.py
from django import forms
from myapp.models import MyModel

class UploadImageForm(forms.Form):
    image_path = forms.ImageField(required=True)

    def create(self, file):
        mymodel = MyModel()
        mymodel.save_image_file(file.name, file, save=True)
        return mymodel
# myapp/views.py
from myapp.forms import UploadImageForm

def upload(request):
    form = UploadImageForm(request.POST, request.FILES)

    if not form.is_valid():
        # Return some error
        pass

    # Return success
# myapp/tests.py
from django.test import TestCase

class MyTests(TestCase):
    def testImageUpload(self):
        f = open("testdata/image.png", "r")
        resp = self.client.post("/upload/", {
            'image_path': f,
        })
        f.close()

After a good many hours going through the Django code, expecting there to be a problem with file handles becoming invalidated or the read position not being reset to the beginning of the file, I finally decided to look at PIL, the Python Imaging Library. It turns out that at this point, due to some weirdness with unit tests, PIL hadn’t populated its known list of file formats. All it knew was BMP, and so my perfectly fine PNG file didn’t work.

The solution? Be sure to import PIL and call init() before the unit tests begin. If you have a custom runner (as we do), then putting it there is appropriate. Just add the following:

from PIL import Image
Image.init()
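
If you don’t have a custom runner yet, here’s a minimal sketch of one using the old function-based test runner interface (the module path is up to you):

# myproject/testrunner.py
from django.test.simple import run_tests as django_run_tests
from PIL import Image


def run_tests(test_labels, verbosity=1, interactive=True, extra_tests=[]):
    # Make sure PIL registers all of its file format plugins before any
    # ImageField validation runs.
    Image.init()

    return django_run_tests(test_labels, verbosity, interactive,
                            extra_tests)

You’d then point TEST_RUNNER in settings.py at myproject.testrunner.run_tests.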

Hopefully this helps someone else with the problems I’ve dealt with the past few days.

