Improving browser performance in Review Board

This past Sunday, I landed a set of changes into Review Board that provide improved performance, such as aggressive browser-side caching of media and pages. It’s just a start, but has already significantly reduced page load times in all of my tests, in some case by several seconds. We implemented these methods for Review Board, but they’re methods that can be applied to any Django project out there.

There are several key things that Review Board now does to improve performance:

  • Tells browsers to cache all media for one year.
  • Only sends page data if new data is available.
  • Compresses all media files to reduce transfer time.
  • Parallelizes media downloads.
  • Loads CSS files as early as possible.
  • Loads JavaScript files as late as possible.
  • Progressively loads expensive data.

A lot of the performance improvements come straight from Yahoo!’s Best Practices for Speeding Up Your Site. We’re not doing everything there yet, but we’re working toward it. That’s a great resource, by the way, and I recommend that everyone who has even made a website before go and read it.

So what do the above techniques buy us, and how are we doing them? Let me go into more details…

Caching all media for a year

The average site has one or more CSS files, JavaScript files, and several images. This translates to a lot of requests to the server, which may leave the site feeling slow. On top of this, a browser only makes a few requests to a server at a time, in order to avoid swamping the server, which will further hinder load times. This happens every time a user visits a page on your site.

Aggressive caching makes a huge difference and can greatly reduce load times for users. Review Board now tells the browser to cache media files for a year. Once a user downloads a JavaScript or CSS file, they won’t have to download it again, meaning that in general the only requests the browser needs to make is for the page requests and AJAX requests.

The big pitfall with long-term caching is that the cached resources can go stale. For example, if a new version of an image was uploaded, the browser wouldn’t even know about it, since it was told it should keep its old version for a year before checking again.

We solve this by introducing “media serials,” timestamps that are appended to all media paths. Instead of caching /js/myscript.js, the browser would cache /js/myscript.js?1273618736.

These media serials are computed on the first page request by our djblets.util.context_processors.ajaxSerial context processor. This quickly scans all media files known to the program, finding out the latest modification timestamp. It then provides a {{MEDIA_SERIAL}} variable for templates to append to media URLs as part of the query string.

The benefit to this method is that we can cache media files for a year and not worry about users having stale cached resources the next time we upgrade a copy of Review Board. The filenames requested will be different, browsers will see that the new file is not in the cache, and make a request, caching the new file for a year.

Only send page data if new data is available

Aggressive caching of media files is great and saves a lot of time, but it doesn’t help for dynamically generated content. For this, we need a new strategy.

When a browser makes a request, it can send a If-Modified-Since header to the server containing the Last-Modified value it received the last time it downloaded that page. This is a very valuable header, and there’s some things we can do with it to save both the server and the browser a lot of trouble.

If the browser sends If-Modified-Since, and we know that no new data has been generated since the timestamp provided, we can send an HttpResponseNotModified (HTTP response code 304). This will tell the browser it already has the newest version of the page. The sooner we do this, the better, as it means we don’t have to waste time building templates or doing any expensive database queries.

Djblets, once again, provides some functions to help out here: djblets.util.http.set_last_modified and djblets.util.http.get_modified_since.

The general usage pattern is that we first build a timestamp representing the latest version of the page. This could be the timestamp for a particular object that the page represents. We then check if we can bail early by calling:

if get_modified_since(request, timestamp):
    return HttpResponseNotModified()

Further down, after building the page, we must set the Last-Modified timestamp, using the same timestamp as above, like so:

set_last_modified(response, timestamp)

We’re using this in only a few places right now, such as the review request details page, but it drastically improves load times. If the review request hasn’t changed and nobody’s commented on it since the browser last loaded the page, a reload of the page will be almost instant.

Compress all media files

Our Apache and lighttpd config files now enable compression by default. By compressing these files, we can turn a relatively large JavaScript file (such as the jquery and jquery-ui files) into a very small file before sending it over to the browser. This reduces transfer times at the expense of compression/decompression time (which is small enough to not worry for deployments of this size, and can be offset by caching of compressed files server-side).

Parallelize media downloads

It’s important to not mix loads of media files of different types. The browser parallelizes media downloads of the same type, in page load order, but if you load one CSS file, one JavaScript file, another CSS file, and then another JavaScript file, the browser will only attempt one load at a time. If you load all the CSS files before all JavaScript files, it will parallelize the CSS file download and then the JavaScript downloads. By enforcing the separation of loads, we can achieve faster page download/render times.

Load CSS files as soon as possible

Loading CSS files before the browser starts to display the page can make the page appear to load smoother. The browser will already know how things should look and will lay the page out accordingly, instead of laying the page out once and then updating that once the CSS files have loaded.

Load JavaScript files as late as possible

JavaScript loads block the browser, as the browser must parse and interpet the JavaScript before it can continue. Sometimes it’s necessary to load a JavaScript file early, but in many cases the files can be loaded late. When possible, we load JavaScript files at the very end of the document body so that they won’t even begin downloading until the page has rendered. This provides noticeable performance for script-heavy pages.

Progressively load expensive data

There are types of data that are just too expensive to load along with the rest of the page. For a long time, Review Board would parse and render fragments of a diff for display in the review request page, but that meant that before the page could load, Review Board would need to do the following:

  1. Query the list of all comments.
  2. Fetch every file commented on.
  3. Apply the stored patch to each file.
  4. Diff between the original and patched files.
  5. Render the portion of the diff commented on into the page.

This became very time-consuming, and if a server was down, the page wasn’t available until everything timed out. The solution to this was to lazily load each of these diff fragments in order.

We now display a placeholder table for each diff fragment in roughly the same size of the rendered fragment (to avoid excessive page scrolling on loads). The table contains a spinner showing that something is happening, and, one-by-one (to avoid dogpiling) we load each diff fragment.

The code to render the diff fragment, by the way, takes advantage of the If-Modified-Since header and is also cached for a year. We use an AJAX_SERIAL (same principal as the MEDIA_SERIAL above) to allow for changes in new deployments.

With these caching mechanisms in place, the review request page now loads in roughly a second in many cases (or less once cached), with diff fragments coming in lazily (and then almost immediately on future loads).

More to come…

This was a great first step, but there’s more we can do. Before we hit our 1.0 release, we’re going to batch together all our CSS files and JavaScript files into a couple of combined files and then “minify” them (basically compressing them in such a way to allow for smaller files and faster load times of the interpreted data).

Again, these are techniques we’re now making use of in Review Board, but they’re not in any way specific to Review Board. Anyone out there developing websites or web applications should seriously look into ways to improve performance. I hope this was a good starting point, but seriously, do read Yahoo!’s article as well.

Djblets and Review Board moving to jQuery

When we first began development of Review Board and its Django utility package Djblets almost two years ago, we needed to pick a JavaScript library to use. There were several good ones available at the time, and after evaluating several options we chose the Yahoo! UI Library (YUI) and YUI-Ext, an extension library to YUI. Both were excellent and have served us well over the past two years.

However, YUI-Ext itself is really no more, which means we needed to choose something new. We could have continued to bundle it and YUI with Review Board, but users of Djblets would have to hunt down a copy and load in all of YUI and YUI-Ext just to use such features as datagrids. This was, to say the least, inconvenient.

YUI-Ext and the Licensing Situation

Now I said YUI-Ext is no more, but really it’s still around, just in a new form. It had been renamed to ExtJS and became its own independent library. While compatible with YUI, most of the functionality we needed was really provided by ExtJS, meaning we didn’t really need both. Over time, YUI evolved as well, so we began looking into the differences and figuring out which to go with.

ExtJS, it turns out, was a non-starter for us. Unlike YUI-Ext before it, the ExtJS licensing terms were a bit restrictive and, frankly, confusing. Open source projects could use it under the GPL3, but commercial products required a special license per developer, which is $289 per developer. The GPL3 itself is unclear with regards to our situation. We’re an open source project, but we’re MIT-licensed. What if a company wants to modify something for their own use and not release it to the public? What if down the road someone wants to offer commercial extensions to Review Board? We’d have to choose the commercial license to be safe, but then contributors would also need to pay $289 for a license, wouldn’t they?

Looking Again at YUI

We didn’t want to step down a dangerous road with ExtJS, so we started to look again at YUI. It’s definitely come a long way and I must recommend it for developers looking for a strong toolkit with good UI support. They provide good documentation, many examples, and base functionality for nearly everything.

However, there’s a lot we’d have to rewrite to move to it, and if we had to rewrite the code anyway, I wanted to look around a bit at the current generation of JavaScript toolkits. Especially since our needs have changed over the years and we’re looking for something with a lighter footprint. We’re also moving away from the dialog-centered UI we have on some pages, which is where YUI shines.

jQuery

We ended up deciding to go with jQuery, partially because of the footprint and partially the clean way in which scripts can be written. As an experiment, I converted our Djblets datagrid code to jQuery. The result is a smaller, much more readable, reusable, cross-browser script. I was impressed by what jQuery let me do, and by the size.

I did some comparisons on the number of files downloaded and the size of the files, using the Review Board dashboard as a test.

With YUI and YUI-Ext, we were loading 7 JavaScript files, excluding our own scripts, at a total size of 376KB (when minified).

With jQuery, we loaded only 2 JavaScript files, at a total size of 128KB.

Our datagrid.js file also went down from 13KB to 9KB largely due to some of the niceties of jQuery’s API.

This is a pretty significant savings, a whole 252KB, which could be a couple of seconds on an average DSL line. And it’s not just the file size but the number of requests, which can make a big difference on a heavily accessed web server.

What This Means for Djblets

Developers using Djblets can now use datagrids without needing to do anything special. Djblets bundles jQuery, which it will use by default unless you choose to override which jQuery script is being used. This means all you need is an install of Djblets set up and you’re ready to use it!

Going Forward

We’re working on migrating the rest of Review Board to jQuery. This will take place over the next few weeks, and is one of the last major things we hope to do for our 1.0 release.

Django Development with Djblets: Dynamic Site Configuration

Django’s a powerful toolkit and offers many things to ease the creation of websites and web applications. Features such as the automatically built Administration UI are just awesome and save developers from having to reinvent the wheel on every project. It does fall short in a few areas, however, and one of them is site configuration.

Now, Django’s settings support isn’t bad. You get a nice little settings.py file to put settings in for your site/webapp, and since it’s just a Python module, your settings can really hold whatever data you want and can be well documented with comments. That’s all great, but when you want to actually change a setting, you must SSH in, edit the file, save it, and restart the server.

This is fine for small websites with a few visitors, or sites with almost no custom settings, but for large sites it can be a problem. Bringing down the site however briefly could interrupt users, generate errors, causing worries during a checkout process or when editing a post on a site. Doing anything more complex and preventing downtime really means rolling your own thing.

So we rolled our own thing. Review Board has operated with the standard Django settings module since day 1, but it’s become obvious over time that it wasn’t good enough for us. So we wrote the dynamic Site Configuration app for Djblets.

djblets.siteconfig

siteconfig is a relatively small Django app that stores setting data in the database, along with project versioning data in case you need it to handle migrations of some sort down the road. There’s a lot of useful things in this app, so I’ll summarize what it provides before I jump into details:

  • Saving/Retrieving settings.

    Setting values can be simple strings, integers, booleans, or something more complex like an array of dictionaries of key/value pairs.

  • Default values for settings.

    These default values could be provided by your application directory or could be based off the contents in your settings.py file.

  • Syncing to/from settings.py.

    Just because you’re using djblets.siteconfig doesn’t mean you can’t still support settings.py. Settings can be pulled from your old settings and saved back there when modified.

  • Compatibility with standard Django settings.

    Django offers a lot of useful settings that you may want to customize. We do the hard work of mapping these and keeping them in sync so you can customize them dynamically.

  • Auto-generated settings pages.

    Much like the existing admin UI, we offer an easy way to provide settings pages for your application. Go ahead and stick these in the admin UI if you like.

Getting Started

To start out, you’ll want to add some code in your app that creates a SiteConfiguration model, perhaps as a post_syncdb management hook. This is project-specific, since you’ll be storing version information in here and linking it with your existing Site. Here’s one way to do it. We’ll use this file as a template in later examples:

# myapp/siteconfig.py
# NOTE: Import this file in your urls.py or some place before
#       any code relying on settings is imported.
from django.contrib.sites.models import Site

from djblets.siteconfig.models import SiteConfiguration


def load_site_config():
    """Sets up the SiteConfiguration, provides defaults and syncs settings."""
    try:
        siteconfig = SiteConfiguration.objects.get_current()
    except SiteConfiguration.DoesNotExist:
        # Either warn or just create the thing. Depends on your app
        siteconfig = SiteConfiguration(site=Site.objects.get_current(),
                                       version="1.0")

    # Code will go here for settings work in later examples.


load_site_config()

Saving/Retrieving Settings

Settings exist in a SiteConfiguration model inside a JSONField. Anything that Django’s native JSON serializer/deserializer code can handle, you can store. Usually you’ll want to store primitive values (strings, booleans, integers, etc.), but you’re free to do something more complex.

from djblets.siteconfig.models import SiteConfiguration


siteconfig = SiteConfiguration.objects.get_current()
siteconfig.set("mystring", "foobar")
siteconfig.set("mybool", True)
siteconfig.set("myarray", [1, 2, 3, 4])

mybool = siteconfig.get("mybool")

It’s pretty straightforward. The set/get functions are just simple accessors for the SiteConfiguration.settings JSON field, but you’ll want to use them because they’ll handle the registered defaults for settings, which I will get to in a moment.

Since this is just a Djblets JSONField, which is a glorified dictionary, you can do a lot of things, such as iterate through the keys and values:

for key, value in siteconfig.settings:
    print "%s: %s" % (key, value)

And so on. Let’s go a step further.

Default Values

Say you’re starting fresh and just created your SiteConfiguration instance, or you’ve introduced a new setting into an existing site. The setting won’t be in the saved settings, so what value is returned when you do a get call? Well, that depends on your setup a bit.

In the first case, with a fresh new setting:

>>> print siteconfig.get("mynewsetting")
None

Any setting without a default value and not in the saved settings will return None. Good ol’ reliable None.. If you want to make sure you return something sane at this specific callpoint, you can specify a default value in the call to get, like so:

>>> print siteconfig.get("mynewsetting", default=123)
123

This can be really useful at times, but it’s a pain to have to do this in every call and keep the values in sync. So we provide a third option: Registered default values.

# myapp/siteconfig.py


defaults = {
    'mystring':     'Foobar',
    'mybool':       False,
    'mynewsetting': 123,
}


def load_site_config():
    ...

    if not siteconfig.get_defaults():
        siteconfig.add_defaults(defaults)

That’s all it takes. Now calling get with just the setting name will return the default value, if the setting hasn’t been saved in the database.

Down the road we’re going to have support for automatic querying of apps to grab their settings, giving third party apps an easier way to integrate into codebases using siteconfig.

Syncing Settings with settings.py

Django’s settings.py is and will continue to be an important place for some settings to go. If you’re developing a reusable application for other projects, you may not want to fully ditch settings.py. Or maybe you’re using a third party application that uses settings.py and want to make the settings dynamic.

siteconfig was designed from the beginning to handle syncing settings in both places. Now, when I say syncing, I don’t mean we write out to settings.py, since that would require enforcing certain permissions on the file. What we do instead is to load in the values from settings.py if not set in the siteconfig settings, and to write out to the settings object, an in-memory version of settings.py.

Let’s revisit our example above, but in this case we want to be able to dynamically control Django’s EMAIL_HOST setting.

# myapp/siteconfig.py
from django.conf import settings

from djblets.siteconfig.django_settings import apply_django_settings, 
                                               generate_defaults


settings_map = {
    # siteconfig key    settings.py key
    'mail_host':        'EMAIL_HOST',
}

defaults = {
    ...
}
defaults.update(generate_defaults(settings_map))


def load_site_config():
    ...

    apply_django_settings(siteconfig, settings_map)

What we did here was to generate a mapping table of our custom settings to Django settings.py settings. We can then generate a set of defaults using the generate_defaults function. This goes through the Django settings keys, pulls out the values if set in settings.py, and returns a dictionary similar to the one we wrote above. This guarantees that our default values will be what you have configured in settings.py, handling the first part of our synchronization.

The second part is handled through use of the apply_django_settings function. This will take a siteconfig and a settings map and write out all values that have actually changed. So if “mail_host” above is never modified, apply_django_settings won’t write it out to the database, allowing the default value to change without having to do anything too fancy.

Compatibility with Django’s Settings

The above example dealt with the settings.EMAIL_HOST setting, but in reality we won’t have to cover this at all. There’s a few other goodies in the django_settings module that handle all the Django settings for you.

First off is get_django_settings_map. You’ll rarely need to call this directly, but it returns a settings map for all the Django settings sites are likely to want to change.

Then there’s get_django_defaults, which you’ll want to either merge into your existing defaults table or add directly. We don’t do it for you because you may very well not care about these settings.

The third is one you just saw, apply_django_settings. In the above example, we passed in our settings map, but if you don’t specify one it will use get_django_settings_map.

Let’s take a look at how this works.

# myapp/siteconfig.py

from djblets.siteconfig.django_settings import apply_django_settings, 
                                               get_django_defaults


def load_site_config():
    ...

    if not siteconfig.get_defaults():
        siteconfig.add_defaults(defaults)
        siteconfig.add_defaults(get_django_defaults())

    apply_django_settings(siteconfig, settings_map)
    apply_django_settings(siteconfig)

This is all it takes.

If you take a look at django_settings.py, you’ll notice that we don’t reuse the existing Django settings names and instead provide our own. This is an attempt to cleanly namespace the settings used. You’ll also notice that there’s several settings maps and defaults functions. This is so you can be more picky if you really want to.

Auto-generated Settings Pages

Oh, this is where it gets fun.

Dynamic settings are all well and good, but if you can’t set them through your browser, why bother?

We wanted to make sure that not only could these be modified through a browser, but that it was dead simple to put up pages for changing settings. All you have to do is add a URL mapping and one form per page.

The form doesn’t even need much more than fields.

Let’s start out with some code. It should be pretty self-explanatory.

# urls.py
from myapp.forms import GeneralSettingsForm

urlpatterns = patterns('',
    (r'^admin/settings/general/$', 'djblets.siteconfig.views.site_settings',
     {'form_class': GeneralSettingsForm}),
    (r'^admin/(.*)', admin.site.root),
)
# myapp/forms.py
from django import forms
from djblets.siteconfig.forms import SiteSettingsForm


class GeneralSettingsForm(SiteSettingsForm):
    mystring = forms.CharField(
        label="My String",
        required=True)

    mybool = forms.BooleanField(
        label="My Boolean",
        required=False)

    mail_host = forms.CharField(
        label="Mail Server",
        required=False)


    class Meta:
        title = "General Settings"

That’s it. Now just go to /admin/settings/general/ and you’ll see your brand new settings form. It will auto-load existing or default settings and save modified settings. Like any standard form, it will handle help text, field validation and error generation.

Example Settings Page

Proxy Fields

For a lot of sites, this will be sufficient. For more complicated settings, we have a few more things we can do.

Let’s take the array above. Django’s forms doesn’t have any concept of array values, but we can fake it with custom load and save methods and a proxy field.

# myapp/forms.py
import re

class GeneralSettingsForm(SiteSettingsForm):
    ...

    my_array = forms.CharField(
        label="My Array",
        required=False,
        help_text="A comma-separated list of anything!")

    def load(self):
        self.fields['my_array'].initial = 
            ', '.join(self.siteconfig.get('myarray'))

        super(GeneralSettingsForm, self).load()

    def save(self):
        self.siteconfig.set('myarray',
            re.split(r',s*', self.cleaned_data['my_array']))

        super(GeneralSettingsForm, self).save()

    class Meta:
        save_blacklist = ('my_array',)

What we’re essentially doing here is to come up with a new field not in the settings database (notice “my_array” versus “myarray”) and to use this as a proxy for the real setting. We then handle the serialization/deserialization in the save and load methods.

When doing this, it’s very important to add the proxy field to the save_blacklist tuple in the Meta class so that the form doesn’t save your proxy field in the database!

Fieldsets

Much like Django’s own administration UI, the SiteSettingsForm allows for placing fields into custom fieldsets, allowing settings to be grouped together under a title and optional description. This is done by filling out the fieldsets variable in the Meta class, like so:

class GeneralSettingsForm(SiteSettingsForm):
    class Meta:
        fieldsets = (
            {
                'title':   "General",
                'classes': ('wide',),
                'fields':  ('mystring', 'mybool', 'my_array',),
            },
            {
                'title':       "E-Mail",
                'description': "Some description of e-mail settings.",
                'classes':     ('wide',),
                'fields':      ('mail_host',),
            },
        )

This will result in two groups of settings, each with a title, and one with a description. The description can span multiple paragraphs by inserting newlines (“n“). All fields in the dictionary except for 'fields' are optional.

Re-applying Django settings

When you modify a field that corresponds to a setting in settings.py, you want to make sure you re-apply the setting or you could break something. This is pretty simple. Again override save as follows:

from myapp.siteconfig import load_site_config


class GeneralSettingsForm(SiteSettingsForm):
    def save(self):
        super(GeneralSettingsForm, self).save()
        load_site_config()

This will re-synchronize the settings, making sure everything is correct. Note that down the road, this will likely be automatic.

There’s one last thing to show you…

Disabled Fields

Sometimes it’s handy to disable fields. Maybe a particular setting requires a third party module that isn’t installed. The SiteSettingsForm has two special dictionaries, disabled_fields and disabled_reasons, that handle this. Just override your load function again.

class GeneralSettingsForm(SiteSettingsForm):
    def load(self):
        if not myarray_supported():
            self.disabled_fields['my_array'] = True
            self.disabled_reasons['my_array'] = 
                "This cannot be modified because we don't support arrays today."

        super(GeneralSettingsForm, self).load()

The resulting fields in the HTML will be disabled, and the error message will be displayed below the field.

Depending on whether or not you have a custom administration UI, you may want to tweak the CSS file or completely override it. Note that this all assumes that the djblets/media/ directory exists in your MEDIA_URL as the “djblets” directory. If you have something custom, you can always override siteconfig/settings.html and pass the new template to the view along with your settings form.

More Coming Soon!

There’s some new stuff in the works for the siteconfig app. Auto-detection of defaults and settings maps is the big one. I also have a few more useful tricks for doing advanced things with settings pages that I’ll demonstrate.

If you’re a user of Djblets, let us know, and feel free to blog about it! I’ll link periodically to other blogs talking about Djblets and projects using Djblets.

And as always, suggestions, constructive criticism and patches are always welcome 🙂

Django Development with Djblets: Data Grids

It’s been a while since my last post in this series, but this one’s a good one.

A common task in many web applications is to display a grid of data, such as rows from a database. Think the Inbox from GMail, the model lists from the Django administration interface, or the Dashboard from Review Board. It’s not too hard to write something that displays a grid by doing something like:


{% for user in users %}
 
{% endfor %}
Username First Name Last Name
{{user.username}} {{user.first_name}} {{user.last_name}}

This works fine, so long as you don’t want anything fancy, like sortable columns, reorderable columns, or the ability to let users specify which columns they want to see. This requires something a bit more complex.

djblets.datagrids

We wrote a nifty little set of classes for making data grids easy. Let’s take the above example and convert it over.

# myapp/datagrids.py
from django.contrib.auth.models import User
from djblets.datagrid.grids import Column, DataGrid

class UserDataGrid(DataGrid):
    username = Column("Username", sortable=True)
    first_name = Column("First Name", sortable=True)
    last_name = Column("Last Name", sortable=True)

    def __init__(self, request):
        DataGrid.__init__(self, request, User.objects.filter(is_active=True), "Users")
        self.default_sort = ['username']
        self.default_columns = ['username', 'first_name', 'last_name']
# myapp/views.py
from myapp.datagrids import UserDataGrid

def user_list(request, template_name='myapp/datagrid.html'):
    return UserDataGrid(request).render_to_response(template_name)

Now while this may look a bit more verbose, it offers many benefits in return. First off, users will be able to reorder the columns how they like and choose which columns to see. You could extend the above DataGrid to add some new column but leave it out of default_columns so that only users who really care about it would see it.

Custom columns

Let’s take this a step further. Say we want our users datagrid to optionally show staff members with a special badge. This might be useful to some users, but not all, so we’ll leave it out by default. We can implement this by creating a custom column with image data:

class StaffBadgeColumn(Column):
    def __init__(self, *args, **kwargs):
        Column.__init__(self, *args, **kwargs)

        # These define what will appear in the column list menu.
        self.image_url = settings.MEDIA_URL + "images/staff_badge.png"
        self.image_width = 16
        self.image_height = 16
        self.image_alt = "Staff"

        # Give the entry in the columns list menu a name.
        self.detailed_label = "Staff"

        # Take up as little space as possible.
        self.shrink = True

    def render_data(self, user):
        if user.is_staff:
            return '%s' % 
                (self.image_url, self.image_width, self.image_height, self.image_alt)

        return ""

class UserDataGrid(DataGrid):
    ...
    staff_badge = StaffBadgeColumn()
    ...

This will add a new entry to the datagrid’s column customization menu showing the staff badge with a label saying “Staff.” If users enable this column, their datagrid will update to show an staff icon for any users who are marked as staff.

Custom columns can render data in a couple of different ways.

The first is to bind the column to a field in the model. By default, a Column instance added to a DataGrid will use its own name (such as “first_name” above) as a lookup in the object. If you want to use a custom field, set the db_field attribute on the column. This field can span relationships as well. For example:

class UserDataGrid(DataGrid):
    ...
    # Uses the field name as the lookup.
    username = Column("Username")

    # Spans relationships, looking up the age in the profile.
    age = Column("profile__age")

The second is the way shown above, by overriding the render_data function. This takes an object, which is the object the datagrid represents, and outputs HTML. The logic in this can be quite complicated, depending on what’s needed.

Linking to objects

A datagrid is pretty worthless if you can’t link entries for the object to a URL. We have a couple of ways to do this.

To link a column on a datagrid to a URL, set the link parameter on the DataGrid to True. By default, the URL used will be the result of calling get_absolute_url() on the object represented by the datagrid, but you can override this:

# myapp/datagrids.py

class UserDataGrid(DataGrid):
    username = Column("Username", sortable=True, link=True)
    ...
    @staticmethod
    def link_to_object(user, value):
        return "/users/%s/" % user

Then

link_to_object takes an object and a value. The object is the object being represented by the datagrid, and the value is the rendered data in the cell. The result is a valid path or URL. In the above example, we’re explicitly defining a path, but we could use Django’s reverse function.

Usually the value is ignored, but it can be useful. Imagine that your Column represents a field that is a ForeignKey of an object. The contents of the cell is the string version of that object (implemented by the __str__ or __unicode__ function), but it’s just an object and you want to link to it. This is where value comes in handy, as instead of being HTML output, it’s actually an object you can link to.

If you intend to link a particular column to value.get_absolute_url(), we provide a handy utility function called link_to_value which exists as a static method on the DataGrid. You can pass it (or any function) to the Column’s constructor using the link_func parameter.

class UserDataGrid(DataGrid):
    ...
    myobj = Column(link=True, link_func=link_to_value)

Custom columns can hard-code their own link function by explicitly setting these values in the constructor.

Saving column settings

If a user customizes his column settings by adding/removing columns or rearranging them, he probably expects to see his customizations the next time he visits the page. DataGrids can automatically save these settings in the user’s profile, if the application supports it. Handling this is as simple as adding some fields to the app’s Profile model and setting a couple options in the DataGrid. For example:

# myapp/models.py

class Profile(models.Model):
    user = models.ForeignKey(User, unique=True)

    # Sort settings. One per datagrid type.
    sort_user_columns = models.CharField(max_length=256, blank=True)

    # Column settings. One per datagrid type.
    user_columns = models.CharField(max_length=256, blank=True)
# myapp/datagrids.py

class UserDataGrid(DataGrid):
    ...
    def __init__(self, request):
        self.profile_sort_field = "sort_user_columns"
        self.profile_columns_field = "user_columns"

The DataGrid code will handle all the loading and saving of customizations in the Profile. Easy, isn’t it?

Requirements and compatibility

Right now, datagrids require both the Yahoo! UI Library and either yui-ext (which is hard to find now) or its replacement, ExtJS.

Review Board, which Djblets was mainly developed for, still relies on yui-ext instead of ExtJS, so datagrids today by default are built with that in mind. However, the JavaScript portion of datagrids should work with ExtJS today. The default template for rendering the datagrids (located in djblets/datagrids/templates/datagrids/) assumes both YUI and yui-ext are present in the media path, but sites using datagrids are encouraged to provide their own template to reflect the actual locations and libraries used.

We would like to work with more JavaScript libraries, so if anyone wants to provide an implementation for their favorite library or help to dumb down our implementation to work with any and all, we’d welcome patches.

See it in action

We make extensive use of datagrids on Review Board. You can get a sense of it and play around by looking at the users list and the dashboard (which requires an account).

That’s it for now!

There’s more to datagrids than what I’ve covered here, but this should give you a good understanding of how they work. Later on I’d like to do a more in-depth article on how datagrids work and some more advanced use cases for them.

For now, take a look at the source code and Review Board’s datagrids for some real-world examples.

Django Tips: PIL, ImageField and Unit Tests

I recently spent a few days trying to track down a bug in Review Board with our unit tests. Basically, unit tests involving uploading a file to a models.ImageField would fail with a validation error specifying that the file wasn’t an image. This worked fine when actually using the application, just not during unit tests.

The following code would cause this problem. (Note that this isn’t actual code, just a demonstration.)

# myapp/models.py
from django.db import models

class MyModel(models.Model):
    image = models.ImageField()
# myapp/forms.py
from django import forms
from myapp.models import MyModel

class UploadImageForm(forms.Form):
    image_path = forms.ImageField(required=True)

    def create(self, file):
        mymodel = MyModel()
        mymodel.save_image_file(file.name, file, save=True)
        return mymodel
# myapp/views.py
from myapp.forms import UploadImageForm

def upload(request):
    form = UploadImageForm(request.POST, request.FILES)

    if not form.is_valid():
        # Return some error
        pass

    # Return success
# myapp/tests.py
from django.test import TestCase

class MyTests(TestCase):
    def testImageUpload(self):
        f = open("testdata/image.png", "r")
        resp = self.client.post("/upload/", {
            'image_path': f,
        })
        f.close()

After a good many hours going through the Django code, expecting there to be a problem with file handles becoming invalidated or the read position not being reset to the beginning of the file, I finally decided to look at PIL, the Python Imaging Library. It turns out that at this point, due to some weirdness with unit tests, PIL hadn’t populated its known list of file formats. All it knew was BMP, and so my perfectly fine PNG file didn’t work.

The solution? Be sure to import PIL and call init() before the unit tests begin. If you have a custom runner (as we do), then putting it in here is appropriate. Just add the following:

from PIL import Image
Image.init()

Hopefully this helps someone else with the problems I’ve dealt with the past few days.

Django Development with Djblets: Unrooting your URLs

Typically, Django sites are designed with the assumption that they’ll have a domain or subdomain to themselves. Often times this is fine, but if you’re developing a web application designed for redistribution, sometimes you can’t make that assumption.

During development of Review Board, many of our users wanted the ability to install Review Board into a subdirectory of one of their domains, rather than a subdomain.

There’s a few rules that are important when making your site relocatable:

  • Always use MEDIA_URL when referring to your media directory, so that people can customize where they put their media files.
  • Don’t hard-code URLs in templates. Use the {% url %} tag or get_absolute_url() when possible.

These solve some of the issues, but doesn’t address relocating a Django site to a subdirectory.

Djblets fills in this gap by providing a special URL handler designed to act as a prefix for all your projects’ URLs. To make use of this, you need to modify your settings.py file as follows:

settings.py

TEMPLATE_CONTEXT_PROCESSORS = (
    ...
    'djblets.util.context_processors.siteRoot',
)

SITE_ROOT_URLCONF = 'yourproject.urls'
ROOT_URLCONF = 'djblets.util.rooturl'

SITE_ROOT = '/'

SITE_ROOT specifies where the site is located. By default, this should be “/”, but this can be changed to point to any path. For example, “/myproject/”. Note that it should always have a leading and a trailing slash.

The custom template context processor (djblets.util.context_processors.siteRoot) will make SITE_ROOT accessible to templates.

SITE_ROOT should be used in templates when you need to refer to URLs that aren’t designed to respect SITE_ROOT (such as User.get_absolute_url). Your own custom applications should always respect SITE_ROOT whenever providing a URL.

ROOT_URLCONF is typically what you would set to point to your project’s URL. However, in this case, you’ll be pointing it to djblets.util.rooturl. This in turn will forward all URLs to your project’s handler, defined in SITE_ROOT_URLCONF.

This is all you need to have a fully relocatable Django site!

To sum up:

  1. Add djblets.util.context_processors.siteRoot to your TEMPLATE_CONTEXT_PROCESSORS.
  2. Set SITE_ROOT_URLCONF to your project’s URL handler.
  3. Set ROOT_URLCONF to ‘djblets.util.rooturl’
  4. Prefix any URLs with SITE_ROOT in your templates, unless the URL would already take SITE_ROOT into account.

This is functionality that will hopefully make its way into Django at some point. For now, you have an easy way of unrooting your Django project.

Django Development with Djblets: Custom Tag Helpers

I’m planning to cover all of what Django can do, but for now, let’s start simple with something most Django developers spend way too much time creating: Custom tags.

Django’s nice enough to provide a @register.simple_tag decorator for creating very basic tags that don’t take a context but do take parameters. This is great, but what if you want more? Many Django applications use the same boilerplate time and time again to create their tags, but we make it much easier.

Introducing @basictag and @blocktag.

@basictag

Ever wanted to use Django’s @register.simple_tag but needed access to the context? I’ve found far too many cases where this would be useful, but Django doesn’t make this easy. Your tag code would end up looking like this:

class MyTagNode(template.Node):
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self.arg2 = arg2

    def render(self, context):
        arg1 = Variable(self.arg1).resolve(context)
        arg2 = Variable(self.arg2).resolve(context)

        return context['user']

@register.tag
def mytag(parser, token):
    bits = token.split_contents()
    return MyTagNode(bits[1], bits[2])

Do this a few times and it’s bound to drive you nuts. How about this instead?

from djblets.util.decorators import basictag

@register.tag
@basictag(takes_context=True)
def mytag(context, arg1, arg2):
    return context['user']

Far less code and increased readability. Hooray!

@blocktag

@blocktag aims to do the same thing @basictag does but for block tags. A block tag is a tag that contains nested content, like @spaceless or @for. This usually requires even more boilerplate than the above code fragment, except with the added complexity of having to grab the contents of the block.

We’ve condensed it down to this:

from djblets.util.decorators import blocktag

@register.tag
@blocktag
def blinkblock(context, nodelist, arg1, arg2):
    return "<blink>%s</blink>" % nodelist.render(context)

If you’ve built block tags in the past, you’ll appreciate how simple that was.