Chapter 9: Advanced Models · Mastering Django: Core

In Chapter 4, we presented an introduction to Django’s database layer – how to define models and how to use the database API to create, retrieve, update and delete records. In this chapter, we’ll introduce you to some more advanced features of this part of Django. [TOC] ## Related Objects Recall our book models from Chapter 4: ~~~ from django.db import models class Publisher(models.Model): name = models.CharField(max_length=30) address = models.CharField(max_length=50) city = models.CharField(max_length=60) state_province = models.CharField(max_length=30) country = models.CharField(max_length=50) website = models.URLField() def __str__(self): return self.name class Author(models.Model): first_name = models.CharField(max_length=30) last_name = models.CharField(max_length=40) email = models.EmailField() def __str__(self): return '%s %s' % (self.first_name, self.last_name) class Book(models.Model): title = models.CharField(max_length=100) authors = models.ManyToManyField(Author) publisher = models.ForeignKey(Publisher) publication_date = models.DateField() def __str__(self): return self.title ~~~ As we explained in Chapter 4, accessing the value for a particular field on a database object is as straightforward as using an attribute. For example, to determine the title of the book with ID 50, we’d do the following: ~~~ >>> from mysite.books.models import Book >>> b = Book.objects.get(id=50) >>> b.title 'The Django Book' ~~~ But one thing we didn’t mention previously is that related objects – fields expressed as either a `ForeignKey`or `ManyToManyField` – act slightly differently. ### Accessing Foreign Key Values When you access a field that’s a `ForeignKey`, you’ll get the related model object. For example: ~~~ >>> b = Book.objects.get(id=50) >>> b.publisher <Publisher: Apress Publishing> >>> b.publisher.website 'http://www.apress.com/' ~~~ With `ForeignKey` fields, it works the other way, too, but it’s slightly different due to the non-symmetrical nature of the relationship. To get a list of books for a given publisher, use `publisher.book_set.all()`, like this: ~~~ >>> p = Publisher.objects.get(name='Apress Publishing') >>> p.book_set.all() [<Book: The Django Book>, <Book: Dive Into Python>, ...] ~~~ Behind the scenes, `book_set` is just a `QuerySet` (as covered in Chapter 4), and it can be filtered and sliced like any other `QuerySet`. For example: ~~~ >>> p = Publisher.objects.get(name='Apress Publishing') >>> p.book_set.filter(title__icontains='django') [<Book: The Django Book>, <Book: Pro Django>] ~~~ The attribute name `book_set` is generated by appending the lower case model name to `_set`. ### Accessing Many-to-Many Values Many-to-many values work like foreign-key values, except we deal with `QuerySet` values instead of model instances. For example, here’s how to view the authors for a book: ~~~ >>> b = Book.objects.get(id=50) >>> b.authors.all() [<Author: Adrian Holovaty>, <Author: Jacob Kaplan-Moss>] >>> b.authors.filter(first_name='Adrian') [<Author: Adrian Holovaty>] >>> b.authors.filter(first_name='Adam') [] ~~~ It works in reverse, too. To view all of the books for an author, use `author.book_set`, like this: ~~~ >>> a = Author.objects.get(first_name='Adrian', last_name='Holovaty') >>> a.book_set.all() [<Book: The Django Book>, <Book: Adrian's Other Book>] ~~~ Here, as with `ForeignKey` fields, the attribute name `book_set` is generated by appending the lower case model name to `_set`. ## Managers In the statement `Book.objects.all()`, `objects` is a special attribute through which you query your database. In Chapter 4, we briefly identified this as the model’s *manager*. Now it’s time to dive a bit deeper into what managers are and how you can use them. In short, a model’s manager is an object through which Django models perform database queries. Each Django model has at least one manager, and you can create custom managers in order to customize database access. There are two reasons you might want to create a custom manager: to add extra manager methods, and/or to modify the initial `QuerySet` the manager returns. ### Adding Extra Manager Methods Adding extra manager methods is the preferred way to add “table-level” functionality to your models. (For “row-level” functionality – i.e., functions that act on a single instance of a model object – use model methods, which are explained later in this chapter.) For example, let’s give our `Book` model a manager method `title_count()` that takes a keyword and returns the number of books that have a title containing that keyword. (This example is slightly contrived, but it demonstrates how managers work.) ~~~ # models.py from django.db import models # ... Author and Publisher models here ... class BookManager(models.Manager): def title_count(self, keyword): return self.filter(title__icontains=keyword).count() class Book(models.Model): title = models.CharField(max_length=100) authors = models.ManyToManyField(Author) publisher = models.ForeignKey(Publisher) publication_date = models.DateField() num_pages = models.IntegerField(blank=True, null=True) objects = BookManager() def __str__(self): return self.title ~~~ With this manager in place, we can now do this: ~~~ >>> Book.objects.title_count('django') 4 >>> Book.objects.title_count('python') 18 ~~~ Here are some notes about the code: * We’ve created a `BookManager` class that extends `django.db.models.Manager`. This has a single method,`title_count()`, which does the calculation. Note that the method uses `self.filter()`, where `self` refers to the manager itself. * We’ve assigned `BookManager()` to the `objects` attribute on the model. This has the effect of replacing the “default” manager for the model, which is called `objects` and is automatically created if you don’t specify a custom manager. We call it `objects` rather than something else, so as to be consistent with automatically created managers. Why would we want to add a method such as `title_count()`? To encapsulate commonly executed queries so that we don’t have to duplicate code. ### Modifying Initial Manager QuerySets A manager’s base `QuerySet` returns all objects in the system. For example, `Book.objects.all()` returns all books in the book database. You can override a manager’s base `QuerySet` by overriding the `Manager.get_queryset()` method. `get_queryset()`should return a `QuerySet` with the properties you require. For example, the following model has *two* managers – one that returns all objects, and one that returns only the books by Roald Dahl. ~~~ from django.db import models # First, define the Manager subclass. class DahlBookManager(models.Manager): def get_queryset(self): return super(DahlBookManager, self).get_queryset().filter(author='Roald Dahl') # Then hook it into the Book model explicitly. class Book(models.Model): title = models.CharField(max_length=100) author = models.CharField(max_length=50) # ... objects = models.Manager() # The default manager. dahl_objects = DahlBookManager() # The Dahl-specific manager. ~~~ With this sample model, `Book.objects.all()` will return all books in the database, but`Book.dahl_objects.all()` will only return the ones written by Roald Dahl. Note that we explicitly set `objects`to a vanilla `Manager` instance, because if we hadn’t, the only available manager would be `dahl_objects`. Of course, because `get_queryset()` returns a `QuerySet` object, you can use `filter()`, `exclude()` and all the other`QuerySet` methods on it. So these statements are all legal: ~~~ Book.dahl_objects.all() Book.dahl_objects.filter(title='Matilda') Book.dahl_objects.count() ~~~ This example also pointed out another interesting technique: using multiple managers on the same model. You can attach as many `Manager()` instances to a model as you’d like. This is an easy way to define common “filters” for your models. For example: ~~~ class MaleManager(models.Manager): def get_queryset(self): return super(MaleManager, self).get_queryset().filter(sex='M') class FemaleManager(models.Manager): def get_queryset(self): return super(FemaleManager, self).get_queryset().filter(sex='F') class Person(models.Model): first_name = models.CharField(max_length=50) last_name = models.CharField(max_length=50) sex = models.CharField(max_length=1, choices=(('M', 'Male'), ('F', 'Female'))) people = models.Manager() men = MaleManager() women = FemaleManager() ~~~ This example allows you to request `Person.men.all()`, `Person.women.all()`, and `Person.people.all()`, yielding predictable results. If you use custom `Manager` objects, take note that the first `Manager` Django encounters (in the order in which they’re defined in the model) has a special status. Django interprets this first `Manager` defined in a class as the “default” `Manager`, and several parts of Django (though not the admin application) will use that `Manager`exclusively for that model. As a result, it’s often a good idea to be careful in your choice of default manager, in order to avoid a situation where overriding of `get_queryset()` results in an inability to retrieve objects you’d like to work with. ## Model methods Define custom methods on a model to add custom “row-level” functionality to your objects. Whereas managers are intended to do “table-wide” things, model methods should act on a particular model instance. This is a valuable technique for keeping business logic in one place – the model. An example is the easiest way to explain this. Here’s a model with a few custom methods: ~~~ from django.db import models class Person(models.Model): first_name = models.CharField(max_length=50) last_name = models.CharField(max_length=50) birth_date = models.DateField() def baby_boomer_status(self): "Returns the person's baby-boomer status." import datetime if self.birth_date < datetime.date(1945, 8, 1): return "Pre-boomer" elif self.birth_date < datetime.date(1965, 1, 1): return "Baby boomer" else: return "Post-boomer" def _get_full_name(self): "Returns the person's full name." return '%s %s' % (self.first_name, self.last_name) full_name = property(_get_full_name) ~~~ The model instance reference in Appendix A has a complete list of methods automatically given to each model. You can override most of these – see below – but there are a couple that you’ll almost always want to define: `__str__()` A Python “magic method” that returns a unicode “representation” of any object. This is what Python and Django will use whenever a model instance needs to be coerced and displayed as a plain string. Most notably, this happens when you display an object in an interactive console or in the admin. You’ll always want to define this method; the default isn’t very helpful at all. `get_absolute_url()` This tells Django how to calculate the URL for an object. Django uses this in its admin interface, and any time it needs to figure out a URL for an object. Any object that has a URL that uniquely identifies it should define this method. ### Overriding predefined model methods There’s another set of model methods that encapsulate a bunch of database behavior that you’ll want to customize. In particular you’ll often want to change the way `save()` and `delete()` work. You’re free to override these methods (and any other model method) to alter behavior. A classic use-case for overriding the built-in methods is if you want something to happen whenever you save an object. For example (see `save()` for documentation of the parameters it accepts): ~~~ from django.db import models class Blog(models.Model): name = models.CharField(max_length=100) tagline = models.TextField() def save(self, *args, **kwargs): do_something() super(Blog, self).save(*args, **kwargs) # Call the "real" save() method. do_something_else() ~~~ You can also prevent saving: ~~~ from django.db import models class Blog(models.Model): name = models.CharField(max_length=100) tagline = models.TextField() def save(self, *args, **kwargs): if self.name == "Yoko Ono's blog": return # Yoko shall never have her own blog! else: super(Blog, self).save(*args, **kwargs) # Call the "real" save() method. ~~~ It’s important to remember to call the superclass method – that’s that `super(Blog, self).save(*args, **kwargs)` business – to ensure that the object still gets saved into the database. If you forget to call the superclass method, the default behavior won’t happen and the database won’t get touched. It’s also important that you pass through the arguments that can be passed to the model method – that’s what the `*args, **kwargs` bit does. Django will, from time to time, extend the capabilities of built-in model methods, adding new arguments. If you use `*args, **kwargs` in your method definitions, you are guaranteed that your code will automatically support those arguments when they are added. Overridden model methods are not called on bulk operations Note that the `delete()` method for an object is not necessarily called when deleting objects in bulk using a QuerySet. To ensure customized delete logic gets executed, you can use `pre_delete` and/or `post_delete`signals. Unfortunately, there isn’t a workaround when `creating` or `updating` objects in bulk, since none of `save()`,`pre_save`, and `post_save` are called. ## Executing Raw SQL Queries When the model query APIs don’t go far enough, you can fall back to writing raw SQL. Django gives you two ways of performing raw SQL queries: you can use [`Manager.raw()`](http://masteringdjango.com/django-advanced-models/#Manager.raw "Manager.raw") to [perform raw queries and return model instances](http://masteringdjango.com/django-advanced-models/#performing-raw-queries), or you can avoid the model layer entirely and [execute custom SQL directly](http://masteringdjango.com/django-advanced-models/#executing-custom-sql-directly). Warning You should be very careful whenever you write raw SQL. Every time you use it, you should properly escape any parameters that the user can control by using `params` in order to protect against SQL injection attacks. ## Performing raw queries The `raw()` manager method can be used to perform raw SQL queries that return model instances: `Manager.``raw`(*raw_query*, *params=None*, *translations=None*) This method takes a raw SQL query, executes it, and returns a `django.db.models.query.RawQuerySet` instance. This `RawQuerySet` instance can be iterated over just like a normal `QuerySet` to provide object instances. This is best illustrated with an example. Suppose you have the following model: ~~~ class Person(models.Model): first_name = models.CharField(...) last_name = models.CharField(...) birth_date = models.DateField(...) ~~~ You could then execute custom SQL like so: ~~~ >>> for p in Person.objects.raw('SELECT * FROM myapp_person'): ... print(p) John Smith Jane Jones ~~~ Of course, this example isn’t very exciting – it’s exactly the same as running `Person.objects.all()`. However, `raw()` has a bunch of other options that make it very powerful. Model table names Where’d the name of the `Person` table come from in that example? By default, Django figures out a database table name by joining the model’s “app label” – the name you used in `manage.py startapp` – to the model’s class name, with an underscore between them. In the example we’ve assumed that the `Person` model lives in an app named `myapp`, so its table would be `myapp_person`. For more details check out the documentation for the `db_table` option, which also lets you manually set the database table name. Warning No checking is done on the SQL statement that is passed in to `.raw()`. Django expects that the statement will return a set of rows from the database, but does nothing to enforce that. If the query does not return rows, a (possibly cryptic) error will result. Warning If you are performing queries on MySQL, note that MySQL’s silent type coercion may cause unexpected results when mixing types. If you query on a string type column, but with an integer value, MySQL will coerce the types of all values in the table to an integer before performing the comparison. For example, if your table contains the values `'abc'`, `'def'` and you query for `WHERE mycolumn=0`, both rows will match. To prevent this, perform the correct typecasting before using the value in a query. Warning While a `RawQuerySet` instance can be iterated over like a normal `QuerySet`, `RawQuerySet` doesn’t implement all methods you can use with `QuerySet`. For example, `__bool__()` and `__len__()` are not defined in `RawQuerySet`, and thus all `RawQuerySet` instances are considered `True`. The reason these methods are not implemented in`RawQuerySet` is that implementing them without internal caching would be a performance drawback and adding such caching would be backward incompatible. ### Mapping query fields to model fields `raw()` automatically maps fields in the query to fields on the model. The order of fields in your query doesn’t matter. In other words, both of the following queries work identically: ~~~ >>> Person.objects.raw('SELECT id, first_name, last_name, birth_date FROM myapp_person') ... >>> Person.objects.raw('SELECT last_name, birth_date, first_name, id FROM myapp_person') ... ~~~ Matching is done by name. This means that you can use SQL’s `AS` clauses to map fields in the query to model fields. So if you had some other table that had `Person` data in it, you could easily map it into `Person`instances: ~~~ >>> Person.objects.raw('''SELECT first AS first_name, ... last AS last_name, ... bd AS birth_date, ... pk AS id, ... FROM some_other_table''') ~~~ As long as the names match, the model instances will be created correctly. Alternatively, you can map fields in the query to model fields using the `translations` argument to `raw()`. This is a dictionary mapping names of fields in the query to names of fields on the model. For example, the above query could also be written: ~~~ >>> name_map = {'first': 'first_name', 'last': 'last_name', 'bd': 'birth_date', 'pk': 'id'} >>> Person.objects.raw('SELECT * FROM some_other_table', translations=name_map) ~~~ ### Index lookups `raw()` supports indexing, so if you need only the first result you can write: ~~~ >>> first_person = Person.objects.raw('SELECT * FROM myapp_person')[0] ~~~ However, the indexing and slicing are not performed at the database level. If you have a large number of`Person` objects in your database, it is more efficient to limit the query at the SQL level: ~~~ >>> first_person = Person.objects.raw('SELECT * FROM myapp_person LIMIT 1')[0] ~~~ ### Deferring model fields Fields may also be left out: ~~~ >>> people = Person.objects.raw('SELECT id, first_name FROM myapp_person') ~~~ The `Person` objects returned by this query will be deferred model instances (see `defer()`). This means that the fields that are omitted from the query will be loaded on demand. For example: ~~~ >>> for p in Person.objects.raw('SELECT id, first_name FROM myapp_person'): ... print(p.first_name, # This will be retrieved by the original query ... p.last_name) # This will be retrieved on demand ... John Smith Jane Jones ~~~ From outward appearances, this looks like the query has retrieved both the first name and last name. However, this example actually issued 3 queries. Only the first names were retrieved by the raw() query – the last names were both retrieved on demand when they were printed. There is only one field that you can’t leave out – the primary key field. Django uses the primary key to identify model instances, so it must always be included in a raw query. An `InvalidQuery` exception will be raised if you forget to include the primary key. ### Adding annotations You can also execute queries containing fields that aren’t defined on the model. For example, we could use [PostgreSQL’s age() function](http://www.postgresql.org/docs/current/static/functions-datetime.html) to get a list of people with their ages calculated by the database: ~~~ >>> people = Person.objects.raw('SELECT *, age(birth_date) AS age FROM myapp_person') >>> for p in people: ... print("%s is %s." % (p.first_name, p.age)) John is 37. Jane is 42. ... ~~~ ### Passing parameters into `raw()` If you need to perform parameterized queries, you can use the `params` argument to `raw()`: ~~~ >>> lname = 'Doe' >>> Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname]) ~~~ `params` is a list or dictionary of parameters. You’ll use `%s` placeholders in the query string for a list, or `%(key)s` placeholders for a dictionary (where `key` is replaced by a dictionary key, of course), regardless of your database engine. Such placeholders will be replaced with parameters from the `params` argument. Note Dictionary params are not supported with the SQLite backend; with this backend, you must pass parameters as a list. Warning Do not use string formatting on raw queries! It’s tempting to write the above query as: ~~~ >>> query = 'SELECT * FROM myapp_person WHERE last_name = %s' % lname >>> Person.objects.raw(query) ~~~ Don’t. Using the `params` argument completely protects you from [SQL injection attacks](http://en.wikipedia.org/wiki/SQL_injection), a common exploit where attackers inject arbitrary SQL into your database. If you use string interpolation, sooner or later you’ll fall victim to SQL injection. As long as you remember to always use the `params` argument you’ll be protected. ## Executing custom SQL directly Sometimes even [`Manager.raw()`](http://masteringdjango.com/django-advanced-models/#Manager.raw "Manager.raw") isn’t quite enough: you might need to perform queries that don’t map cleanly to models, or directly execute `UPDATE`, `INSERT`, or `DELETE` queries. In these cases, you can always access the database directly, routing around the model layer entirely. The object `django.db.connection` represents the default database connection. To use the database connection, call `connection.cursor()` to get a cursor object. Then, call `cursor.execute(sql, [params])` to execute the SQL and `cursor.fetchone()` or `cursor.fetchall()` to return the resulting rows. For example: ~~~ from django.db import connection def my_custom_sql(self): cursor = connection.cursor() cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz]) cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz]) row = cursor.fetchone() return row ~~~ Note that if you want to include literal percent signs in the query, you have to double them in the case you are passing parameters: ~~~ cursor.execute("SELECT foo FROM bar WHERE baz = '30%'") cursor.execute("SELECT foo FROM bar WHERE baz = '30%%' AND id = %s", [self.id]) ~~~ If you are using more than one database, you can use `django.db.connections` to obtain the connection (and cursor) for a specific database. `django.db.connections` is a dictionary-like object that allows you to retrieve a specific connection using its alias: ~~~ from django.db import connections cursor = connections['my_db_alias'].cursor() # Your code here... ~~~ By default, the Python DB API will return results without their field names, which means you end up with a `list` of values, rather than a `dict`. At a small performance cost, you can return results as a `dict` by using something like this: ~~~ def dictfetchall(cursor): "Returns all rows from a cursor as a dict" desc = cursor.description return [ dict(zip([col[0] for col in desc], row)) for row in cursor.fetchall() ] ~~~ Here is an example of the difference between the two: ~~~ >>> cursor.execute("SELECT id, parent_id FROM test LIMIT 2"); >>> cursor.fetchall() ((54360982L, None), (54360880L, None)) >>> cursor.execute("SELECT id, parent_id FROM test LIMIT 2"); >>> dictfetchall(cursor) [{'parent_id': None, 'id': 54360982L}, {'parent_id': None, 'id': 54360880L}] ~~~ ### Connections and cursors `connection` and `cursor` mostly implement the standard Python DB-API described in [PEP 249](https://www.python.org/dev/peps/pep-0249), except when it comes to transaction handling. If you’re not familiar with the Python DB-API, note that the SQL statement in `cursor.execute()` uses placeholders, `"%s"`, rather than adding parameters directly within the SQL. If you use this technique, the underlying database library will automatically escape your parameters as necessary. Also note that Django expects the `"%s"` placeholder, *not* the `"?"` placeholder, which is used by the SQLite Python bindings. This is for the sake of consistency and sanity. Using a cursor as a context manager: ~~~ with connection.cursor() as c: c.execute(...) ~~~ is equivalent to: ~~~ c = connection.cursor() try: c.execute(...) finally: c.close() ~~~ ### Adding extra Manager methods Adding extra `Manager` methods is the preferred way to add “table-level” functionality to your models. (For “row-level” functionality – i.e., functions that act on a single instance of a model object – use Model methods, not custom `Manager` methods.) A custom `Manager` method can return anything you want. It doesn’t have to return a `QuerySet`. For example, this custom `Manager` offers a method `with_counts()`, which returns a list of all `OpinionPoll`objects, each with an extra `num_responses` attribute that is the result of an aggregate query: ~~~ from django.db import models class PollManager(models.Manager): def with_counts(self): from django.db import connection cursor = connection.cursor() cursor.execute(""" SELECT p.id, p.question, p.poll_date, COUNT(*) FROM polls_opinionpoll p, polls_response r WHERE p.id = r.poll_id GROUP BY p.id, p.question, p.poll_date ORDER BY p.poll_date DESC""") result_list = [] for row in cursor.fetchall(): p = self.model(id=row[0], question=row[1], poll_date=row[2]) p.num_responses = row[3] result_list.append(p) return result_list class OpinionPoll(models.Model): question = models.CharField(max_length=200) poll_date = models.DateField() objects = PollManager() class Response(models.Model): poll = models.ForeignKey(OpinionPoll) person_name = models.CharField(max_length=50) response = models.TextField() ~~~ With this example, you’d use `OpinionPoll.objects.with_counts()` to return that list of `OpinionPoll` objects with `num_responses` attributes. Another thing to note about this example is that `Manager` methods can access `self.model` to get the model class to which they’re attached. ## What’s Next? In the next chapter, we’ll show you Django’s “[generic views](http://masteringdjango.com/django-generic-views/)” framework, which lets you save time in building Web sites that follow common patterns.