How to use Django select_related and prefetch_related?

Django ORM (object-relational mapping) is a tool that helps us make queries on relational databases using Python code that is later translated into SQL. ORM provides methods that are used to optimize queries on the database.  Django select_related and prefetch_related are methods used to join tables using Python or SQL. Their optimization purpose is to reduce the number of queries.

  • select_related – creates INNER JOIN between tables.
  • prefetch_related – connects tables using Python (but queries both tables).

In all defined sections, we’ll use models:

from django.db import models

class Author(models.Model):
   first_name = models.CharField(max_length=512)
   last_name = models.CharField(max_length=512)

class Book(models.Model):
   title = models.CharField(max_length=512)
   author = models.ForeignKey(Author, on_delete=models.CASCADE)

And our database is pre-populated with data.

Author table

idfirst_namelast_name
1WilliamShakespeare
2Miguelde Cervantes

Book table

idtitleauthor_id
1Hamlet1
2Romeo and Juliet1
3Macbeth1
4Don Quixote2

Django select_related

The select_related in Django is a method used to INNER JOIN connected tables. This implies retrieving all columns from the specified model related to the model on which we are executing this method.

It might sound a bit abstract, but it will become clearer with examples.

Let’s take a look at an example without using select_related first. In examples, we’ll show how Django ORM code translates into SQL and how many queries are needed to get the result.

Plain lookup method:

>>> Book.objects.get(id=1).author.first_name

‘William’

This translates into two SQL queries:

SELECT "book"."id", "book"."title", "book"."author_id" FROM "book" WHERE "book"."id" = 1

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" = 1

First SQL query finds the author_id connected to book with id=1.
The second SQL query finds column values to the author with id=1 (found from the first query).

Let’s now take a look at the Django select_related method example.

>>> Book.objects.select_related('author').get(id=1).author.first_name

‘William’
SELECT "book"."id", "book"."title", "book"."author_id", "author"."id", "author"."first_name", "author"."last_name" FROM "book" INNER JOIN "author" ON ("book"."author_id" = "author"."id") WHERE "book"."id" = 1

We get the same result as the plain method, but we spared one database query. We can see that the Django select_related method uses INNER JOIN to connect tables by matching ids from both tables.

Django select_related INNER JOIN example.

Now, we can understand the whole logic behind INNER JOIN. First, INNER JOIN connects two tables by matching their foreign key ids and connected table id. Then, it appends corresponding table columns to create a new table that can be accessed without having to “hit” the database again.

Also, select_related can be used in loops. Again, we’ll check both plain and select_related methods. In these examples, the goal is to fetch the first_name of all the books in our database defined in the first section.

Plain lookup method:

books = Book.objects.all() 
# <QuerySet [<Book: Book object (1)>, <Book: Book object (2)>, <Book: Book object (3)>, <Book: Book object (4)>]>

for book in books: 
    print(book.author.first_name)

Since we have four books in our database, this queries the database five times, the first time fetching all books, and four times to get authors first_name every time we iterate over the book.

SELECT "book"."id", "book"."title", "book"."author_id" FROM "book";

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" = 1;
SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" = 1;
SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" = 1;
SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" = 2;

Let’s take a look at select_related method.

books = Book.objects.select_related('author')
# <QuerySet [<Book: Book object (1)>, <Book: Book object (2)>, <Book: Book object (3)>, <Book: Book object (4)>]>

for book in books: 
    print(book.author.first_name)

# William
# William
# William
# Miguel

This creates only one query to the database, since we join books table with authors table, we already have first_name and no additional queries are required:

SELECT "book"."id", "book"."title", "book"."author_id", "author"."id", "author"."first_name", "author"."last_name" FROM "book" INNER JOIN "author" ON ("book"."author_id" = "author"."id") 

Already here, you can see how many database calls we can save using Django select_related method. Only by changing one line of code, we save four database calls since we have four books in our database. Imagine if we had thousands or even millions of records in our database!

Django prefetch_related

The Django prefetch_related method as select_related has an optimization purpose, and both achieve the same purpose but in different ways. The prefetch_related method queries both tables and connects them using Python (not SQL), so basically, its efficiency is somewhat lower than the select_related method. But prefetch_related has some other qualities.

Let’s first check the differences between plain lookups and the prefetch_related methods in Django.

As we’ve already seen, the plain lookup method translates into two SQL queries. Can we get a better result using the prefetch_related method?

>>> Book.objects.prefetch_related('author').get(id=1).author.first_name

'William'

This is also translated into two SQL queries, as seen below:

SELECT "book"."id", "book"."title", "book"."author_id" FROM "book" WHERE "book"."id" = 1;

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" IN (1);

As for now, we can’t see optimization potential, but let’s try other examples.

Now, let’s check optimizing in loops.

books = Book.objects.prefetch_related('author')
# <QuerySet [<Book: Book object (1)>, <Book: Book object (2)>, <Book: Book object (3)>, <Book: Book object (4)>]>

for book in books: 
    print(book.author.first_name)

# William
# William
# William
# Miguel

Our output of authors remains the same.
Let’s take a quick look at SQL queries:

SELECT "book"."id", "book"."title", "book"."author_id" FROM "book";

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" WHERE "author"."id" IN (1, 2);

As you can see, no additional queries are made. Instead, we have two queries that fetch both tables. And Python connects these tables making “join in Python” (INNER JOIN we can’t see).

That, of course, has its pros and cons. Efficiency-wise, select_related is faster and more efficient, but select_related could not be used for ManyToManyField. And that’s the main prefetch_related advantage.

NOTE -  ManyToMany relationships should not be handled using SQL because many performance issues could appear when dealing with large tables. That's why the prefetch_related method joins tables inside Python avoiding making large SQL joins.

We will now change models to demonstrate how to use prefetch_related in ManyToManyField and why it is used that way.

from django.db import models


class Author(models.Model):
   first_name = models.CharField(max_length=512)
   last_name = models.CharField(max_length=512)


class Book(models.Model):
   title = models.CharField(max_length=512)
   authors = models.ManyToManyField(Author)

   def __str__(self):
       return "%s (%s)" % (
           self.title,
           ", ".join(author.last_name for author in self.authors.all()),
       )

We now have an additional table that connects authors and books in the database. Many To Many relationships are explained in this article.

Author table

idfirst_namelast_name
1StephenKing
2PeterStraub

Book table

idtitle
1The Talisman
2Black House

Book_author table

idbook_idauthor_id
111
212
321
422

We’ve deliberately defined __str__ method connected to the ManyToMany table for examples.

>>> Book.objects.all()

<QuerySet [<Book: The Talisman (King, Straub)>, <Book: Black House (King, Straub)>]>

For only two books with same authors, we get three queries:

SELECT "book"."id", "book"."title" FROM "book”;

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" INNER JOIN "book_author" ON ("author"."id" = "book_author"."author_id") WHERE "book_author"."book_id" = 5; 

SELECT "author"."id", "author"."first_name", "author"."last_name" FROM "author" INNER JOIN "book_author" ON ("author"."id" = "book_author"."author_id") WHERE "book_author"."book_id" = 6;

Because we are calling authors.all() every time for every book we get, this can become very unoptimized, very soon. That’s why prefetch_related exists! So we can simply do this:

>>> Book.objects.all().prefetch_related('author')

<QuerySet [<Book: The Talisman (King, Straub)>, <Book: Black House (King, Straub)>]>
SELECT "book"."id", "book"."title" FROM "book”;

SELECT ("book_author"."book_id") AS "_prefetch_related_val_book_id", "author"."id", "author"."first_name", "author"."last_name" FROM "author" INNER JOIN "book_author" ON ("author"."id" = "book_author"."author_id") WHERE "book_author"."book_id" IN (5, 6);

Now, we have only two database calls, and no matter the amount of connections in Many To Many relationship, we’ll always have only two calls.

Instead of having to call authors.objects.all(), it will find them in a QuerySet cache that was made using the Django prefetch_related method.

Conclusion

Database calls can be costly operations not only in Django but in every web framework, so it’s important to optimize as much as possible with all the tools these frameworks provide. Sparing one database call doesn’t seem much. Still, when performing queries on complex database tables with multiple optimizing potentials, we can save seconds and even minutes using the methods described in the article.

Use the select_related method when using One To One and One To Many (Many To One) relationships. This is because the select_related method creates INNER JOIN between tables, and it’s the most efficient way to fetch data from the referenced table. On the other hand, You should use the prefetch_related method with Many To Many relationships as it makes JOIN inside Python.