Django QuerySet – Set Operations, Aggregations, Q and F Objects Explained

Continuing on the previous article, where I explained Django queryset and how to make basic queries, now we’ll go through some more advanced queries. This time focus will be on set operations, aggregations, and optimization techniques along with the best practices for using the Django QuerySet. The theory will be followed by practical examples that you can reference to and use for your own Django projects. 

To persist the application data, you’ll usually have to store data in a relational database such as PostgreSQL, MySQL, or SQLite. Django QuerySet API helps you communicate with the database. Methods provided by the QuerySet API are used to execute different operations on data residing in the database. After performing actions, these methods usually return a collection of queries, also known as a queryset.

An important thing to highlight is that querysets are lazy, meaning the database will not be queried until a specific action evaluates the queryset. Evaluation can come in a few different forms like iterating or slicing the queryset, converting the queryset to the list, etc. 

As we already went through the basic CRUD queries in the previous article, the focus will now be on slightly more advanced queries. So, let’s start then!

Using Django QuerySet API – Advanced Queries

In the sections below, I’ll explain how to apply set operations on querysets, how to get aggregated results from the database, what are Q and F objects and how to use them.

I’ll use the Book and Author models to demonstrate mentioned query techniques.

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=255)
    published_date = models.DateTimeField(null=True)
    authors = models.ManyToManyField(Author)
    num_pages = models.IntegerField()

Apply migrations for the models above to be able to use them in the interactive shell. First, execute the makemigrations command to generate the migration files.

$ python manage.py makemigrations

Then apply these newly created migrations by running the migrate command. This will propagate changes to the database, or more precisely, create the book and author tables in the database. Database migrations are already covered in this article I wrote.

$ python manage.py migrate

Start the shell by executing the following command in your terminal:

$ python manage.py shell

Before using the Book and Author models, you must import them first.

>>> from library.models import Book, Author

Set operations over Django queryset

Querysets can be combined as any other sets. You can execute the union, intersection, and difference operations over them. The only requirement is that querysets should contain the same fields and datatypes.  

Union operation

The union operator is applied by calling the union() method. The result is a queryset containing all other querysets combined.

>>> q1 = Book.objects.filter(id__lte=2)
>>> q1
<QuerySet [<Book: Book object (1)>]>
>>> q2 = Book.objects.filter(id__gte=2)
>>> q2
<QuerySet [<Book: Book object (2)>, <Book: Book object (3)>]>

>>> q1.union(q2).order_by('id')
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>, <Book: Book object (3)>]>

Intersection operation

The intersection operator is applied by calling the intersection() method. This method will return a queryset that contains only shared results between combined querysets.

>>> q1 = Book.objects.filter(id__lte=2)
>>> q1
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>]>
>>> q2 = Book.objects.filter(id__gte=2)
>>> q2
<QuerySet [<Book: Book object (2)>, <Book: Book object (3)>]>


>>> q1.intersection(q2).order_by('id')
<QuerySet [<Book: Book object (2)>]>

Difference operation

The difference operator is applied by calling the difference() method. The returned result will contain only elements of a first queryset that are not shared in other querysets.

>>> q1 = Book.objects.filter(id__lte=2)
>>> q1
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>]>
>>> q2 = Book.objects.filter(id__gte=2)
>>> q2
<QuerySet [<Book: Book object (2)>, <Book: Book object (3)>]>

>>> q1.difference(q2).order_by('id')
<QuerySet [<Book: Book object (1)>]>

Aggregation

Let’s first address what aggregation is in the context of querysets. As you already know, Django queryset is a collection of queries that return objects from a database. Sometimes you’ll want to calculate a specific value derived from returned objects. This value can be anything – total count of objects, average value, range of values, etc. 

Django helps you calculate those values by giving you the aggregate() and annotate() methods. These methods accept aggregation functions responsible for calculating and returning a dictionary of calculated values. You can see the complete list of aggregation functions in Django’s official documentation.

The major difference between the aggregate() and annotate() methods is in what items each method uses to calculate a summary. The aggregate() method is used to derive the resulting value from the entire QuerySet, while the annotate() method derives values for each item of a QuerySet individually. It may be confusing at first but don’t worry; I’ll demonstrate how both methods work in the following examples.

Django’s aggregate() method

We’ll start with a basic example whose goal is to count the number of Book objects in the database. You can calculate the total count in many different ways, but we’ll use the aggregate() method in this example. Notice the import of the Count aggregation function, which will be responsible for counting all object occurrences for a specific field name. The returned result is a dictionary containing the key comprised of a field name and function name and its calculated value.

>>> from django.db.models import Count
>>> Book.objects.aggregate(Count('id'))
{'id__count': 3}

You can also manually define the name of a returned value by specifying the keyword argument. For example, here we want to assign a “books_total” name to the returned value.

>>> Book.objects.aggregate(books_total=Count('id'))
{'books_total': 3}

If we wanted to calculate the total number of pages for all Book objects, we could use the Sum aggregation function. This aggregation function sums all values for a given field cumulatively.

>>> from django.db.models import Sum
>>> Book.objects.aggregate(num_pages_total=Sum('num_pages'))
{'num_pages_total': 674}

Aggregation methods can be combined with any other method that returns a queryset. So, for example, you can first apply filtering, then calculate the total count of books based on a filtered result.

>>> Book.objects.filter(title__startswith='Fahrenheit').aggregate(books_total=Count('id'))
{'books_total': 1}

To return multiple aggregated values, you need to pass multiple arguments to the aggregate() method. A returned dictionary will contain multiple keys with its values.

>>> from django.db.models import Sum, Min
>>> Book.objects.aggregate(Count('id'), Min('published_date'))
{'id__count': 3, 'published_date__min': datetime.datetime(1953, 10, 13, 0, 0, tzinfo=<UTC>)}

Django’s annotate() method

If your goal is to summarize values for each queryset item, you should use the annotate() method. The annotate() method calculates a specific value and assigns it to the returned object. Arguments passed to the method define what extra fields will be added to the returned objects.

NOTE - The output of the annotate() method is a queryset since it does not aggregate the whole queryset but only values for an individual queryset item.

In our example, notice that the authors field in a Book model is a ManyToManyField; therefore, each book can have one or more authors. A perfect use-case for the annotate() method is to calculate how many authors contributed to each Book in a queryset and assign that value to each Book object. To do that, we’ll use the annotate() method with the Count aggregation function.

>>> from django.db.models import Count
>>> qs = Book.objects.annotate(num_authors=Count('authors'))
>>> qs
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>, <Book: Book object (3)>]>
>>> qs[0].num_authors
1

The annotate() method returned a queryset. Evaluating that queryset retrieved the values from the database. For example, when we try to access the num_authors field of the first element, we get an actual value representing the number of authors. The num_authors field was added to the returned object because we passed it as an argument to the annotate() method.

Another example could be to retrieve Book objects with an extra field having titles transformed to uppercase letters. Again, we’ll use the annotate() method, but we’ll use the Upper database function this time.

>>> from django.db.models.functions import Upper
>>> qs = Book.objects.annotate(upper_title=Upper('title'))
>>> qs[0].upper_title
"THE HITCHHIKER'S GUIDE TO THE GALAXY"
HINT - Database functions such as Upper, Lower, Concat, Length, Substr, etc., allow you to move some of the logic from Django to the database. That can boost the performance and make the implemented logic cleaner.

Q objects

When you need more flexibility in combining different and more complex queries, Q objects could help you. Q objects encapsulate a collection of keyword arguments representing particular SQL conditions and can be combined by the standard logic operators such as & (AND), | (OR), and ~ (NOT). Like the filter() method, the Q object accepts arguments as a combination of field and lookuptype names separated by the double-underscore (__) sign.

Notice that by chaining multiple filter() calls on a given queryset, you’ll get the same behavior as you if you used the AND operator on all individual filter() calls combined. But what if you wanted to use the OR operator when filtering results? I’ll show you how to do it by using the Q objects.

Let’s say we want to return objects with the title including “fahrenheit” or “galaxy”. Again, we call the filter() method, only this time we do not pass the argument directly to it but use the Q objects. Each filtering criteria is encapsulated into its Q object, and by applying the | (OR) operator, we get the desired result.

>>> from django.db.models import Q
>>> qs = Book.objects.filter(Q(title__icontains='fahrenheit') | Q(title__icontains='galaxy'))
>>> qs
<QuerySet [<Book: Book object (1)>, <Book: Book object (3)>]>
>>> qs.values_list('title')
<QuerySet [("The Hitchhiker's Guide to the Galaxy",), ("Fahrenheit 451",)]>

Another common use-case could be if we want to, for example, exclude all objects that have titles containing the word “fahrenheit”. This time we use the ~ (NOT) operator before the Q object. It will negate the expression defined by the Q object and return all objects that do not satisfy the Q expression.

>>> from django.db.models import Q
>>> qs = Book.objects.filter(~Q(title__icontains='fahrenheit'))
>>> qs
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>]>
>>> qs.values_list('title')
<QuerySet [("The Hitchhiker's Guide to the Galaxy",), ('Do Androids Dream of Electric Sheep?',)]>

F objects

Similar to how the Q object represents an SQL condition, the F object represents the value of a model field or annotation. While performing some usual Django ORM operations, the model and its values must first load into Python memory. Then the field value is usually updated and again saved to the database. The F object ensures that the field value is updated directly in the database without loading it into the memory first.

This can be a considerable advantage because performing the operation directly in the database requires fewer queries. Also, this behavior prevents possible race conditions.

To demonstrate how the F objects work, imagine that the new book edition with the id=1 was published and the number of pages needs to be updated. We would do that using Django in a classic way:

>>> book = Book.objects.get(id=1)
>>> book.num_pages
208
>>> book.num_pages += 1
>>> book.save()
>>> book.num_pages
209

The disadvantage of the above method is that it uses Django’s ORM to update the value; therefore, it loads the value into memory and holds it until it saves the changes. 

We can achieve the same update much more efficiently using an F object. For example, let’s take a look at the following code snippet.

>>> from django.db.models import F
>>> book = Book.objects.get(id=1)
>>> book.num_pages = F('num_pages') + 1
>>> book.num_pages
<CombinedExpression: F(num_pages) + Value(1)>
>>> book.save()
>>> book.refresh_from_db()
>>> book.num_pages
209

There are a few major differences in the execution of this code snippet compared to the previous one. First, we passed the field name as an argument to the F object, which means a new SQL expression was created involving the defined field. If we print the value of the num_pages field, you can see it’s not a number anymore but an expression. This expression is evaluated after the save() method is called. 

Under the hood, it will execute the generated SQL expression without loading anything into Python memory. An updated value will be correctly displayed only after refreshing the object with the refresh_from_db() method.

WARNING - Changes defined by the F objects are applied every time the save() method is executed. So, if you call the save() method two times consecutively, it will update the value twice, even if you defined the F expression only once. For that reason, do not forget to use the refresh_from_db() method to reset the F expression and protect field values from inconsistent behavior.

Summary

Django has some really useful methods for communicating with the database. For example, Django’s QuerySet methods include set operations, aggregation functions, abstracted SQL expressions, etc. When it is best to apply these advanced techniques mostly depends on the use-case you are handling. 

To recap, with set operations, you can combine different querysets via the union, intersection, and difference functions. Aggregation functions are used when specific values should be derived from the queryset results or even individual queryset objects. For example, you can calculate count, sum, average, min, max, and many other values for a given queryset. Optimization can be achieved by using the Q and F objects, which are abstractions of SQL commands. That way, fewer queries are usually executed, and possible issues with the race conditions can be avoided. 

I recommend you to deep-dive into Django’s official documentation to learn about the queryset methods. Reading this blog will also help you upgrade your skills to become a true Django master.

Until next time lads!