Using Django bulk_create and bulk_update

Even though Django is known for fast and comfortable web application development, performance can sometimes become an issue. Bottlenecks can appear in many places, and Django often provides built-in tools to avoid them.

One of the most time-consuming processes in web development is accessing the database. However, many performance improvements can be made with minor changes using built-in tools.

In this article, I’ll cover two known methods for basic optimization:

  • bulk_create
  • bulk_update

Django bulk_create

Django bulk_create can help us optimize our application using a small number of database calls to save a lot of data. In other words, bulk_create can save multiple model instances into the database using only one database call.

bulk_create(objs, batch_size=None, ignore_conflicts=False)

bulk_create requires one argument and has two optional keyword arguments:

  • objs – List of objects to be created.
  • batch_size – How many instances are created in a single database call. Defaults to None.
  • ignore_conflicts – Ignores any insert failures. Defaults to False.
WARNING - If ignore_conflicts is set to True, setting the primary key on each model instance is disabled.
The Oracle database backend doesn't support ignore_conflicts=True.
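To get an intuition for what batch_size does, here is a minimal sketch (plain Python, not Django's internal code) of how a list of objects is split into batches, where each batch corresponds to one INSERT statement:

```python
from itertools import islice

def chunked(objs, batch_size):
    """Yield successive batches from objs, mimicking how batch_size
    limits the number of rows sent per database call.
    Illustrative sketch only -- not Django's actual implementation."""
    it = iter(objs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# 1000 objects with batch_size=300 -> 4 database calls
batches = list(chunked(range(1000), 300))
print(len(batches))       # 4
print(len(batches[-1]))   # 100 (the last, partial batch)
```

With batch_size=None, Django inserts all objects in one statement where the backend allows it (SQLite, for example, has a limit on the number of query parameters, so Django batches automatically there).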

Django bulk_create example

In both the bulk_create and bulk_update examples, I’ll use the same Post model. It’s a simple model containing only two fields, title and time_published.

class Post(models.Model):
    title = models.CharField(max_length=512)
    time_published = models.DateTimeField(auto_now_add=True)

If we were to create three posts the standard way, it would look like this:

Post.objects.create(title="Django bulk_create example 1.0")
Post.objects.create(title="Django bulk_create example 2.0")
Post.objects.create(title="Django bulk_create example 3.0")

Here we are accessing the database three times. Let’s see the same example using the bulk method.

Post.objects.bulk_create(
    [Post(title='Django bulk_create example 1.0'),
     Post(title='Django bulk_create example 2.0'), 
     Post(title='Django bulk_create example 3.0')
])

Only one database call is used to create these three instances of Post using bulk_create. How much time can we save, and is bulk_create even faster than the standard create method? I’ve done a detailed analysis in the next chapter.

How much faster is bulk_create, actually?

In this section, we’ll test the time performance of bulk_create, comparing it to the standard (one-by-one) way of inserting data into the database.

NOTE - For the time analysis, I used the cProfile module, which is explained in detail in the official Python documentation.

In the following examples, we’ll test the performance of inserting 1000 rows into the database.

First, let’s check out the standard method:

def without_bulk():
    for i in range(1000):
        Post.objects.create(title=f"Post {i}")

p = cProfile.Profile()
p.runcall(without_bulk)
p.print_stats(sort='tottime')

Running profilers on a given method, we get this output:

266262 function calls (261255 primitive calls) in 4.121 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1001    2.721    0.003    2.740    0.003 {method 'execute' of 'psycopg2.extensions.cursor' objects}
     3000    0.070    0.000    0.079    0.000 query.py:151(__init__)
     1000    0.060    0.000    0.306    0.000 compiler.py:1371(as_sql)
     1000    0.057    0.000    0.076    0.000 base.py:406(__init__)
     1000    0.054    0.000    3.520    0.004 compiler.py:1432(execute_sql)
     2000    0.048    0.000    0.068    0.000 utils.py:105(debug_sql)

More than half of the time was spent in the execute method of the psycopg2 library. Every time we save a Post, the execute method is called once; that’s clearly a bottleneck that can be avoided.

Let’s try out bulk_create:

def with_bulk():
    Post.objects.bulk_create([Post(title=f"Post {i}") for i in range(1000)])

p = cProfile.Profile()
p.runcall(with_bulk)
p.print_stats(sort='tottime')

Running profilers on the bulk method, we get this output:

83591 function calls (82572 primitive calls) in 0.177 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.042    0.042    0.043    0.043 {built-in method psycopg2._psycopg._connect}
     1000    0.037    0.000    0.051    0.000 base.py:406(__init__)
        2    0.020    0.010    0.020    0.010 {method 'execute' of 'psycopg2.extensions.cursor' objects}
        1    0.006    0.006    0.018    0.018 query.py:460(_prepare_for_bulk_create)
        1    0.006    0.006    0.057    0.057 views.py:10(<listcomp>)
     3000    0.005    0.000    0.007    0.000 __init__.py:845(get_default)

The standard way took 3.944 seconds longer to save the same number (1000) of model instances to the database. That’s roughly 23 times slower than bulk_create, so we clearly made a significant performance improvement.
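For reference, both figures follow directly from the two profiler totals reported above:

```python
# Totals reported by cProfile for 1000 inserts (seconds)
standard_time = 4.121
bulk_time = 0.177

saved = standard_time - bulk_time
speedup = standard_time / bulk_time

print(f"time saved: {saved:.3f}s")   # time saved: 3.944s
print(f"speedup: {speedup:.1f}x")    # speedup: 23.3x
```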

Let’s check out bulk_update now.

Django bulk_update

Django bulk_update has the same goal as bulk_create, which is to minimize the number of queries to the database. The only difference is that now we are updating given objects instead of inserting them into the database.

bulk_update(objs, fields, batch_size=None)

bulk_update requires two arguments and has one optional keyword argument:

  • objs – List of objects to be updated.
  • fields – List of field names that should be updated.
  • batch_size – How many instances are updated in a single database call. Defaults to None.
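Under the hood, bulk_update builds a single UPDATE statement per batch using CASE ... WHEN expressions keyed on the primary key. The sketch below shows roughly the shape of that SQL; it is illustrative only (the real Django implementation uses parametrized queries and expression objects, never string formatting):

```python
def sketch_bulk_update_sql(table, pk_field, field, rows):
    """Build a single UPDATE with CASE/WHEN branches, roughly the
    shape of SQL that bulk_update emits per batch.
    rows: list of (pk, new_value) pairs.
    Illustrative sketch only -- never build real SQL by string
    formatting; Django parametrizes these values."""
    whens = " ".join(
        f"WHEN {pk_field} = {pk} THEN '{value}'" for pk, value in rows
    )
    pks = ", ".join(str(pk) for pk, _ in rows)
    return (
        f"UPDATE {table} SET {field} = CASE {whens} END "
        f"WHERE {pk_field} IN ({pks})"
    )

sql = sketch_bulk_update_sql("post", "id", "title",
                             [(1, "Post 0"), (2, "Post 1")])
print(sql)
# UPDATE post SET title = CASE WHEN id = 1 THEN 'Post 0' WHEN id = 2 THEN 'Post 1' END WHERE id IN (1, 2)
```

This is why one bulk_update call can replace a thousand individual UPDATE statements: all the per-row values travel in a single query.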

Django bulk_update example

We’ll use the same Post model used in the bulk_create example.

As seen below:

class Post(models.Model):
    title = models.CharField(max_length=512)
    time_published = models.DateTimeField(auto_now_add=True)

Considering we already have our 1000 posts from the bulk_create example, we’ll change (update) the title of every single one, making 1000 database calls once again:

def without_bulk():
    posts = list(Post.objects.all())
    for index, post in enumerate(posts):
        post.title = f"Post {index}"
        post.save()

p = cProfile.Profile()
p.runcall(without_bulk)
p.print_stats(sort='tottime')

We got the following results:

451700 function calls (446664 primitive calls) in 4.312 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1002    2.577    0.003    2.591    0.003 {method 'execute' of 'psycopg2.extensions.cursor' objects}
     3000    0.060    0.000    0.109    0.000 query.py:292(clone)
     1000    0.053    0.000    0.278    0.000 query.py:1221(build_filter)
     1000    0.041    0.000    4.235    0.004 base.py:747(save_base)
     2002    0.041    0.000    0.057    0.000 utils.py:105(debug_sql)
     1000    0.039    0.000    0.401    0.000 compiler.py:1521(as_sql)

Let’s now see bulk_update:

def with_bulk():
    posts = list(Post.objects.all())
    for index, post in enumerate(posts):
        post.title = f"Post {index}"

    Post.objects.bulk_update(posts, ['title'])

p = cProfile.Profile()
p.runcall(with_bulk)
p.print_stats(sort='tottime')

And we get these results:

437459 function calls (384393 primitive calls) in 0.477 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.037    0.012    0.037    0.012 {method 'execute' of 'psycopg2.extensions.cursor' objects}
     6007    0.024    0.000    0.096    0.000 copy.py:66(copy)
     1001    0.016    0.000    0.080    0.000 query.py:1221(build_filter)
26068/26062    0.013    0.000    0.013    0.000 {built-in method builtins.getattr}
     6007    0.013    0.000    0.038    0.000 copy.py:258(_reconstruct)
11023/1017    0.012    0.000    0.054    0.000 functional.py:40(__get__)

Once again, we saved around 3.8 seconds simply by using the built-in Django tool, making our code roughly nine times faster.

Results summary

This short chapter summarizes all timing results obtained from creating and updating 1000 objects using the bulk and standard methods.

Method used    Seconds
bulk_create    0.177
bulk_update    0.477
create         4.121
update         4.312

Environment:

Django 4.0.1
Python 3.8.10
PostgreSQL 14.1

System specs:

CPU: Intel® Core™ i5-7200U CPU @ 2.50GHz × 4
GPU: NV117 / Mesa Intel® HD Graphics 620
RAM: 8GB