How to use Django loaddata and dumpdata?

When we start a Django project, we often don’t have a database populated with production or development-ready data. Creating new instances by hand every time we need to test a new endpoint is too much work. That’s why Django provides powerful tools such as loaddata and dumpdata.

What are loaddata and dumpdata in Django?

Django loaddata is a command used for pre-populating a database with data, especially during the development phase. Data is loaded from fixture files that contain serialized data.

Django dumpdata is a command used for dumping data from a database to fixture files. Its output can be in various file formats, and the resulting file can be used as input for the loaddata command.

All of the following examples use this model:

from django.db import models


class Author(models.Model):
    first_name = models.CharField(max_length=512)
    last_name = models.CharField(max_length=512)

Django loaddata command

As already mentioned, it’s used for pre-populating data in our database. But things might become a bit more complicated.

Let’s start with a basic example:

$ django-admin loaddata fixture
$ manage.py loaddata fixture
$ python -m django loaddata fixture

A fixture is a file or collection of files containing data that is already serialized and ready for populating the database. These files must be in a format that Django recognizes; the formats are explained in the official Django documentation on serialization formats.

The fixture file for the Author model can look like this:

[
 {
   "model": "app.Author",
   "pk": 1,
   "fields": {
     "first_name": "William",
     "last_name": "Shakespeare"
   }
 },
 {
   "model": "app.Author",
   "pk": 2,
   "fields": {
     "first_name": "Miguel",
     "last_name": "de Cervantes"
   }
 }
]

Here we can see the 3 main attributes that every fixture entry, regardless of format, should specify:

model – identifies the model in the database ( <app_name>.<model_name> )
pk – the primary key. It’s optional, but it’s good practice to specify it.
fields – key-value pairs of model fields. All non-nullable fields that have no default should be defined.
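Because a JSON fixture is just a list of such entries, it can also be generated with a few lines of plain Python. The following is a minimal sketch, not part of Django: build_fixture is a hypothetical helper name, and it assumes primary keys are simply numbered from 1.

```python
import json


def build_fixture(model_label, rows, start_pk=1):
    """Return fixture entries for one model (hypothetical helper).

    model_label is "<app_name>.<model_name>"; rows is a list of dicts
    mapping field names to values.
    """
    return [
        {"model": model_label, "pk": pk, "fields": dict(row)}
        for pk, row in enumerate(rows, start=start_pk)
    ]


authors = [
    {"first_name": "William", "last_name": "Shakespeare"},
    {"first_name": "Miguel", "last_name": "de Cervantes"},
]
fixture = build_fixture("app.Author", authors)

# Serialize the entries as a JSON array, ready to be saved as a fixture.
fixture_json = json.dumps(fixture, indent=2)
print(fixture_json)
```

Writing fixture_json to a file such as app/fixtures/authors.json gives a fixture with the same content as the hand-written one.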

Let’s try to load our fixtures into the database.

$ ./manage.py loaddata app/fixtures/authors.json
Installed 2 object(s) from 1 fixture(s)

Our Author table in the database now looks like this:

id	first_name	last_name
1	William	Shakespeare
2	Miguel	de Cervantes

WARNING - If we change the fixture file and rerun the loaddata command, existing rows with the same primary key will be overwritten!
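To make the overwrite behaviour concrete, here is a small plain-Python model of it. This is only a sketch of the observable effect, not Django’s implementation: the table is an ordinary dict, and load_fixture is a hypothetical stand-in for the command.

```python
def load_fixture(table, fixture):
    """Mimic loaddata on an in-memory table keyed by (model, pk)."""
    for entry in fixture:
        key = (entry["model"].lower(), entry["pk"])
        # Same key: the stored fields are replaced. New key: a row is added.
        table[key] = dict(entry["fields"])
    return table


table = {}
load_fixture(table, [
    {"model": "app.author", "pk": 1,
     "fields": {"first_name": "William", "last_name": "Shakespeare"}},
    {"model": "app.author", "pk": 2,
     "fields": {"first_name": "Miguel", "last_name": "de Cervantes"}},
])

# Reload an edited fixture that only contains pk 1.
load_fixture(table, [
    {"model": "app.author", "pk": 1,
     "fields": {"first_name": "Will", "last_name": "Shakespeare"}},
])

print(table[("app.author", 1)])  # the edited row replaced the old one
print(table[("app.author", 2)])  # rows missing from the fixture are kept
```

In this model, reloading acts like an update keyed on the primary key: matching rows are replaced, and rows that are absent from the fixture are left alone rather than deleted.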

Also, to help us achieve the functionality we are looking for, the loaddata command comes with a few flags shown in the following table:

flag	flag description	command example
`--database DATABASE`	Defines the database we want to load with data. Defaults to ‘default’.	`./manage.py loaddata app/fixtures/authors.json --database test_database`
`--ignorenonexistent, -i`	Ignores fields and models in the fixture that no longer exist. Takes no additional arguments.	`./manage.py loaddata app/fixtures/authors.json -i`
`--app APP_LABEL`	Specifies a single app to look for fixtures in, rather than looking in all apps.	`./manage.py loaddata authors --app app`
`--format FORMAT`	Specifies the serialization format (for example json or xml) of fixtures read from standard input.	`./manage.py loaddata --format=json -`
`--exclude EXCLUDE, -e EXCLUDE`	Skips loading fixtures from the given app or model.	`./manage.py loaddata app/fixtures/authors.json --exclude app.Author`

Django dumpdata command

Dumpdata does the exact opposite of loaddata: it dumps data from the database to standard output. It also offers flags that help us dump precisely the data we want.

$ django-admin dumpdata [app_label[.ModelName]]
$ manage.py dumpdata [app_label[.ModelName]]
$ python -m django dumpdata [app_label[.ModelName]]

app_label is optional. If it is not specified, the models of all installed applications are dumped to standard output in JSON format (unless specified otherwise).

We often want to capture standard output in a file (a fixture file) so we can later pass it to loaddata. This can be done with the > operator, as in the example below:

./manage.py dumpdata > db.json

This creates a db.json file with a JSON representation of the database. The file contains the models Django creates by default, as well as our Author model.
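A quick way to see what ended up in a full dump is to count the objects per model. The sketch below assumes a JSON dump; count_models is a hypothetical helper, and sample_dump is a made-up, heavily shortened stand-in for a real db.json.

```python
import json
from collections import Counter


def count_models(dump_text):
    """Count how many objects each model contributes to a JSON dump."""
    objects = json.loads(dump_text)
    return Counter(obj["model"] for obj in objects)


# Hypothetical, shortened dump used only for illustration.
sample_dump = json.dumps([
    {"model": "auth.permission", "pk": 1,
     "fields": {"name": "Can add log entry"}},
    {"model": "contenttypes.contenttype", "pk": 1,
     "fields": {"app_label": "admin", "model": "logentry"}},
    {"model": "app.author", "pk": 1,
     "fields": {"first_name": "William", "last_name": "Shakespeare"}},
    {"model": "app.author", "pk": 2,
     "fields": {"first_name": "Miguel", "last_name": "de Cervantes"}},
])

counts = count_models(sample_dump)
for model, total in counts.most_common():
    print(model, total)
```

For a real file, the text would come from reading db.json instead of sample_dump.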

If we want to dump only a specific model, we should run the following command:

./manage.py dumpdata app.author --indent 4 > db.json

NOTE - The --indent flag is used only for pretty formatting.

Our db.json now looks like this:

[
{
   "model": "app.author",
   "pk": 1,
   "fields": {
       "first_name": "William",
       "last_name": "Shakespeare"
   }
},
{
   "model": "app.author",
   "pk": 2,
   "fields": {
       "first_name": "Miguel",
       "last_name": "de Cervantes"
   }
}
]

As already mentioned, dumpdata has a few useful flags:

flag	flag description	command example
`--all, -a`	Uses Django’s base manager, so records that a custom manager would filter out or modify are dumped too.	`./manage.py dumpdata > db.json --all`
`--format FORMAT`	Serializes to one of the supported formats: xml, json, jsonl, yaml. Defaults to json.	`./manage.py dumpdata > db.xml --format xml`
`--indent INDENT`	Used for pretty formatting. Defines the number of spaces used for indentation. Defaults to None.	`./manage.py dumpdata > db.json --indent 4`
`--exclude EXCLUDE, -e EXCLUDE`	Excludes the specified app or model from the dump.	`./manage.py dumpdata > db.json --exclude auth.permission`
`--database DATABASE`	Defines the database that will be dumped. Defaults to ‘default’.	`./manage.py dumpdata > db.json --database test_database`
`--natural-foreign`	Uses the natural_key() model method to serialize foreign key and many-to-many relationships to objects of the type that defines the method.	`./manage.py dumpdata > db.json --natural-foreign`
`--natural-primary`	Omits the “pk” attribute from the serialized data, since it can be calculated from the natural key during deserialization.	`./manage.py dumpdata > db.json --natural-primary`
`--pks PRIMARY_KEYS`	Dumps only the objects with the given primary keys (a comma-separated list). Works only when dumping a single model.	`./manage.py dumpdata app.Author > db.json --pks 1`
`--output OUTPUT, -o OUTPUT`	Writes the serialized data to the given file instead of standard output, similar to the > operator.	`./manage.py dumpdata --output db.json`
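When dumps are scripted, it can help to assemble the flags from the table in one place. The following sketch only builds the argument list; dumpdata_command and its parameter names are hypothetical, and it assumes manage.py sits in the current working directory.

```python
import sys


def dumpdata_command(*labels, output=None, indent=None, fmt=None,
                     exclude=(), natural=False):
    """Build an argument list for `manage.py dumpdata` (hypothetical helper)."""
    cmd = [sys.executable, "manage.py", "dumpdata", *labels]
    if fmt:
        cmd += ["--format", fmt]
    if indent is not None:
        cmd += ["--indent", str(indent)]
    for label in exclude:
        cmd += ["--exclude", label]
    if natural:
        cmd += ["--natural-foreign", "--natural-primary"]
    if output:
        cmd += ["--output", output]
    return cmd


cmd = dumpdata_command("app.Author", output="db.json", indent=4)
print(" ".join(cmd[1:]))
```

Passing cmd to subprocess.run(cmd, check=True) from the project root would then run the dump.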

Serialization Formats

The available formats were mentioned in the loaddata and dumpdata sections. They are important to understand so you can use both commands in the best possible way.

Existing serialization formats in Django:

xml
json
jsonl
yaml

XML Format

After running this command:

./manage.py dumpdata app.Author --output db.xml --format xml --indent 4

Our db.xml file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
   <object model="app.author" pk="1">
       <field name="first_name" type="CharField">William</field>
       <field name="last_name" type="CharField">Shakespeare</field>
   </object>
   <object model="app.author" pk="2">
       <field name="first_name" type="CharField">Miguel</field>
       <field name="last_name" type="CharField">de Cervantes</field>
   </object>
</django-objects>
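The XML layout maps directly onto the model / pk / fields structure used by the other formats. As an illustration, the sketch below reads such a file with Python’s standard library; xml_fixture_to_dicts is a hypothetical helper, and it only handles simple fields like the ones in our Author model, not relations.

```python
import xml.etree.ElementTree as ET

XML_FIXTURE = """<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
    <object model="app.author" pk="1">
        <field name="first_name" type="CharField">William</field>
        <field name="last_name" type="CharField">Shakespeare</field>
    </object>
    <object model="app.author" pk="2">
        <field name="first_name" type="CharField">Miguel</field>
        <field name="last_name" type="CharField">de Cervantes</field>
    </object>
</django-objects>
"""


def xml_fixture_to_dicts(xml_bytes):
    """Convert a simple XML fixture into a list of entry dicts."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for obj in root.findall("object"):
        fields = {f.get("name"): f.text for f in obj.findall("field")}
        # XML attributes are strings, so pk stays "1" rather than 1.
        entries.append(
            {"model": obj.get("model"), "pk": obj.get("pk"), "fields": fields}
        )
    return entries


entries = xml_fixture_to_dicts(XML_FIXTURE.encode("utf-8"))
print(entries[0])
```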

JSON Format

JSON is the default format. In other words, we don’t need to specify the --format flag to get JSON output.

./manage.py dumpdata app.Author --output db.json --format json --indent 4

db.json file looks like this:

[
{
   "model": "app.author",
   "pk": 1,
   "fields": {
       "first_name": "William",
       "last_name": "Shakespeare"
   }
},
{
   "model": "app.author",
   "pk": 2,
   "fields": {
       "first_name": "Miguel",
       "last_name": "de Cervantes"
   }
}
]

JSONL Format

JSON Lines text format. Each database object is serialized onto its own line, as you will see in the output of the next command:

./manage.py dumpdata app.Author --output db.jsonl --format jsonl

db.jsonl file looks like this:

{"model": "app.author","pk": 1,"fields": {"first_name": "William","last_name": "Shakespeare"}}
{"model": "app.author","pk": 2,"fields": {"first_name": "Miguel","last_name": "de Cervantes"}}
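Since every line is a complete JSON document, a JSONL fixture can be written and read one object at a time. The sketch below converts between a list of entries and JSONL text using only the standard library; to_jsonl and from_jsonl are hypothetical helpers, and the separators are an assumption chosen to mirror the spacing of the output shown above.

```python
import json


def to_jsonl(objects):
    """Serialize each entry onto its own line."""
    lines = (
        json.dumps(obj, separators=(",", ": "), ensure_ascii=False)
        for obj in objects
    )
    return "\n".join(lines) + "\n"


def from_jsonl(text):
    """Parse JSONL text back into a list of entries, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


entries = [
    {"model": "app.author", "pk": 1,
     "fields": {"first_name": "William", "last_name": "Shakespeare"}},
    {"model": "app.author", "pk": 2,
     "fields": {"first_name": "Miguel", "last_name": "de Cervantes"}},
]

jsonl_text = to_jsonl(entries)
print(jsonl_text, end="")
```

Reading line by line is what makes this format convenient for large dumps: a consumer never needs to hold the whole file in memory as one JSON array.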

YAML Format

YAML is a data serialization language. It’s widely used because it is more readable than other formats. However, to use this format in Django, you need to have the PyYAML package installed.
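Before relying on the yaml format in a script, we can check whether PyYAML is importable. This is a minimal sketch using only the standard library; yaml_serializer_usable is a hypothetical helper name, and the only fact it relies on is that PyYAML is imported as the yaml module.

```python
import importlib.util


def yaml_serializer_usable():
    """Return True when the `yaml` module (PyYAML) can be imported."""
    return importlib.util.find_spec("yaml") is not None


if yaml_serializer_usable():
    print("PyYAML found; the yaml format can be used.")
else:
    print("PyYAML is missing; install it with: pip install PyYAML")
```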

./manage.py dumpdata app.Author --output db.yaml --format yaml --indent 4

Output looks like this:

-   model: app.author
    pk: 1
    fields:
        first_name: William
        last_name: Shakespeare
-   model: app.author
    pk: 2
    fields:
        first_name: Miguel
        last_name: de Cervantes