I'm working on a knowledge management/internal wiki tool for organizations.
check it out

Tips to optimize django SQL queries

When you first start with Django you will probably find the ORM to be an amazing feature. I agree with that, but the issue with ORMs is that it is too easy to forget that at the end the ORM generate SQL queries.

Django has no way to optimize those queries if the programmer doesn't tell Django what to do, here is the list of 'tips' I use to optimize those queries :

Activate SQL logging in the console

By doing so you can see all generated SQL, there are differnt ways of doing it, but here is my preferred approach :

Put that in your setttings.py

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
      'django.db.template': {
            'level': 'DEBUG',
            'handlers': ['console'],
        }

    }
}   

That will log all SQL in the console in which manage.py runserver is started, you will also see the query generated by the admin

Use list()

When you make a query, the result is a Queryset and its a lazy evaluted list, which means that everything you access an element, a SQL request is made, even if you access several time the same first element of the queryset for example.

  • So if you are going to access all the elements of the returned query convert it to a list : list(query_result)
  • If you don't plan to access all the elements of the query, then you should filter more your query before converting it, don't forget that in most case it is much faster to apply the filterings in the SQL query than doing the filtering in Python

Use .select_related()

Django makes it very easy to access a linked model, for example if an order object references a customer1 object, you can doorder.customer` but doing so will make 2 SQL queries, one for the order and the other for customer.

Making on SQL request is most of the time faster, so when you fetch your order, you can add .select_related =>

order  = Order.objects.select_related("customer").get(pk=1)

Use '_id' for smaller results

When on order object has a ForegeignKey customer for example, in a lot of case, you don't want to access the customer but only its id, so in that case, you can just do customer.customer_id

Use .values() or .only()

When you want to access an object you most of the time don't need to have access to all the attributes, but only a list of them. In that case you can add to you query either .values("attribute1","attribute2") or .only("attribute1","attribute2").

.only() will return a django object with only the attributes listed fetched from db (if you need other attributes, another SQL request will be made, so be carefull). And .values() will create a dict, of the key,values of each object

Use .values_list(,flat=True)

When you need to access only one attributes per object, for example when you list the list of all id of some objects, you can do

list(Customer.objects.all().values_list("id",flat=True))

and this will return a list of ids

Use a second db to manage your analytics query

If you don't want your master db to handle the load of analytics query, you can create a readonly replica of your master db. Define the new "replica" db in the DATABASES object of your settings.py and then for your analytics query you can do something like that:

Orders.objects.using("replica").count()

Use .exists() when you want to know if an object exists

If you want to know if an object exists, but don't intend to use this object, you can just query for existence by appending .exists() at the end of the query

Don't use .count() if you want to know if a result has more than x elements

Usually if you want to know if a query has more than x elements you will do something like that Order.objects.all().count() >x this is not good, because you force the db to compute all elements, instead what you can do is :

len(list(Order.objects.values_list("id",flat=True)[:x+1])) == x +1

That what you ask the db to find at max x+1 elements (non ordered) and you check if the query returns x+1 elements. In the case that the db has more than x+1 elements, you query will be faster in the case that it result is greater than x and a bit slower when it is smaller

Remove index from ForeignKey with small number distinct of values

Indexes works well when the number of possible value is as close as possible as the number of columns in the table. But for exampel when you have an Address model with a country ForeignKey, you have at max 200 countries, so the index is not going to be used but will need to be updated. So you better remove it, by doing that:

Class Address(models.Model):
    country = models.ForeignKey("Country",db_index=False)
    ...

Use EXPLAIN ANALYZE

If you want to understand why a query performs badly you need to the see the query plan generated by the db, to get that, just open the dbshell and add EXPLAIN ANALYZE in front of your query:

python manage.py dbshell
> EXPLAIN ANALYZE SELECT id, ...

Use .save(update_field=[]) to update only some attributes

Sometime you have modified only one of two elements of your object and want to save it, without triggering an update with the unchanged attributes, you can do

command.save(update_fields=["price","quantity"])

Use .update() to update several rows at once

When you want to update a set of object at once, instead of iterating and saving each object, you can use .update() on the queryset :

Order.objects.all().update(price="1")

Use .values_list() with relations

Instead of using .select_related() or .prefetch_related() you can directly tell which relation you want in the result, for example

Order.objects.all().select_related("id","address__city","addresse__city__country")

Will return the city, the country and the id of the order in one query

Use F() to rename an attribute

Some time you want to rename db fieldname to a more customer friendly one, one way to do that, is to use the F() function:

from django.db.models import F
cc= Order.objects.all().annotate(order_reference = F("command_order_db_field_not_customerfriendly"))
print(cc[0].order_reference)

Override .get_queryset() in django admin

Wen you want to display related object information (or if the str method uses related objects) in the django admin list, django will make n+1 queries, so you can override your ModelAdmin model and only 1 query will be made

class WebsiteAdmin(admin.ModelAdmin):        
     model = Website
     list_display = ["__str__","nb_jobs"]

     def nb_jobs(self,obj):
         return obj.nb_jobs

     def get_queryset(self, request):
        queryset = super(WebsiteAdmin, self).get_queryset(request)
        from django.db.models import Count
        return queryset.annotate(nb_jobs = Count("jobs"))

Don't use first() or last() when you want to access any element of the query

Using last() or first() on a query will force an order by clause in SQL, instead if you want to get any element from the query do that:

try:
    obj=Order.objects.filter(name__icontains="paul")[0]
except IndexError:
    obj = None
if obj:
    ...

or

obj=list(Order.objects.filter(name__icontains="paul")[:1])
if obj:
    obj = obj[0]
    ...

That will do a basic SELECT A from B where X limit 1 ; without order by, if someone has a simplest way to do that, send me an email I'm really interested