Tips to optimize django SQL queries
When you first start with Django you will probably find the ORM to be an amazing feature. I agree with that, but the issue with ORMs is that it is too easy to forget that at the end the ORM generate SQL queries.
Django has no way to optimize those queries if the programmer doesn't tell Django what to do, here is the list of 'tips' I use to optimize those queries :
Activate SQL logging in the console
By doing so you can see all generated SQL, there are differnt ways of doing it, but here is my preferred approach :
Put that in your setttings.py
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'handlers': {
'console': {
'class': 'logging.StreamHandler',
},
},
'loggers': {
'django.db.backends': {
'level': 'DEBUG',
'handlers': ['console'],
},
'django.db.template': {
'level': 'DEBUG',
'handlers': ['console'],
}
}
}
That will log all SQL in the console in which manage.py runserver is started, you will also see the query generated by the admin
Use list()
When you make a query, the result is a Queryset and its a lazy evaluted list, which means that everything you access an element, a SQL request is made, even if you access several time the same first element of the queryset for example.
- So if you are going to access all the elements of the returned query convert it to a list :
list(query_result)
- If you don't plan to access all the elements of the query, then you should filter more your query before converting it, don't forget that in most case it is much faster to apply the filterings in the SQL query than doing the filtering in Python
Use .select_related()
Django makes it very easy to access a linked model, for example if an order
object references a customer1 object, you can do
order.customer` but doing so will make 2 SQL queries, one for the order and the other for customer.
Making on SQL request is most of the time faster, so when you fetch your order, you can add .select_related
=>
order = Order.objects.select_related("customer").get(pk=1)
Use '_id' for smaller results
When on order
object has a ForegeignKey customer
for example, in a lot of case, you don't want to access the customer
but only its id, so in that case, you can just do customer.customer_id
Use .values()
or .only()
When you want to access an object you most of the time don't need to have access to all the attributes, but only a list of them. In that case you can add to you query either .values("attribute1","attribute2")
or .only("attribute1","attribute2")
.
.only()
will return a django object with only the attributes listed fetched from db (if you need other attributes, another SQL request will be made, so be carefull). And .values()
will create a dict, of the key,values of each object
Use .values_list(,flat=True)
When you need to access only one attributes per object, for example when you list the list of all id of some objects, you can do
list(Customer.objects.all().values_list("id",flat=True))
and this will return a list of ids
Use a second db to manage your analytics query
If you don't want your master db to handle the load of analytics query, you can create a readonly replica of your master db. Define the new "replica" db in the DATABASES object of your settings.py and then for your analytics query you can do something like that:
Orders.objects.using("replica").count()
Use .exists()
when you want to know if an object exists
If you want to know if an object exists, but don't intend to use this object, you can just query for existence by appending .exists()
at the end of the query
Don't use .count()
if you want to know if a result has more than x elements
Usually if you want to know if a query has more than x elements you will do something like that Order.objects.all().count() >x
this is not good, because you force the db to compute all elements, instead what you can do is :
len(list(Order.objects.values_list("id",flat=True)[:x+1])) == x +1
That what you ask the db to find at max x+1 elements (non ordered) and you check if the query returns x+1 elements. In the case that the db has more than x+1 elements, you query will be faster in the case that it result is greater than x and a bit slower when it is smaller
Remove index from ForeignKey with small number distinct of values
Indexes works well when the number of possible value is as close as possible as the number of columns in the table. But for exampel when you have an Address model with a country ForeignKey, you have at max 200 countries, so the index is not going to be used but will need to be updated. So you better remove it, by doing that:
Class Address(models.Model):
country = models.ForeignKey("Country",db_index=False)
...
Use EXPLAIN ANALYZE
If you want to understand why a query performs badly you need to the see the query plan generated by the db, to get that, just open the dbshell and add EXPLAIN ANALYZE in front of your query:
python manage.py dbshell
> EXPLAIN ANALYZE SELECT id, ...
Use .save(update_field=[])
to update only some attributes
Sometime you have modified only one of two elements of your object and want to save it, without triggering an update with the unchanged attributes, you can do
command.save(update_fields=["price","quantity"])
Use .update()
to update several rows at once
When you want to update a set of object at once, instead of iterating and saving each object, you can use .update()
on the queryset :
Order.objects.all().update(price="1")
Use .values_list()
with relations
Instead of using .select_related()
or .prefetch_related()
you can directly tell which relation you want in the result, for example
Order.objects.all().select_related("id","address__city","addresse__city__country")
Will return the city, the country and the id of the order in one query
Use F()
to rename an attribute
Some time you want to rename db fieldname to a more customer friendly one, one way to do that, is to use the F()
function:
from django.db.models import F
cc= Order.objects.all().annotate(order_reference = F("command_order_db_field_not_customerfriendly"))
print(cc[0].order_reference)
Override .get_queryset()
in django admin
Wen you want to display related object information (or if the str method uses related objects) in the django admin list, django will make n+1 queries, so you can override your ModelAdmin model and only 1 query will be made
class WebsiteAdmin(admin.ModelAdmin):
model = Website
list_display = ["__str__","nb_jobs"]
def nb_jobs(self,obj):
return obj.nb_jobs
def get_queryset(self, request):
queryset = super(WebsiteAdmin, self).get_queryset(request)
from django.db.models import Count
return queryset.annotate(nb_jobs = Count("jobs"))
Don't use first()
or last()
when you want to access any element of the query
Using last()
or first()
on a query will force an order by clause in SQL, instead if you want to get any element from the query do that:
try:
obj=Order.objects.filter(name__icontains="paul")[0]
except IndexError:
obj = None
if obj:
...
or
obj=list(Order.objects.filter(name__icontains="paul")[:1])
if obj:
obj = obj[0]
...
That will do a basic SELECT A from B where X limit 1 ; without order by, if someone has a simplest way to do that, send me an email I'm really interested