Django QuerySet Caching and Iterators

2019-09-16

Lazy QuerySet

A queryset in Django represents a number of rows in the database, optionally filtered by a query. For example, the following code represents all people in the database whose first name is ‘Dave’:

person_set = Person.objects.filter(first_name="Dave")

The above code doesn’t run any database queries. You can take the person_set and apply additional filters, or pass it to a function, and nothing will be sent to the database. This is good, because querying the database is one of the things that significantly slows down web applications.

To fetch the data from the database, you need to iterate over the queryset:

for person in person_set:
    print(person.last_name)
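
To make the laziness concrete without a Django project at hand, here is a minimal sketch using only the standard library's sqlite3 module. The LazyQuery class, its log list, and the person table are hypothetical stand-ins invented for this illustration; they imitate the behaviour described above and are not how Django implements QuerySet.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a lazy queryset; NOT Django's implementation."""

    def __init__(self, conn, filters=None, log=None):
        self.conn = conn
        self.filters = dict(filters or {})
        # Shared list recording every SQL statement actually executed.
        self.log = log if log is not None else []

    def filter(self, **conditions):
        # Chaining only builds a new description of the query; no SQL runs here.
        return LazyQuery(self.conn, {**self.filters, **conditions}, self.log)

    def __iter__(self):
        # The query reaches the database only when iteration starts.
        # Column names are trusted here, which is a shortcut for this toy only.
        sql = "SELECT first_name, last_name FROM person"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in self.filters)
        self.log.append(sql)
        rows = self.conn.execute(sql, tuple(self.filters.values())).fetchall()
        return iter(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Dave", "Smith"), ("Dave", "Jones"), ("Anna", "Smith")],
)

person_set = LazyQuery(conn).filter(first_name="Dave")
narrowed = person_set.filter(last_name="Smith")
queries_before_iteration = len(narrowed.log)  # nothing has been executed yet

last_names = [last_name for _, last_name in narrowed]
queries_after_iteration = len(narrowed.log)  # iteration triggered one query
```

Building and narrowing the query costs nothing; the single SQL statement is issued only by the list comprehension at the end.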

QuerySet Cache

Iterating over a QuerySet evaluates it (by performing the query) and yields the results through an iterator (see PEP 234). A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries.

Each QuerySet contains a cache to minimize database access. Understanding how it works will allow you to write the most efficient code.

In a newly created QuerySet, the cache is empty. The first time a QuerySet is evaluated – and, hence, a database query happens – Django saves the query results in the QuerySet’s cache and returns the results that have been explicitly requested (e.g., the next element, if the QuerySet is being iterated over). Subsequent evaluations of the QuerySet reuse the cached results.

Keep this caching behavior in mind, because it may bite you if you don’t use your QuerySets correctly. For example, the following will create two QuerySets, evaluate them, and throw them away:

Example without cache reuse

>>> print([e.headline for e in Entry.objects.all()])
>>> print([e.pub_date for e in Entry.objects.all()])

That means the same database query will be executed twice, effectively doubling your database load. Also, there’s a possibility the two lists may not include the same database records, because an Entry may have been added or deleted in the split second between the two requests.

Example with cache reuse

To avoid this problem, save the QuerySet and reuse it:

>>> queryset = Entry.objects.all()
>>> print([p.headline for p in queryset]) # Evaluate the query set.
>>> print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation.
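
The difference between the two patterns can be counted. The sketch below is a simplified model built on the standard library's sqlite3 module, assuming a hypothetical CachingQuery class, entry table, and log list; it mimics the caching rule described above and is not Django's own code.

```python
import sqlite3


class CachingQuery:
    """Toy model of a per-object result cache; NOT Django's implementation."""

    def __init__(self, conn, sql, log):
        self.conn = conn
        self.sql = sql
        self.log = log      # records each statement that reaches the database
        self._cache = None  # empty until the first evaluation

    def __iter__(self):
        if self._cache is None:
            # First evaluation: run the query and remember the rows.
            self.log.append(self.sql)
            self._cache = self.conn.execute(self.sql).fetchall()
        # Later evaluations are served from the remembered rows.
        return iter(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (headline TEXT, pub_date TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [("First post", "2019-09-01"), ("Second post", "2019-09-16")],
)
SQL = "SELECT headline, pub_date FROM entry ORDER BY pub_date"

# Pattern 1: a fresh object per loop, so each loop runs its own query.
log_fresh = []
headlines = [headline for headline, _ in CachingQuery(conn, SQL, log_fresh)]
pub_dates = [pub_date for _, pub_date in CachingQuery(conn, SQL, log_fresh)]

# Pattern 2: one saved object, so the second loop reads the cache.
log_reused = []
entries = CachingQuery(conn, SQL, log_reused)
headlines_again = [headline for headline, _ in entries]
pub_dates_again = [pub_date for _, pub_date in entries]
```

Here log_fresh ends up with two entries and log_reused with one, which is the same doubling of database load that the text warns about.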

QuerySet iterator()

In contrast to plain iteration, iterator() reads results directly from the database without caching them at the QuerySet level (internally, the default iterator calls iterator() and caches the return value).

For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.

Of course, using the iterator() method to avoid populating the queryset cache means that iterating over the same queryset again will execute another query.
So use iterator() with caution, and make sure that your code is organised to avoid repeated evaluation of the same huge queryset.
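
The trade-off can be sketched with the standard library's sqlite3 module. The Query class, the reading table, and the log list are hypothetical names for this illustration; the class only imitates the split between cached iteration and an uncached iterator() and should not be read as Django's implementation.

```python
import sqlite3


class Query:
    """Toy model of cached iteration vs. streaming; NOT Django's implementation."""

    def __init__(self, conn, sql, log):
        self.conn = conn
        self.sql = sql
        self.log = log      # records each statement that reaches the database
        self._cache = None  # filled only by plain iteration

    def __iter__(self):
        # Plain iteration: load every row once and keep it in memory.
        if self._cache is None:
            self._cache = list(self.iterator())
        return iter(self._cache)

    def iterator(self):
        # Streaming: run the query and hand rows over one by one, keeping nothing.
        self.log.append(self.sql)
        for row in self.conn.execute(self.sql):
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (value INTEGER)")
conn.executemany("INSERT INTO reading VALUES (?)", ((i,) for i in range(1000)))

log = []
readings = Query(conn, "SELECT value FROM reading", log)

# Two streaming passes: two queries, and the cache stays empty.
total = sum(value for (value,) in readings.iterator())
total_again = sum(value for (value,) in readings.iterator())
queries_after_streaming = len(log)
cache_after_streaming = readings._cache

# Two plain passes: one more query, then all rows are held in memory.
cached_total = sum(value for (value,) in readings)
cached_total_again = sum(value for (value,) in readings)
queries_after_caching = len(log)
rows_held_in_memory = len(readings._cache)
```

Streaming keeps memory flat but pays for a query on every pass, while plain iteration pays once and then holds all 1000 rows.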

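Also, before Django 4.1, use of iterator() causes previous prefetch_related() calls to be ignored, since the two optimizations do not combine naturally. From Django 4.1 onward, the prefetch is performed when a chunk_size is passed to iterator().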

`iterator()` example

from blog.models import Article

# Stream the rows once without filling the QuerySet cache.
for article in Article.objects.all().iterator():
    print(article)