made QuerySet iteration respect chunk_size #88

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

timgraham
Collaborator

No description provided.

@timgraham timgraham requested review from Jibola and WaVEV July 25, 2024 01:31
chunk = []
for i, row in enumerate(cursor):
    if i % chunk_size == 0 and i > 0:
        yield chunk
Collaborator

Possible bug, and it would happen every time: shouldn't chunk get re-initialized here after it is yielded?

Collaborator

The other thing: you could move line 116 up above the if, and then you can remove the i > 0 check.

Collaborator

@WaVEV WaVEV Jul 25, 2024

And one last question 😄. You aren't using len(chunk). I think the function would be easier to read written the following way:

def cursor_iter(self, cursor, chunk_size, columns):
    """Generator to yield chunks from cursor."""
    chunk = []
    for row in cursor:
        chunk.append(self._make_result(row, columns))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    yield chunk
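
For anyone who wants to try the chunking behavior outside the compiler, here is a minimal standalone sketch of the same loop. The name `iter_chunks` is hypothetical, and the `self._make_result(row, columns)` call is dropped so the block runs without a database. It keeps the unconditional final `yield` from the suggestion above, which means the last chunk is empty when the row count is an exact multiple of `chunk_size`.

```python
from typing import Any, Iterable, Iterator, List


def iter_chunks(rows: Iterable[Any], chunk_size: int) -> Iterator[List[Any]]:
    """Yield lists of at most chunk_size rows (hypothetical stand-in for cursor_iter)."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            # Start a fresh list so the chunk already yielded is not mutated.
            chunk = []
    # The final chunk is yielded unconditionally, as in the suggestion above,
    # so it is empty when the row count is an exact multiple of chunk_size.
    yield chunk


print(list(iter_chunks(range(5), 2)))  # [[0, 1], [2, 3], [4]]
print(list(iter_chunks(range(4), 2)))  # [[0, 1], [2, 3], []]
```

A trailing empty list should be harmless if the consumer flattens the chunks, but that depends on the caller.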

@timgraham timgraham force-pushed the chunk_size branch 2 times, most recently from b0a8296 to 6313c34 Compare July 25, 2024 12:03
if not chunked_fetch:
    # If using non-chunked reads, read data into memory.
    return list(result)
return result

def results_iter(
Contributor

Sorry if this is a dumb question. How is this getting called in relation to _make_result or cursor_iter?

Collaborator Author

It's called afterward, except when results is None and this method invokes execute_sql() itself. You can look at ModelIterable.__iter__() in django/db/models/query.py and see the calls to compiler.execute_sql() and compiler.results_iter().
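
A rough sketch of that call order, using a toy class rather than the real compiler. `ToyCompiler` and its constructor arguments are hypothetical, and the chunk flattening with `itertools.chain.from_iterable` is an assumption about how the rows are consumed, not a copy of Django's code.

```python
from itertools import chain


class ToyCompiler:
    """Hypothetical stand-in for a SQL compiler; not the real Django class."""

    def __init__(self, rows, chunk_size=2):
        self.rows = rows
        self.chunk_size = chunk_size

    def execute_sql(self):
        # Return an iterator of chunks (lists of rows).
        rows = list(self.rows)
        return (
            rows[i:i + self.chunk_size]
            for i in range(0, len(rows), self.chunk_size)
        )

    def results_iter(self, results=None):
        # Normally called after execute_sql(); when results is None,
        # this method runs the query itself.
        if results is None:
            results = self.execute_sql()
        # Flatten the chunks back into individual rows (assumed behavior).
        return chain.from_iterable(results)


compiler = ToyCompiler(["a", "b", "c"])
results = compiler.execute_sql()             # step 1: chunks of rows
rows = list(compiler.results_iter(results))  # step 2: individual rows
print(rows)
```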

Comment on lines +114 to +117
chunk.append(self._make_result(row, columns))
if len(chunk) == chunk_size:
    yield chunk
    chunk = []
Contributor

So a "chunk" here is bounded by the number of items in the list and not by bytes? (Confirming, as that was my expectation.)

Collaborator Author

Right. Django's default for chunk_size is 100 (rows), or 2000 if using QuerySet.iterator().

@timgraham timgraham merged commit a58f54c into mongodb:main Jul 25, 2024
3 checks passed
@timgraham timgraham deleted the chunk_size branch July 25, 2024 21:39