Skip to content

PYTHON-4947 - GridFS spec: Add performant 'delete revisions by filena… #2218

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 31, 2025

Conversation

NoahStapp
Copy link
Contributor

…me' feature - delete_by_name

@NoahStapp NoahStapp requested a review from ShaneHarvey March 24, 2025 15:57
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add the changelog entry here too.

"""
_disallow_transactions(session)
files = self._files.find({"filename": filename}, {"_id": 1}, session=session)
file_ids = [file_id["_id"] async for file_id in files]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[file["_id"] async for file in files]

_disallow_transactions(session)
files = self._files.find({"filename": filename}, {"_id": 1}, session=session)
file_ids = [file_id["_id"] async for file_id in files]
res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the spec say anything about what should happen if file_ids is so large/numerous it overflows maxBsonObjectSize? Assuming all ids are ObjectIds, this will happen when a single filename has around 850,000 revisions:

>>> len(encode({"_id": {"$in": [ObjectId()]*850000}}))
16888915
>>> MAX_BSON_SIZE
16777216

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no mention of that edge case. Do we have a standard pattern for overflow issues in other APIs?

files = self._files.find({"filename": filename}, {"_id": 1}, session=session)
file_ids = [file_id["_id"] async for file_id in files]
res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session)
await self._chunks.delete_many({"files_id": {"$in": file_ids}}, session=session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could improve this by using client.bulk_write on 8.0+ servers to combine both deletes into one command but that's a spec issue.

@@ -834,6 +834,35 @@ async def delete(self, file_id: Any, session: Optional[AsyncClientSession] = Non
if not res.deleted_count:
raise NoFile("no file could be deleted because none matched %s" % file_id)

@_csot.apply
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add this helper to all the tests for sessions/transactions/CSOT. For example test_gridfs_bucket in test_session and test_gridfs_does_not_support_transactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate ticket?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in this ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the existing tests missing this decorator have it added in a separate ticket? Those changes seem unrelated to this addition.

Copy link
Member

@ShaneHarvey ShaneHarvey Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's completely related. We're adding a new api so we have to test all the features. Although I'm not not what you mean by "tests missing this decorator".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying we need to add the delete_by_name helper to the tests for sessions/transactions/CSOT...

self.assertEqual(0, await self.db.fs.files.count_documents({}))
self.assertEqual(0, await self.db.fs.chunks.count_documents({}))
gfs = gridfs.AsyncGridFSBucket(self.db)
await gfs.upload_from_stream("test_filename", b"hello", chunk_size_bytes=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should upload multiple versions of test_filename and assert all are deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see the spec test already does this so you can disregard.

@NoahStapp NoahStapp requested a review from ShaneHarvey March 27, 2025 14:13
@NoahStapp NoahStapp merged commit 8675a16 into mongodb:master Mar 31, 2025
34 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants