-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-4947 - GridFS spec: Add performant 'delete revisions by filena… #2218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…me' feature - delete_by_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the changelog entry here too.
gridfs/asynchronous/grid_file.py
Outdated
""" | ||
_disallow_transactions(session) | ||
files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
file_ids = [file_id["_id"] async for file_id in files] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[file["_id"] async for file in files]
_disallow_transactions(session) | ||
files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
file_ids = [file_id["_id"] async for file_id in files] | ||
res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the spec say anything about what should happen if file_ids is so large/numerous it overflows maxBsonObjectSize? Assuming all ids are ObjectIds, this will happen when a single filename has around 850,000 revisions:
>>> len(encode({"_id": {"$in": [ObjectId()]*850000}}))
16888915
>>> MAX_BSON_SIZE
16777216
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes no mention of that edge case. Do we have a standard pattern for overflow issues in other APIs?
files = self._files.find({"filename": filename}, {"_id": 1}, session=session) | ||
file_ids = [file_id["_id"] async for file_id in files] | ||
res = await self._files.delete_many({"_id": {"$in": file_ids}}, session=session) | ||
await self._chunks.delete_many({"files_id": {"$in": file_ids}}, session=session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could improve this by using client.bulk_write on 8.0+ servers to combine both deletes into one command but that's a spec issue.
@@ -834,6 +834,35 @@ async def delete(self, file_id: Any, session: Optional[AsyncClientSession] = Non | |||
if not res.deleted_count: | |||
raise NoFile("no file could be deleted because none matched %s" % file_id) | |||
|
|||
@_csot.apply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add this helper to all the tests for sessions/transactions/CSOT. For example test_gridfs_bucket in test_session and test_gridfs_does_not_support_transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in this ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the existing tests missing this decorator have it added in a separate ticket? Those changes seem unrelated to this addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's completely related. We're adding a new api so we have to test all the features. Although I'm not not what you mean by "tests missing this decorator".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying we need to add the delete_by_name helper to the tests for sessions/transactions/CSOT...
self.assertEqual(0, await self.db.fs.files.count_documents({})) | ||
self.assertEqual(0, await self.db.fs.chunks.count_documents({})) | ||
gfs = gridfs.AsyncGridFSBucket(self.db) | ||
await gfs.upload_from_stream("test_filename", b"hello", chunk_size_bytes=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should upload multiple versions of test_filename and assert all are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see the spec test already does this so you can disregard.
…me' feature - delete_by_name