PYTHON-4915 - Add guidance on adding _id fields to documents to CRUD spec, reorder client.bulk_write generated _id fields #1976
@@ -133,7 +133,10 @@ def add_insert(self, namespace: str, document: _DocumentOut) -> None:
        validate_is_document_type("document", document)
        # Generate ObjectId client side.
        if not (isinstance(document, RawBSONDocument) or "_id" in document):
            document["_id"] = ObjectId()
            new_document = {"_id": ObjectId()}
            new_document.update(document)
            document.clear()

The more I think about it, the more I think it's problematic to call clear() and update() here. Those methods could have unintended side effects aside from the perf problems. For example, consider a user passing a custom mapping class which overrides clear()/update().

Using ChainMap makes sense, agreed. Explicitly modifying a user-supplied mapping will always carry some risks, unfortunately; using as few APIs as possible seems like a safer bet here.

Did some more thinking: is the added complexity and changing of the type to [...]

    if "_id" in document:
        document = {"_id": document["_id"]} | document
    else:
        id = ObjectId()
        document["_id"] = id
        document = {"_id": id} | document

If the original document already had an [...] This also resolves the doctest error we're seeing due to using [...]

Simpler, yes, but it's not performant: [...]

Based on the above, the slow approach adds 2 milliseconds per 100,000 fields copied on my machine. That's significant enough to warrant the complexity. For the doc test, we probably want to unwrap the ChainMap (via [...]

Excellent point!

Yeah, unwrapping it back into the original map makes sense.

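For illustration, a minimal sketch of the ChainMap idea from this thread. It is not the code that was merged. It assumes a plain dict as the user document, uses a placeholder string instead of ObjectId() so it runs without bson, and the helper names with_id_first and unwrap are hypothetical. Because ChainMap iterates its underlying maps from last to first, the user's document is passed first and the "_id" mapping second.

```python
from collections import ChainMap


def with_id_first(document, new_id):
    """Return a view of `document` whose first key is "_id".

    Hypothetical helper: it neither copies the user's fields nor calls
    clear()/update() on the user's mapping.
    """
    if "_id" in document:
        # Reuse the caller's _id; only its position in the view changes.
        return ChainMap(document, {"_id": document["_id"]})
    return ChainMap(document, {"_id": new_id})


def unwrap(document):
    """Return the user's original mapping from a ChainMap view."""
    return document.maps[0] if isinstance(document, ChainMap) else document


user_doc = {"x": 1}
wrapped = with_id_first(user_doc, "placeholder-id")  # stand-in for ObjectId()
print(list(wrapped))                # key order seen by an encoder
print(unwrap(wrapped) is user_doc)  # the original mapping is recoverable
```

In this sketch the user's dict is left untouched, so it does not gain an "_id" key; whether the driver should also write the generated id back into the caller's document is a separate decision.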
            document.update(new_document)

Thoughts on the perf implications of this vs ChainMap? Yet another way is to encode the documents to RawBSONDocuments, thus relying on the bson layer to reorder the id field.

Using ChainMap is cleaner; encoding to RawBSONDocuments might have additional performance costs.

        cmd = {"insert": -1, "document": document}
        self.ops.append(("insert", cmd))
        self.namespaces.append(namespace)

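The performance question raised in the comments above could be checked with a rough micro-benchmark along these lines. This is a sketch under assumptions: the function names merge_copy and chain_view are hypothetical, the document size of 100,000 fields mirrors the figure quoted in the discussion, and timings will differ by machine, so no expected numbers are given.

```python
import timeit
from collections import ChainMap


def merge_copy(document, new_id):
    # Builds a new dict, copying every field of the user's document.
    return {"_id": new_id} | document


def chain_view(document, new_id):
    # Wraps the user's document without copying its fields.
    return ChainMap(document, {"_id": new_id})


big_doc = {f"field{i}": i for i in range(100_000)}
runs = 20

copy_time = timeit.timeit(lambda: merge_copy(big_doc, 0), number=runs)
view_time = timeit.timeit(lambda: chain_view(big_doc, 0), number=runs)
print(f"copy: {copy_time / runs * 1e3:.3f} ms per call")
print(f"view: {view_time / runs * 1e3:.3f} ms per call")
```

This measures construction only. Iterating a ChainMap still visits every key when the document is encoded, so an end-to-end benchmark through the encoder would be needed before drawing conclusions.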

@@ -0,0 +1,52 @@
from __future__ import annotations

Let's add the boilerplate License comment.

from test import PyMongoTestCase

import pytest

from pymongo import InsertOne

try:
    from mockupdb import MockupDB, OpMsg, go, going

    _HAVE_MOCKUPDB = True
except ImportError:
    _HAVE_MOCKUPDB = False


from bson.objectid import ObjectId

pytestmark = pytest.mark.mockupdb


class TestIdOrdering(PyMongoTestCase):

Can you add a link to the crud spec that describes this test?

Once the spec is merged, yes.

Was the spec merged?

Yes, added!

    def test_id_ordering(self):
        server = MockupDB()
        server.autoresponds(
            "hello",
            isWritablePrimary=True,
            msg="isdbgrid",
            minWireVersion=0,
            maxWireVersion=25,
            helloOk=True,
            serviceId=ObjectId(),
        )
        server.run()
        self.addCleanup(server.stop)

        client = self.simple_client(server.uri, loadBalanced=True)
        collection = client.db.coll
        with going(collection.insert_one, {"x": 1}):
            request = server.receives()
            self.assertEqual("_id", next(iter(request["documents"][0])))
            request.reply({"ok": 1})

        with going(collection.bulk_write, [InsertOne({"x1": 1})]):
            request = server.receives()
            self.assertEqual("_id", next(iter(request["documents"][0])))
            request.reply({"ok": 1})

        with going(client.bulk_write, [InsertOne(namespace="db.coll", document={"x2": 1})]):
            request = server.receives()
            self.assertEqual("_id", next(iter(request["ops"][0]["document"])))
            request.reply({"ok": 1})

This implementation is also incomplete because it does not put the _id field first if the user supplies it, for example when inserting {"a": 1, "_id": 2}. We should add tests for this case for insert/bulk/clientBulk as well.

We've decided to make re-ordering user-supplied _id fields optional due to the complexity of doing so across different driver implementations. We can do it in PyMongo if we want, but it won't be standard across all drivers.
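
As a sketch of what the optional re-ordering could look like for the example above, assuming Python 3.9+ for the dict union operator and a plain dict as input; move_id_first is a hypothetical helper, not a PyMongo API. It copies the document, which is the simpler but slower approach discussed earlier in this review.

```python
def move_id_first(document):
    """Return a new dict with "_id" as the first key when one was supplied."""
    if "_id" not in document:
        return dict(document)
    # The left operand fixes the key position; the right supplies the rest.
    return {"_id": document["_id"]} | document


reordered = move_id_first({"a": 1, "_id": 2})
print(list(reordered))  # ['_id', 'a']
```

A test for this case could reuse the assertion style of test_id_ordering, checking that next(iter(...)) on the sent document yields "_id".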