Conversation

ddrondo

@ddrondo ddrondo commented Sep 19, 2025

If an embedded field is empty, retrieving the model raised an error about the absence of this field.

@timgraham
Collaborator

Please give complete steps to reproduce the issue, including the traceback. Logically, it seems incorrect that adding blank=True to a field should disable its converters. Perhaps if a converter crashes on a blank value, that converter needs to be fixed.

@ddrondo
Author

ddrondo commented Sep 19, 2025

 "route_fares": {
    "tent": {
    },
  },

Not all documents contain a fixed format, sometimes there is no need to specify a field. Django does not know how to support this, because it was created for relational databases. This implementation can partially eliminate the need for this. I agree that blank is not suitable for this idea. Perhaps you need to come up with another parameter for this solution, for example, void=True

@timgraham
Collaborator

What does your model look like?

@ddrondo
Author

ddrondo commented Sep 19, 2025

class RouteFares(EmbeddedModel):
    tent = EmbeddedModelField(RouteFareSize, null=True, blank=True)
    wagon = EmbeddedModelField(RouteFareSize, null=True, blank=True)
    open = EmbeddedModelField(RouteFareSize, null=True, blank=True)
    ref = EmbeddedModelField(RouteFareSize, null=True, blank=True)
    isotherm = EmbeddedModelField(RouteFareSize, null=True, blank=True)


class Tariff(models.Model):
    ...
    route_fares = EmbeddedModelField(RouteFares)
    

@timgraham
Collaborator

Not all documents contain a fixed format, sometimes there is no need to specify a field. Django does not know how to support this, because it was created for relational databases.

Got it. We've been considering whether or not to try to support this sort of sparse data that wasn't written by Django. We've identified some other problems in passing (#275, #401) but it's unclear that supporting this use case is worth it. Writing rigorous tests for sparse data would be a large effort.

@timgraham timgraham changed the title from "Update operations.py" to "Fix EmbeddedModelField crash when retrieving model where field name isn't present in db data" Sep 20, 2025
@ddrondo
Author

ddrondo commented Sep 21, 2025

Definitely needed. Mongo is a non-relational database for a reason.

@timgraham
Collaborator

I'm guessing the exception is similar to this:

>>> Tariff.objects.all()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/tim/code/django/django/db/models/query.py", line 360, in __repr__
    data = list(self[: REPR_OUTPUT_SIZE + 1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/query.py", line 384, in __iter__
    self._fetch_all()
  File "/home/tim/code/django/django/db/models/query.py", line 1949, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/query.py", line 123, in __iter__
    for row in compiler.results_iter(results):
  File "/home/tim/code/django/django/db/models/sql/compiler.py", line 1541, in apply_converters
    value = converter(value, expression, connection)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django-mongodb/django_mongodb_backend/operations.py", line 192, in convert_embeddedmodelfield_value
    value[field.attname] = converter(value[field.attname], field_expr, connection)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django-mongodb/django_mongodb_backend/operations.py", line 192, in convert_embeddedmodelfield_value
    value[field.attname] = converter(value[field.attname], field_expr, connection)
                                     ~~~~~^^^^^^^^^^^^^^^
KeyError: 'num'

(where "num" is a field of RouteFareSize).

Thus a suitable patch might be:

diff --git a/django_mongodb_backend/operations.py b/django_mongodb_backend/operations.py
index 4b494c3..6d4f033 100644
--- a/django_mongodb_backend/operations.py
+++ b/django_mongodb_backend/operations.py
@@ -189,7 +189,8 @@ class DatabaseOperations(GISOperations, BaseDatabaseOperations):
                     field_expr
                 ) + field_expr.get_db_converters(connection)
                 for converter in converters:
-                    value[field.attname] = converter(value[field.attname], field_expr, connection)
+                    if field.attname in value:
+                        value[field.attname] = converter(value[field.attname], field_expr, connection)
         return value
 
     def convert_jsonfield_value(self, value, expression, connection):
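
As a rough illustration of what the guard in this patch changes, here is a minimal sketch that uses plain dicts and callables instead of Django fields. The names `convert_unguarded`, `convert_guarded`, and the `converters` mapping are hypothetical stand-ins, not part of django_mongodb_backend.

```python
from typing import Any, Callable

# Hypothetical stand-in for a field's database converter: a callable that
# takes the raw stored value and returns the Python-side value.
Converter = Callable[[Any], Any]


def convert_unguarded(value: dict, converters: dict[str, list[Converter]]) -> dict:
    """Shaped like the pre-patch loop: assumes every field name is a key."""
    for attname, field_converters in converters.items():
        for converter in field_converters:
            value[attname] = converter(value[attname])  # KeyError if absent
    return value


def convert_guarded(value: dict, converters: dict[str, list[Converter]]) -> dict:
    """Shaped like the patched loop: skip fields missing from the stored data."""
    for attname, field_converters in converters.items():
        for converter in field_converters:
            if attname in value:
                value[attname] = converter(value[attname])
    return value


converters = {"num": [int], "label": [str.upper]}
sparse_doc = {"label": "tent"}  # "num" was never written to this document

try:
    convert_unguarded(dict(sparse_doc), converters)
    raised = None
except KeyError as exc:
    raised = exc.args[0]

result = convert_guarded(dict(sparse_doc), converters)
```

In this sketch the unguarded loop fails on the missing `num` key, while the guarded loop still converts `label` and leaves the absent field absent.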

I gather that you're trying to build a Django application with some data that Django didn't write. Is it otherwise going okay?

@ddrondo
Author

ddrondo commented Sep 24, 2025

That's right, it's a type of error. In principle, this option can also work, but I would move the condition outside the loop, as there will be unnecessary empty iterations in the loop. However, I don't think this is a good solution, as not everyone expects this behavior, and many follow a strict pattern, which would prevent the error.
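
For reference, moving the condition outside the loop could look roughly like the sketch below. It uses a plain dict and a list of callables, and `convert_field` is a hypothetical helper name rather than anything in the backend.

```python
def convert_field(value: dict, attname: str, converters: list) -> dict:
    """Check for the key once, then run every converter for that field."""
    if attname in value:
        for converter in converters:
            value[attname] = converter(value[attname])
    return value


# Two chained converters: parse the stored string, then double the number.
chain = [int, lambda n: n * 2]

present = convert_field({"num": "3"}, "num", chain)
absent = convert_field({}, "num", chain)
```

Since the converters only reassign an existing key, checking once before the loop gives the same result as checking on every iteration.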

@timgraham
Collaborator

All the data that Django writes expects converters to run. If converters don't run, field data will be of an unexpected type (e.g. DateField values will be datetime rather than date, DecimalField values will be Decimal128 rather than Decimal). This is why your proposed solution of checking blank is incorrect: converters won't run for fields with blank=True, even if the field has data.
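
A small sketch of that failure mode, using only the standard library: `to_date`, `load`, the `valid_from` key, and the `skip_when_blank` flag are all hypothetical and only imitate the datetime-to-date conversion described here.

```python
import datetime


def to_date(raw):
    """Hypothetical converter: a stored datetime becomes a date."""
    return raw.date() if isinstance(raw, datetime.datetime) else raw


def load(doc: dict, *, blank: bool, skip_when_blank: bool) -> dict:
    """Load one field, optionally skipping conversion when blank=True."""
    doc = dict(doc)
    if skip_when_blank and blank:
        return doc  # converter never runs, even though data is present
    if "valid_from" in doc:
        doc["valid_from"] = to_date(doc["valid_from"])
    return doc


stored = {"valid_from": datetime.datetime(2025, 9, 19, 0, 0)}

skipped = load(stored, blank=True, skip_when_blank=True)
converted = load(stored, blank=True, skip_when_blank=False)
```

With the blank-based skip, the populated field comes back as a `datetime`; keying the check on whether the field is present in the data still yields a `date`.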

@ddrondo
Author

ddrondo commented Sep 24, 2025

I agree that using blank=True for this is incorrect, but your solution is also incorrect. I think the best solution would be to inherit from Field ('from django.db.models.fields import Field') and add a new attribute to solve the problem.

@timgraham
Collaborator

As I explained, by disabling an embedded model field's converters, you're going to break those fields. If, for example, you want a version of DecimalField that uses Decimal128 instead of Decimal, you'll want to write a custom field (Decimal128Field) and use that on your embedded model.

Nonetheless, you can implement your own custom embedded model field if you believe your proposed solution is suitable for your project.

@ddrondo
Author

ddrondo commented Sep 24, 2025

I'm not talking about converters, I'm talking about:

if field.attname in value:
    value[field.attname] = converter(value[field.attname], field_expr, connection)

Not everyone expects this behavior, and many follow a strict schema of the model.

@timgraham
Collaborator

Please check if #409 solves your issue.

If not, you'll need to explain a bit more. I'm not sure what a "strict schema of the model" means.

The lines you quoted are where database converters are run on each embedded model field. These cannot be disabled if a field has any data, otherwise the data won't be converted as required (e.g. an EmbeddedModel's DecimalField value would be loaded as Decimal128 instead of Decimal).

@ddrondo
Author

ddrondo commented Sep 25, 2025

This solution is suitable for me, BUT:
Strict schema as in relational databases. For example:
{ a: [], b: [], }
Let's say it's important for a person that the document contains both 'a' and 'b' fields. However, if we take your proposed solution and a document from the database that only contains the 'a' field and lacks the 'b' field, there may be an error in the backend when working with the document, as the logic is not entirely clear.
Either a different approach is needed, or the documentation should include information about this feature. While this logic is clear to me, as I have encountered this issue, it may not be as obvious to others.

@timgraham
Collaborator

The idea of the fix is to avoid running converters on fields that aren't in the data.

I'm not sure why this is unclear or why you think this approach may result in some other error.

@ddrondo
Author

ddrondo commented Sep 25, 2025

The idea of the fix is to avoid running converters on fields that aren't in the data.

I'm not sure why this is unclear or why you think this approach may result in some other error.

I'm not talking about converters.

@Jibola
Contributor

Jibola commented Sep 30, 2025

Hey @ddrondo, thanks for taking the time to point out this issue. I want to specifically address your concern here:

Let's say it's important for a person that the document contains both 'a' and 'b' fields. However, if we take your proposed solution and a document from the database that only contains the 'a' field and lacks the 'b' field, there may be an error in the backend when working with the document, as the logic is not entirely clear.

I don't think this will be an error folks run into, as there are two primary ways for data to appear in the database. Let's walk through the cases:
1. Data provided by the ORM
In this case, the ORM maintains the invariant that all data is present. For nested embedded documents, if someone has allowed their ORM to leave a subfield empty, then it is not a violation of code expectations, as they are returning a NoneType and there's nothing more we can do except try not to convert blank values. Also, checking whether a field name is among the keys of the data is still an O(1) lookup, so while it does incur a cost, that cost is minimal given that the tradeoff is correctness and fault tolerance.

If someone needs to maintain a rigid schema, they should be enforcing it and providing default values through the ORM.

2. Data provided externally and used by the ORM
This is the case you've encountered. The data can easily differ from what the ORM expects, as MongoDB does not require a strict schema. The solution @timgraham provided fixes that issue to ensure we don't crash. Beyond that, the user should check for NoneType.

In both of these cases, the fix @timgraham provided does not introduce a new or unexpected regression. If anything, it maintains that if a field returns as a NoneType from the database it will not crash.

Below is an example of what happens when a field is not present.

>>> Diary.objects.create(title="a", isbn="b", sub_genre=Horror(name="twinkle"))
<Diary: a>
# I went into the database and deleted `sub_genre`
>>> a = Diary.objects.first()
>>> a.sub_genre
>>> a.sub_genre.name
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'name'
>>> a.sub_genre = Horror(name="twinkle")
>>> a.save()
>>> a
<Diary: a>
>>> a.sub_genre.name
'twinkle'
# Now I delete sub_genre.name directly from the db
>>> Diary.objects.first().sub_genre.name
''

See above that we no longer crash on missing key retrieval. The only time an error would occur is when accessing an attribute on a NoneType, which can/should only be achievable by some manual database mutation or explicit NoneType allowance.
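
One way calling code might perform that NoneType check is sketched below. The `Diary` and `Horror` classes here are plain dataclasses standing in for the models in the session above, and `sub_genre_name` is a hypothetical helper, so this only shows the shape of the guard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Horror:
    """Stand-in for the embedded model; not a real EmbeddedModel."""
    name: str = ""


@dataclass
class Diary:
    """Stand-in for the parent model with an optional embedded value."""
    title: str
    sub_genre: Optional[Horror] = None


def sub_genre_name(diary: Diary, default: str = "") -> str:
    """Return the embedded name, or a default when the embedded value is None."""
    if diary.sub_genre is None:
        return default
    return diary.sub_genre.name


with_genre = Diary(title="a", sub_genre=Horror(name="twinkle"))
without_genre = Diary(title="a")  # as if sub_genre were absent from the document
```

Here `sub_genre_name(with_genre)` returns the stored name, and `sub_genre_name(without_genre, default="unknown")` returns the default instead of raising `AttributeError`.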

@timgraham
Collaborator

(The conversation can continue if need be, but the problem should be addressed by #409.)

@timgraham timgraham closed this Sep 30, 2025