Documentation
defaultdict seems to call __getitem__ whenever __setitem__ is called (regardless of whether the item was already present), whereas a regular dict does not call __getitem__ when __setitem__ is called. The documentation for defaultdict says that defaultdict and dict are basically identical, except in a few narrow cases:
defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the dict class and is not documented here.
But nothing is mentioned in the docs about this difference in behavior when calling __getitem__ / __setitem__.
This comes up when making a child class of either of them if you want to have a preprocessing step that operates on keys before they are used to index into the dictionary, e.g.
from collections import defaultdict


class Item: ...


def preprocess_item(item: Item) -> str:
    return f'::{item.__class__.__name__}@{hex(id(item))}'


class PreprocessingDefaultDict(defaultdict):
    def __getitem__(self, item: Item):
        key = preprocess_item(item)
        return super().__getitem__(key)

    # def __setitem__(self, item: Item, value):
    #     key = preprocess_item(item)
    #     super().__setitem__(key, value)


class PreprocessingDict(dict):
    def __getitem__(self, item: Item):
        key = preprocess_item(item)
        return super().__getitem__(key)

    def __setitem__(self, key: Item, value):
        key = preprocess_item(key)
        super().__setitem__(key, value)


if __name__ == '__main__':
    item = Item()

    d1 = PreprocessingDefaultDict(dict)
    d1[item]['a'] = 10  # initial creation of dict at d1[item]
    d1[item]['a'] = 20  # updating already existing dict at d1[item]
    print(dict(d1))  # wrap in dict so prints the same as d2

    d2 = PreprocessingDict()
    d2[item] = {}
    d2[item]['a'] = 10  # initial creation of dict at d2[item]
    d2[item]['a'] = 20  # updating already existing dict at d2[item]
    print(d2)
Which prints out something like:
{'::Item@0x7fd29b035280': {'a': 20}}
{'::Item@0x7fd29b035280': {'a': 20}}
In this example, I have a preprocessor function I'd like to run on all keys to convert them from objects into strings which can be used in the dictionary. It is not clear from the docs that you must not override __setitem__ (as in the commented-out code), because defaultdict will always call __getitem__, thus always running the preprocessor. If you do override __setitem__ that way, the item is preprocessed twice, and you end up with results like this:
{'::str@0x7fc55b78a930': {'a': 10}, '::str@0x7fc55b78a970': {'a': 20}}
{'::Item@0x7fc55b754890': {'a': 20}}
or this:
{'::str@0x7f3715686930': {'a': 20}}
{'::Item@0x7f3715650860': {'a': 20}}
(I believe the extra element appears because preprocess_item may or may not allocate a new string object for an identical input, so preprocessing that string a second time can yield a different id.)
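To make the double preprocessing concrete, here is a minimal sketch that reuses preprocess_item from the example above and applies it twice by hand. The variable names once, again and twice are made up for illustration. Whether two calls return the same string object is an implementation detail, so the identity check is only printed, not relied on.

```python
class Item: ...


def preprocess_item(item) -> str:
    # Same helper as in the example above: class name plus the object's id.
    return f'::{item.__class__.__name__}@{hex(id(item))}'


item = Item()

once = preprocess_item(item)   # key built from the Item
again = preprocess_item(item)  # same text, possibly a different str object
twice = preprocess_item(once)  # key built from the *string*, not the Item

print(once == again)           # the text is identical
print(once is again)           # identity is an implementation detail
print(twice)                   # starts with '::str@', like the keys seen in d1
print(preprocess_item(again))  # may or may not match `twice`
```

Because the second pass embeds the id of the intermediate string, the resulting key depends on which string object happened to be created, which would be consistent with seeing sometimes one and sometimes two '::str@...' entries.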
I'm not exactly sure what the underlying cause of this difference is. It doesn't seem to be related to the __missing__ method mentioned in the docs, because the behavior I mentioned happens for keys that are not present in the defaultdict as well as for those that are already present (and presumably wouldn't be calling __missing__).
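One way to narrow this down is to record which special methods actually run, and in what order, for an absent key versus a present key. The following is a minimal diagnostic sketch, not a statement about how defaultdict is implemented. TracingDefaultDict, calls, first and second are hypothetical names introduced only for this trace.

```python
from collections import defaultdict

calls = []  # (method name, key) pairs, in the order they were invoked


class TracingDefaultDict(defaultdict):
    """Hypothetical helper: records which special methods run, and when."""

    def __getitem__(self, key):
        calls.append(("__getitem__", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        calls.append(("__setitem__", key))
        super().__setitem__(key, value)

    def __missing__(self, key):
        calls.append(("__missing__", key))
        return super().__missing__(key)


d = TracingDefaultDict(dict)

d["k"]["a"] = 10       # "k" is absent before this statement
first = list(calls)    # snapshot of the calls made for the absent key
calls.clear()

d["k"]["a"] = 20       # "k" is present now
second = list(calls)   # snapshot of the calls made for the present key

print(first)
print(second)
```

If __setitem__ shows up in the first list right after __missing__, that would suggest the default value is stored through the overridden __setitem__, using the key that __getitem__ had already received. In the preprocessing example, that key would already be a string.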
Python version
I ran my example in Python 3.6 through 3.12 and observed the same behavior in all of them.