- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-95382: Use cache for indentations in the JSON encoder #118636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-95382: Use cache for indentations in the JSON encoder #118636
Conversation
8e82b42    to
    2e915fd      
    Compare
  
    Co-authored-by: Pieter Eendebak <[email protected]>
| if (self->indent != Py_None) { | ||
| newline_indent = _create_newline_indent(self->indent, indent_level); | ||
| if (newline_indent == NULL) { | ||
| indent_cache = create_indent_cache(self, indent_level); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When called via the public interface indent_level is always zero. By removing the argument indent_level in the internal implementation this call and some other bits of the code can be a bit simplified.
(this is probably not worth creating a separate PR for, I leave it up to you)
| write_newline_indent(_PyUnicodeWriter *writer, | ||
| Py_ssize_t indent_level, PyObject *indent_cache) | ||
| { | ||
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); | |
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); | |
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert above is placed after conditional resizing the cache. It is useful to check that we always get the right cache size, independently from conditions. But here we do not change the cache.
| return NULL; | ||
| } | ||
| } | ||
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); | |
| assert(indent_level * 2 - 1 < PyList_GET_SIZE(indent_cache)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent_level * 2 < PyList_GET_SIZE(indent_cache) is more strong condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR adds some extra code for creation and usage of the cache, but this simplifies the error handling in the methods using the cache. The performance improvement is a bit larger than I expected which is nice, although it is also fair to say that the performance when writing json files with indent != None is probably not the most important.
It is a continuation of #118105 which keeps a cache of newline+indentation and comma+newline+indentation at every level.
For example:
The effect is the strongest if there are many deep narrow trees growing from the root:
And the weakest (up to no difference) if the tree contains few large flat lists or dicts.