- 
                Notifications
    
You must be signed in to change notification settings  - Fork 312
 
          Optimized traversal of schema tree for schema cleaning (GenerateSchema.clean_schema)
          #1487
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
          Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##             main    #1487      +/-   ##
==========================================
- Coverage   90.21%   89.63%   -0.58%     
==========================================
  Files         106      113       +7     
  Lines       16339    17984    +1645     
  Branches       36       40       +4     
==========================================
+ Hits        14740    16120    +1380     
- Misses       1592     1844     +252     
- Partials        7       20      +13     
 ... and 54 files with indirect coverage changes Continue to review full report in Codecov by Sentry. 
  | 
    
          CodSpeed Performance ReportMerging #1487 will not alter performanceComparing  Summary
 
 Benchmarks breakdown
  | 
    
18a29c4    to
    f2b4fc2      
    Compare
  
    | 
           please review  | 
    
GenerateSchema.clean_schema)
      0ea8bb3    to
    7eae801      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me. I haven't thought too much about the individual schema types, but I had a couple of possible optimization ideas :)
        
          
                src/schema_traverse.rs
              
                Outdated
          
        
      | meta_with_keys: Option<(Bound<'py, PyDict>, &'a Bound<'py, PySet>)>, | ||
| def_refs: Bound<'py, PyDict>, | ||
| recursive_def_refs: Bound<'py, PySet>, | ||
| recursively_seen_refs: HashSet<String>, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be an optimization to store these as Python strings to avoid round-trips:
| recursively_seen_refs: HashSet<String>, | |
| recursively_seen_refs: HashSet<PyBackedStr>, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering does it help here as the first contains check anyways converts it into rust string right away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to using PySet, it seems to be the fastest here. (Also faster than HashSet<PyBackedStr>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Python strings might cache their hash, I don't think PyO3 uses that right now though. Interesting observation 👍
b4aed80    to
    14e3137      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I am concerned, this side looks good to me! Thanks 👍
1a3071e    to
    22aa6a2      
    Compare
  
    22aa6a2    to
    5c9c9e9      
    Compare
  
    This reverts commit 90418d9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good in principle, but it needs comprehensive docstrings.
| 
           Tackled in pydantic/pydantic#11244.  | 
    
Change Summary
Adds schema tree traversal which gathers necessary schema nodes and information for schema inlining and discriminator handling. Schema tree traversal is done in a single pass gathering the needed information. This is used in
GenerateSchema.clean_schemahandling. Required for PR pydantic/pydantic#10655 This makes schema cleaning much more efficient where the biggest bottleneck has been the Python side tree traversal. This especially with lots of models or deep models.Related issue number
See above Pydantic side PR.
Checklist
pydantic-core(except for expected changes)Selected Reviewer: @sydney-runkle