Add HTML Page Title to Element Metadata in partition_html() #3970
                  
                    
prasannaJosium started this conversation in Ideas
            Replies: 0 comments
  
Description:
Currently, when using `partition_html()`, the metadata of elements doesn't include the HTML page title, a piece of information that is useful in many use cases. The title is available in the HTML document's `<title>` tag but isn't extracted and included in the element metadata.

Proposed Solution:
Add a `page_title` field to the `ElementMetadata` class and modify the `partition_html()` function to extract and include the page title in the metadata of each element. This would involve:
- Adding `page_title: Optional[str] = None` to the `ElementMetadata` class
- ... `FIRST` strategy

Benefits:
Example Usage:
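A minimal sketch of the idea, not the proposed patch itself: since `page_title` is not an existing `ElementMetadata` field, the snippet below reads the `<title>` tag with the standard-library `html.parser` and attaches it to plain dicts that stand in for element metadata. `TitleExtractor`, `extract_page_title`, `attach_page_title`, and the dict layout are all hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TitleExtractor(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False  # ignore later <title> tags, e.g. inside inline SVG
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace; report a missing or empty title as None.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_page_title(html_text: str) -> Optional[str]:
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title


def attach_page_title(metadata_dicts: List[Dict], html_text: str) -> List[Dict]:
    """Return copies of the metadata dicts with an added 'page_title' key.

    Plain dicts are an assumption standing in for real element metadata.
    """
    title = extract_page_title(html_text)
    return [{**meta, "page_title": title} for meta in metadata_dicts]


html_doc = (
    "<html><head><title> Quarterly  Report </title></head>"
    "<body><h1>Summary</h1></body></html>"
)
enriched = attach_page_title([{"category": "Title"}], html_doc)
print(enriched[0]["page_title"])
```

Until a dedicated field exists, something along these lines might also serve as a workaround: extract the title separately and carry it alongside the elements returned by `partition_html()`.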
Would you like me to submit a PR with these changes?
If there are other ways to get this done, please do let me know.
Cheers