Improper chunking for pdf #3803
              
Unanswered. anshulgoyal43 asked this question in Q&A. Replies: 1 comment
Did you try `strategy="hi_res"`?
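
For anyone who wants to try that suggestion, here is a minimal sketch that changes only the `strategy` argument from the original call. The helper names `hi_res_partition_kwargs` and `partition_hi_res` are hypothetical, made up for this example. Whether `hi_res` actually fixes the broken sentences for this PDF is not confirmed in this thread, and as far as I know `hi_res` needs the extra PDF/layout dependencies installed.

```python
def hi_res_partition_kwargs(filename, chunking_strategy="basic", max_characters=None):
    """Build keyword arguments for partition_pdf with strategy="hi_res".

    Hypothetical helper for illustration; only the argument names
    (filename, strategy, chunking_strategy, max_characters) come from unstructured.
    """
    kwargs = {
        "filename": filename,
        "strategy": "hi_res",  # the only change from the original call
        "chunking_strategy": chunking_strategy,
    }
    # Only pass max_characters when the caller sets it, so the library default applies otherwise.
    if max_characters is not None:
        kwargs["max_characters"] = max_characters
    return kwargs


def partition_hi_res(filename, **options):
    """Run partition_pdf with the arguments built above (hypothetical wrapper)."""
    # Imported lazily so this module loads even where unstructured is not installed.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(**hi_res_partition_kwargs(filename, **options))
```

Usage would be something like `chunks = partition_hi_res(file)`, followed by the same printing loop as in the question.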
  
I am trying to do PDF chunking for my RAG pipeline.
Here is the code I ran:
`from unstructured.partition.pdf import partition_pdf
file = "/Users/anshulgoyal/work/pdf_files/a1836-10.pdf"
print("Processing file:", file)
chunks = partition_pdf(filename=file, strategy="fast", chunking_strategy="basic")
for i in chunks:
    print(i)
    print("-" * 100)
`
Link to the PDF: https://www.indiacode.nic.in/bitstream/123456789/18935/1/a1836-10.pdf
Sentences are broken in the middle across chunks. What am I missing?
Is this an issue with the PDF or with unstructured itself?
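
One library-independent workaround, in case the chunk boundaries cannot be tuned enough: partition without a chunking strategy, then repack the element texts on sentence boundaries yourself. The sketch below is not part of unstructured. The functions `split_sentences` and `pack_sentences` are hypothetical names, the sample strings are made up, and the sentence splitter is deliberately naive (it will mis-split abbreviations such as "s. 3", which are common in legal text).

```python
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    # Collapse newlines and repeated spaces left over from PDF line breaks.
    text = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def pack_sentences(texts, max_characters=500):
    """Greedily pack whole sentences into chunks of at most max_characters.

    A single sentence longer than max_characters is kept whole,
    so a chunk can exceed the limit in that case.
    """
    sentences = split_sentences(" ".join(texts))
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_characters:
            # Adding this sentence would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Made-up fragments standing in for element texts that were cut mid-sentence.
fragments = ["The rule applies to all", "members. It takes effect", "next month. See the notes."]
for chunk in pack_sentences(fragments, max_characters=45):
    print(chunk)
    print("-" * 100)
```

With unstructured, the input could be built as `[el.text for el in partition_pdf(filename=file, strategy="fast")]`, assuming the returned elements expose their content through `.text`.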