@@ -70,8 +70,8 @@ def partition(
  include_page_breaks
  If True, the output will include page breaks if the filetype supports it
  strategy
- The strategy to use for partitioning the PDF. Uses a layout detection model if set
- to 'hi_res', otherwise partition_pdf simply extracts the text from the document
+ The strategy to use for partitioning PDF/image. Uses a layout detection model if set
+ to 'hi_res', otherwise partition simply extracts the text from the document
  and processes it.
  encoding
  The encoding method used to decode the text input. If None, utf-8 will be used.
@@ -35,10 +35,12 @@ def partition_image(
  The languages to use for the Tesseract agent. To use a language, you'll first need
  to install the appropriate Tesseract language pack.
  strategy
- The strategy to use for partitioning the PDF. Valid strategies are "hi_res" and
+ The strategy to use for partitioning the image. Valid strategies are "hi_res" and
  "ocr_only". When using the "hi_res" strategy, the function uses a layout detection
  model if to identify document elements. When using the "ocr_only" strategy,
  partition_image simply extracts the text from the document using OCR and processes it.
+ The default strategy `auto` will determine when a image can be extracted using `ocr_only` mode,
+ otherwise it will fall back to `hi_res`.
  """
  exactly_one(filename=filename, file=file)
@@ -57,9 +57,10 @@ def partition_pdf(
  The strategy to use for partitioning the PDF. Valid strategies are "hi_res",
  "ocr_only", and "fast". When using the "hi_res" strategy, the function uses
  a layout detection model to identify document elements. When using the
- "ocr_only" strategy, partition_image simply extracts the text from the
+ "ocr_only" strategy, partition_pdf simply extracts the text from the
  document using OCR and processes it. If the "fast" strategy is used, the text
- is extracted directly from the PDF.
+ is extracted directly from the PDF. The default strategy `auto` will determine
+ when a page can be extracted using `fast` mode, otherwise it will fall back to `hi_res`.
  infer_table_structure
  Only applicable if `strategy=hi_res`.
  If True, any Table elements that are extracted will also have a metadata field
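
The `auto` fallback described in the updated docstrings can be sketched as below. This is a minimal illustration, not the library's implementation: `resolve_strategy`, its parameters, and the `cheap_works` flag are hypothetical names, and the diff does not say how the library decides whether the cheaper mode is usable.

```python
def resolve_strategy(requested: str, cheap: str, cheap_works: bool) -> str:
    """Hypothetical helper: turn a requested strategy into a concrete one.

    requested   -- the strategy the caller passed ("auto", "hi_res", ...)
    cheap       -- the cheaper mode to try first: "fast" for PDFs,
                   "ocr_only" for images (per the docstrings above)
    cheap_works -- assumed to be computed elsewhere; the real check is
                   not shown in this diff
    """
    # An explicit strategy is passed through unchanged.
    if requested != "auto":
        return requested
    # `auto` prefers the cheaper mode and otherwise falls back to hi_res.
    return cheap if cheap_works else "hi_res"


# PDF case: text can be extracted directly, so `fast` is chosen.
pdf_choice = resolve_strategy("auto", cheap="fast", cheap_works=True)

# Image case: the cheaper mode is not usable, so fall back to `hi_res`.
image_choice = resolve_strategy("auto", cheap="ocr_only", cheap_works=False)
```

Keeping the decision in one small function makes the fallback rule the same for both `partition_pdf` and `partition_image`, which only differ in which cheaper mode they try first.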