
Commit 37a34fa

add pdf marker
1 parent: 91d8166

2 files changed, 4 additions and 2 deletions

bcorag/conf.json

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@
 "loader": {
   "list": [
     "SimpleDirectoryReader",
-    "PDFReader"
+    "PDFReader",
+    "PDFMarker"
   ],
   "default": "SimpleDirectoryReader",
   "documentation": "https://github.com/biocompute-objects/bco-rag/blob/main/docs/usage.md#data-loader"

docs/usage.md

Lines changed: 2 additions & 1 deletion
@@ -48,7 +48,8 @@ Aside from the built in generic data loaders, LLamaIndex hosts an open source [h
 The currently supported data loaders are:
 
 - `SimpleDirectoryReader` (default): This is a built-in data loader provided directly by the LlamaIndex library. It is the most generic option and is not specialized in any specific file type.
-- `PDFReader`: This is an external data loader from LlamaHub that is specialized to PDF files.
+- `PDFReader`: The PDF reader is the generic PDF reader, offering some slight optimization over the more naive simple directory reader.
+- `PDFMarker`: The PDF marker converts the PDF file to clean markdown before ingesting.
 
 ### Chunking Strategy
 
