Commit 1749004

Update README.md for manual_extraction. (#115)
1 parent b161955 commit 1749004

3 files changed: 26 additions and 6 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
 | [Text Embedding](examples/text_embedding) | Index text documents with embeddings for semantic search |
 | [Code Embedding](examples/code_embedding) | Index code embeddings for semantic search |
 | [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
+| [Manual Extraction](examples/manual_extraction) | Extract structured information from a manual using LLM |
 
 More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.

examples/manual_extraction/README.md

Lines changed: 24 additions & 5 deletions
@@ -1,10 +1,21 @@
-Simple example for cocoindex: extract structured information from a Markdown file.
+In this example, we
+
+* Converts PDFs (generated from a few Python docs) into Markdown.
+* Extract structured information from the Markdown using LLM.
+* Use a custom function to further extract information from the structured output.
 
 ## Prerequisite
-[Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+
+Before running the example, you need to:
+
+* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+* Install / configure LLM API. In this example we use Ollama, which runs LLM model locally. You need to get it ready following [this guide](https://cocoindex.io/docs/ai/llm#ollama). Alternatively, you can also follow the comments in source code to switch to OpenAI, and [configure OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example.
 
 ## Run
 
+
+### Build the index
+
 Install dependencies:
 
 ```bash
@@ -23,10 +34,18 @@ Update index:
 python manual_extraction.py cocoindex update
 ```
 
-Run:
+### Query the index
+
+After index is build, you have a table with name `modules_info`. You can query it any time, e.g. start a Postgres shell:
 
 ```bash
-python manual_extraction.py
+psql postgres://cocoindex:cocoindex@localhost/cocoindex
+```
+
+And run the SQL query:
+
+```sql
+SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
 ```
 
 ## CocoInsight
@@ -35,5 +54,5 @@ CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute vi
 Run CocoInsight to understand your RAG data pipeline:
 
 ```
-python manual_extraction.py cocoindex server -c https://cocoindex.io/cocoinsight
+python manual_extraction.py cocoindex server -c https://cocoindex.io
 ```

examples/manual_extraction/manual_extraction.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def manual_extraction_flow(flow_builder: cocoindex.FlowBuilder, data_scope: coco
 
     modules_index.export(
         "modules",
-        cocoindex.storages.Postgres(),
+        cocoindex.storages.Postgres(table_name="modules_info"),
         primary_key_fields=["filename"],
     )
