We appreciate a star ⭐ at [CocoIndex GitHub](https://github.com/cocoindex-io/cocoindex) if this is helpful.

This example shows how to use [BAML](https://boundaryml.com/) to extract structured data from patient intake PDFs. BAML provides type-safe structured data extraction with native PDF support.

- **BAML Schema** (`baml_src/patient.baml`) - Defines the data structure and extraction function.
- **CocoIndex Flow** (`main.py`) - Wraps BAML in a custom function and defines the flow that processes files incrementally.
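To illustrate what processing files incrementally means in practice, here is a minimal conceptual sketch. It is not CocoIndex's actual implementation: it fingerprints each file's content and reruns an extraction step only for files whose fingerprint changed. The names `IncrementalProcessor` and `fake_extract` are hypothetical, and `fake_extract` merely stands in for the BAML-backed extraction function.

```python
import hashlib
from typing import Callable, Dict


class IncrementalProcessor:
    """Re-runs `extract` only for inputs whose content changed (conceptual sketch)."""

    def __init__(self, extract: Callable[[bytes], dict]) -> None:
        self._extract = extract
        self._fingerprints: Dict[str, str] = {}  # file name -> content hash
        self.results: Dict[str, dict] = {}       # file name -> extracted record

    def update(self, files: Dict[str, bytes]) -> int:
        """Process new or changed files, drop deleted ones, return the number reprocessed."""
        reprocessed = 0
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if self._fingerprints.get(name) == digest:
                continue  # unchanged: reuse the previous result
            self.results[name] = self._extract(content)
            self._fingerprints[name] = digest
            reprocessed += 1
        # Forget files that no longer exist in the source.
        for name in set(self._fingerprints) - set(files):
            del self._fingerprints[name]
            del self.results[name]
        return reprocessed


def fake_extract(content: bytes) -> dict:
    # Hypothetical stand-in for the real BAML extraction call.
    return {"size_bytes": len(content)}


processor = IncrementalProcessor(fake_extract)
first = processor.update({"a.pdf": b"one", "b.pdf": b"two"})   # 2: both files are new
second = processor.update({"a.pdf": b"one", "b.pdf": b"2!"})   # 1: only b.pdf changed
```

The point of the design is that the expensive step (here, an LLM call per PDF) is skipped whenever the input is byte-for-byte identical to what was already processed.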
## Prerequisites
1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

2. Install dependencies

```sh
pip install -U cocoindex baml-py
```
3. **Generate BAML client code** (required step!)

```sh
baml generate
```

This generates the `baml_client/` directory with Python code to call your BAML functions.

4. Create a `.env` file. You can copy it from `.env.example` first:

```sh
cp .env.example .env
```

Then edit the file to fill in your `GEMINI_API_KEY`.
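The setup steps above can be checked before running the flow. The sketch below is a hypothetical helper (`check_setup` and `read_env_file` are not part of CocoIndex or BAML) that verifies the `baml_client/` directory exists and that `.env` defines a non-empty `GEMINI_API_KEY`. It assumes both live in the project directory and uses a deliberately simple `KEY=VALUE` parser rather than a full dotenv implementation.

```python
from pathlib import Path
from typing import Dict, List


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_setup(project_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means the setup looks complete."""
    problems: List[str] = []
    if not (project_dir / "baml_client").is_dir():
        problems.append("baml_client/ is missing: run `baml generate`")
    env_path = project_dir / ".env"
    if not env_path.is_file():
        problems.append(".env is missing: run `cp .env.example .env`")
    elif not read_env_file(env_path).get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set in .env")
    return problems
```

Calling `check_setup(Path("."))` from the example directory should return an empty list once all four steps are done.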
## Run
Update index:

```sh
cocoindex update main
```
## CocoInsight
I used CocoInsight (currently in free beta) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:

```sh
cocoindex server -ci main
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).