CocoIndex uses PostgreSQL internally for incremental processing.
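If you don't have Postgres running yet, one quick way to bring it up locally is the pgvector Docker image (the image tag and password below are just an example):

```sh
docker run -d --name cocoindex-postgres \
  -e POSTGRES_PASSWORD=cocoindex \
  -p 5432:5432 \
  pgvector/pgvector:pg17
```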
- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, we natively support Gemini, Ollama, and LiteLLM, so you can choose your favorite LLM provider and work completely on-premises.
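For OpenAI, this typically amounts to setting the standard environment variable:

```sh
export OPENAI_API_KEY="sk-..."
```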
After this step, we should have the basic info of each paper.

### Parse basic info

We will convert the first page to Markdown using Marker. Alternatively, you can easily plug in any PDF parser, such as Docling, via a CocoIndex [custom function](https://cocoindex.io/docs/custom_ops/custom_functions).

Define a marker converter function and cache it, since its initialization is resource-intensive.
This ensures that the same converter instance is reused for different input files.
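A minimal sketch of such a cached converter, assuming the `marker-pdf` package (import paths can vary across versions):

```python
import functools

from marker.config.parser import ConfigParser
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict


@functools.cache
def get_marker_converter() -> PdfConverter:
    # functools.cache ensures the expensive model loading runs only once
    # per process; every file conversion reuses the same instance.
    config_parser = ConfigParser({})
    return PdfConverter(
        create_model_dict(), config=config_parser.generate_config_dict()
    )
```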

After this step, you should have the first page of each paper in Markdown format.

### Extract basic info with LLM

Define a schema for LLM extraction. CocoIndex natively supports structured LLM extraction with complex and nested schemas.

If you are interested in learning more about nested schemas, refer to [this example](https://cocoindex.io/docs/examples/patient_form_extraction).

```python
@dataclasses.dataclass
class PaperMetadata:
    title: str
    authors: list[Author]
    abstract: str
```
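Here `Author` is itself a dataclass, which is what makes the schema nested. Its exact fields are up to you; a plausible sketch:

```python
@dataclasses.dataclass
class Author:
    name: str
    email: str | None
    affiliation: str | None
```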
Plug it into the `ExtractByLlm` function. With a dataclass defined, CocoIndex will automatically parse the LLM response into the dataclass.
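A sketch of how this can look inside the flow (the field names `first_page_md` and `basic_info` are illustrative, and you can swap in whichever provider and model you configured earlier):

```python
with data_scope["documents"].row() as doc:
    doc["basic_info"] = doc["first_page_md"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"
            ),
            output_type=PaperMetadata,
            instruction="Extract the title, authors and abstract of the paper.",
        )
    )
```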
## Export
Finally, we export the data to Postgres.

A sketch of the export call (the collector name and fields are illustrative; adapt them to your flow):

```python
metadata_embeddings.export(
    "metadata_embeddings",
    cocoindex.targets.Postgres(),
    primary_key_fields=["id"],
    vector_indexes=[
        cocoindex.VectorIndexDef(
            field_name="embedding",
            metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
        )
    ],
)
```

In this example, we use PGVector as the embedding store. With CocoIndex, switching to another supported vector database is a one-line change.
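For instance, pointing the same export at Qdrant could look roughly like this (the collection name is illustrative; see the [targets guide](https://cocoindex.io/docs/ops/targets#entry-oriented-targets) for the supported options):

```python
metadata_embeddings.export(
    "metadata_embeddings",
    cocoindex.targets.Qdrant(collection_name="paper_metadata"),
    primary_key_fields=["id"],
)
```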

For now, CocoIndex doesn't provide an additional query interface. We can write SQL or rely on the query engine of the target storage, if any.

- The query space has excellent solutions for querying, reranking, and other search-related functionality.
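For example, a plain-SQL similarity search over the exported table might look roughly like this (table, column, and connection details are illustrative, and the query vector must come from the same embedding model used in the flow):

```python
import psycopg

# Placeholder: embed your query text with the same model used in the flow.
query_embedding = [0.0] * 384

with psycopg.connect("postgresql://localhost:5432/cocoindex") as conn:
    rows = conn.execute(
        """
        SELECT title, abstract
        FROM metadata_embeddings
        ORDER BY embedding <=> %s::vector
        LIMIT 5
        """,
        # pgvector accepts the '[x, y, ...]' text form, so serialize the list.
        (str(query_embedding),),
    ).fetchall()
```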
If you need assistance with writing the query, feel free to reach out to us on [Discord](https://discord.com/invite/zpA9S2DR7s).

## CocoInsight

You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see exactly how each field is constructed and what happens behind the scenes.

```sh
cocoindex server -ci main.py
```

Then open `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server with zero pipeline data retention.