Commit ba3c733

Examples documentation - knowledge graphs, recommender, multi-format copali indexing (#894)
* upgrade docusaurus version
* initial checkin
* example documentation for custom targets
* Update custom_targets.md
* paper indexing
* Update academic_papers_index.md
* add example for knowledge graphs
* add examples for photo search / knowledge graph
* Create multi_format_index.md
* Update multi_format_index.md
* product recommendation example
* Create manual_extraction.md
* Create simple_text_embedding.md
* Delete code_index.md
* patient intake form
* Create image_search.md
1 parent 472eb63 commit ba3c733

11 files changed, +2021 -228 lines changed

docs/docs/examples/examples/academic_papers_index.md

Lines changed: 0 additions & 26 deletions
@@ -23,32 +23,6 @@ to answer questions like "Give me all the papers by Jeff Dean."
 
 4. If you want to perform full PDF embedding for the paper, you can extend the flow.
 
-## Core Components
-
-1. **PDF Preprocessing**
-   - Reads PDFs using `pypdf` and extracts:
-      - Total number of pages
-      - First page content (used as a proxy for metadata-rich information)
-
-2. **Markdown Conversion**
-   - Converts the first page to Markdown using [Marker](https://github.com/datalab-to/marker).
-
-3. **LLM-Powered Metadata Extraction**
-   - Sends the first-page Markdown to GPT-4o using CocoIndex's `ExtractByLlm` function.
-   - Extracted metadata includes:
-      - `title` (string)
-      - `authors` (with name, email, and affiliation)
-      - `abstract` (string)
-
-4. **Semantic Embedding**
-   - The title is embedded directly using the `all-MiniLM-L6-v2` model by the SentenceTransformer.
-   - Abstracts are chunked based on semantic punctuation and token count, then each chunk is embedded individually.
-
-5. **Relational Data Collection**
-   - Authors are unrolled and collected into an `author_papers` relation, enabling queries like:
-      - Show all papers by X
-      - Which co-authors worked with Y?
-
 ## Setup
 
 - [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
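
Reviewer note: the removed "Relational Data Collection" item describes unrolling each paper's authors into an `author_papers` relation so that "papers by X" and "co-authors of Y" become simple lookups. A minimal sketch of that idea in plain Python follows. It does not use CocoIndex's API; the `Author` and `Paper` types and the three helper functions are hypothetical names chosen for illustration, and the rows are kept in memory rather than in PostgreSQL.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Author:
    # Hypothetical shape mirroring the extracted metadata: name, email, affiliation.
    name: str
    email: str = ""
    affiliation: str = ""


@dataclass
class Paper:
    # Hypothetical record for one indexed PDF.
    filename: str
    title: str
    authors: list[Author] = field(default_factory=list)


def unroll_author_papers(papers):
    """Emit one (author_name, filename) row per author of each paper."""
    return [(a.name, p.filename) for p in papers for a in p.authors]


def papers_by(rows, author_name):
    """Answer "show all papers by X" from the unrolled rows."""
    return sorted({filename for name, filename in rows if name == author_name})


def coauthors_of(rows, author_name):
    """Answer "which co-authors worked with Y?" from the unrolled rows."""
    names_by_paper = defaultdict(set)
    for name, filename in rows:
        names_by_paper[filename].add(name)
    result = set()
    for names in names_by_paper.values():
        if author_name in names:
            result |= names - {author_name}
    return sorted(result)
```

Storing one row per (author, paper) pair is what makes both queries a single pass over the relation; in a real flow the same rows would presumably be exported to a database table and queried with SQL instead.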

docs/docs/examples/examples/code_index.md

Lines changed: 0 additions & 199 deletions
This file was deleted.
