Commit cc0e41f

docs: product recommendation example (#922)
1 parent 253d512 commit cc0e41f

12 files changed

+89
-66
lines changed

docs/docs/examples/examples/product_recommendation.md

Lines changed: 87 additions & 66 deletions
@@ -10,14 +10,16 @@ sidebar_custom_props:
1010
tags: [knowledge-graph]
1111
---
1212

13-
import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
13+
import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
1414

1515
<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/product_recommendation"/>
1616

1717
## Overview
1818

19-
In this blog, we will build a real-time product recommendation engine with LLM and graph database. In particular, we will use LLM to understand the category (taxonomy) of a product. In addition, we will use LLM to enumerate the complementary products - users are likely to buy together with the current product (pencil and notebook). We will use Graph to explore the relationships between products that can be further used for product recommendations or labeling.
20-
19+
We will build a real-time product recommendation engine with an LLM and a graph database. In particular, we will:
20+
- Use LLM to understand the category (taxonomy) of a product.
21+
- Use LLM to enumerate complementary products that users are likely to buy together with the current product (for example, a pencil and a notebook).
22+
- Use a graph database to explore the relationships between products, which can then be used for product recommendations or labeling.
2123

2224

2325
Product taxonomy is a way to organize product catalogs in a logical and hierarchical structure; a great detailed explanation can be found [here](https://help.shopify.com/en/manual/products/details/product-category). In practice, it is a complicated problem: a product can be part of multiple categories, and a category can have multiple parents.
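To make the "multiple parents" point concrete, here is a minimal sketch that models a taxonomy as a directed acyclic graph rather than a tree. The category names and the `add_edge` / `ancestors` helpers are hypothetical and exist only for illustration; they are not part of CocoIndex or the example repository.

```python
from collections import defaultdict

# category -> set of parent categories (a category may have several parents)
parents = defaultdict(set)


def add_edge(child: str, parent: str) -> None:
    """Record that `child` belongs under `parent`."""
    parents[child].add(parent)


# Hypothetical categories, for illustration only.
add_edge("gel pen", "pen")
add_edge("pen", "writing instrument")
add_edge("pen", "office supplies")


def ancestors(category: str) -> set:
    """Return every category reachable by following parent links."""
    seen = set()
    stack = [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Because `pen` has two parents, `ancestors("gel pen")` reaches both branches, which a strict tree could not represent.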
@@ -26,15 +28,17 @@ Product taxonomy is a way to organize product catalogs in a logical and hierarch
2628
## Prerequisites
2729
* [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
2830
* [Install Neo4j](https://cocoindex.io/docs/ops/storages#Neo4j), a graph database.
29-
* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, you can switch to Ollama, which runs LLM models locally - [guide](https://cocoindex.io/docs/ai/llm#ollama).
31+
* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Create a `.env` file from `.env.example`, and fill in `OPENAI_API_KEY`.
3032

31-
## Documentation
32-
You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/ops/storages#property-graph-targets).
33+
Alternatively, we have native support for Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and, with a local one such as Ollama, work completely on-premises.
3334

35+
<DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
3436

35-
## Data flow to build knowledge graph
3637

37-
### Overview
38+
## Documentation
39+
<DocumentationButton href="https://cocoindex.io/docs/ops/targets#property-graph-targets" text="Property Graph Targets" margin="0 0 16px 0" />
40+
41+
## Flow Overview
3842

3943
The core flow is [about 100 lines of Python code](https://github.com/cocoindex-io/cocoindex/blob/1d42ab31692c73743425f7712c9af395ef98c80e/examples/product_taxonomy_knowledge_graph/main.py#L75-L177).
4044

@@ -48,7 +52,7 @@ We are going to declare a data flow
4852
4. export data to neo4j
4953

5054

51-
### Add documents as source
55+
## Add source
5256

5357
```python
5458
@cocoindex.flow_def(name="StoreProduct")
@@ -64,7 +68,7 @@ Here `flow_builder.add_source` creates a [KTable](https://cocoindex.io/docs/core
6468

6569

6670

67-
### Add data collectors
71+
## Add data collectors
6872

6973
Add collectors at the root scope to collect the product, taxonomy and complementary taxonomy.
7074

@@ -74,11 +78,11 @@ product_taxonomy = data_scope.add_collector()
7478
product_complementary_taxonomy = data_scope.add_collector()
7579
```
7680

77-
### Process each product
81+
## Process each product
7882

7983
We will parse the JSON file for each product, and transform the data to the format that we need for downstream processing.
8084

81-
#### Data Mapping
85+
### Data mapping
8286

8387
```python
8488
@cocoindex.op.function(behavior_version=2)
@@ -98,8 +102,7 @@ Here we define a function for data mapping, e.g.,
98102
- clean up the `price` field
99103
- generate a markdown string for the product detail based on all the fields (for LLM to extract taxonomy and complementary taxonomy, we find that markdown works best as context for LLM).
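
The two mapping steps listed above can be sketched in plain Python. This is an illustrative stand-in, not the function from the example: `clean_price` and `to_markdown` are hypothetical names, and the real input fields and markdown layout may differ.

```python
import re
from typing import Optional


def clean_price(raw: str) -> Optional[float]:
    """Drop currency symbols and thousands separators; None if no number is found."""
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return float(match.group()) if match else None


def to_markdown(product: dict) -> str:
    """Render the product as markdown: the title as a heading, other fields as bullets."""
    lines = [f"# {product['title']}"]
    for key, value in product.items():
        if key != "title":
            lines.append(f"- **{key}**: {value}")
    return "\n".join(lines)
```

The markdown string produced this way is the kind of text that would be handed to the LLM as context in the extraction step.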
100104

101-
102-
#### Flow
105+
### Process product JSON in the flow
103106

104107
Within the flow, we plug in the data mapping transformation to process each product JSON.
105108

@@ -111,15 +114,25 @@ with data_scope["products"].row() as product:
111114
product_node.collect(id=data["id"], url=data["url"], title=data["title"], price=data["price"])
112115
```
113116

117+
It performs the following transformations:
118+
114119
1. The first `transform()` parses the JSON file.
115-
2. The second `transform()` performs the defined data mapping.
120+
121+
<DocumentationButton href="https://cocoindex.io/docs/ops/functions#parsejson" text="ParseJson" margin="0 0 16px 0" />
122+
![ParseJson](/img/examples/product_recommendation/parse_json.png)
123+
124+
2. The second `transform()` performs the defined data mapping.
125+
![Extract product info and data mapping](/img/examples/product_recommendation/extract_product.png)
126+
116127
3. We collect the fields we need for the product node in Neo4j.
117128

118129

119130

120-
### Extract taxonomy and complementary taxonomy using LLM
131+
## Extract taxonomy and complementary taxonomy
121132

122-
#### Product Taxonomy Definition
133+
![Product Taxonomy Info](/img/examples/product_recommendation/taxonomy.png)
134+
135+
### Product Taxonomy Definition
123136

124137
Since we are using an LLM to extract product taxonomy, we need to provide detailed instructions in the class-level docstring.
125138

@@ -140,7 +153,7 @@ class ProductTaxonomy:
140153
name: str
141154
```
142155

143-
#### Define Product Taxonomy Info
156+
### Define Product Taxonomy Info
144157

145158
Basically we want to extract all possible taxonomies for a product, and think about what other products are likely to be bought together with the current product.
146159

@@ -162,7 +175,8 @@ class ProductTaxonomyInfo:
162175
For each product, we want some insight into its taxonomy and complementary taxonomy, and we can use that as a bridge to find related products in the knowledge graph.
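
As a rough illustration of the "bridge" idea, the sketch below walks from a product to its complementary taxonomies and then to other products that carry those taxonomies. It uses in-memory dictionaries instead of Neo4j, and the product ids, taxonomy values, and the `recommend` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: product id -> taxonomies, and product id -> complementary taxonomies.
product_taxonomy = {"p1": {"gel pen"}, "p2": {"notebook"}, "p3": {"gel pen"}}
product_complementary = {"p1": {"notebook"}, "p2": {"gel pen"}, "p3": set()}


def recommend(product_id: str) -> list:
    """Products whose taxonomy matches a complementary taxonomy of `product_id`."""
    # Invert the taxonomy mapping: taxonomy -> products that carry it.
    by_taxonomy = defaultdict(set)
    for pid, taxonomies in product_taxonomy.items():
        for taxonomy in taxonomies:
            by_taxonomy[taxonomy].add(pid)

    result = set()
    for taxonomy in product_complementary.get(product_id, ()):
        result |= by_taxonomy[taxonomy]
    result.discard(product_id)  # never recommend the product itself
    return sorted(result)
```

In the graph database the same two-hop walk would be a path from a `Product` node through a `Taxonomy` node to another `Product` node.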
163176

164177

165-
#### LLM Extraction
178+
179+
### LLM Extraction
166180

167181
Finally, we will use `cocoindex.functions.ExtractByLlm` to extract the taxonomy and complementary taxonomy from the product detail.
168182

@@ -173,11 +187,15 @@ taxonomy = data["detail"].transform(cocoindex.functions.ExtractByLlm(
173187
output_type=ProductTaxonomyInfo))
174188
```
175189

190+
<DocumentationButton href="https://cocoindex.io/docs/ops/functions#extractbyllm" text="ExtractByLlm" margin="0 0 16px 0" />
176191

177192

178193
For example, LLM takes the description of the *gel pen*, and extracts taxonomy to be *gel pen*.
179194
Meanwhile, it suggests that people who buy a *gel pen* may also be interested in a *notebook* and similar items, which are recorded as complementary taxonomy.
180195

196+
![Extract taxonomy and complementary taxonomy](/img/examples/product_recommendation/extract_taxonomy.png)
197+
198+
### Collect taxonomy and complementary taxonomy
181199

182200
And then we will collect the taxonomy and complementary taxonomy to the collector.
183201
```python
@@ -188,15 +206,16 @@ with taxonomy['complementary_taxonomies'].row() as t:
188206
```
189207

190208

191-
### Build knowledge graph
209+
## Build knowledge graph
192210

193-
#### Basic concepts
211+
### Basic concepts
194212
All nodes for Neo4j need two things:
195213
1. Label: The type of the node. E.g., `Product`, `Taxonomy`.
196214
2. Primary key field: The field that uniquely identifies the node. E.g., `id` for `Product` nodes.
197215

198216
CocoIndex uses the primary key field to match the nodes and deduplicate them. If you have multiple nodes with the same primary key, CocoIndex keeps only one of them.
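
The deduplication rule can be pictured with a small sketch. It is only a mental model, not CocoIndex's implementation: `dedupe_by_key` is a hypothetical helper, and keeping the last row seen is an arbitrary choice made here, since the text above only says that one node is kept.

```python
def dedupe_by_key(rows: list, key: str) -> list:
    """Keep a single row per primary-key value."""
    unique = {}
    for row in rows:
        # A later row with the same key replaces the earlier one (arbitrary for this sketch).
        unique[row[key]] = row
    return list(unique.values())


# Hypothetical Taxonomy rows collected from several products.
rows = [{"value": "gel pen"}, {"value": "notebook"}, {"value": "gel pen"}]
nodes = dedupe_by_key(rows, "value")
```

Here three collected rows collapse into two `Taxonomy` nodes, one per distinct `value`.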
199217

218+
![Deduplication](/img/examples/product_recommendation/dedupe.png)
200219

201220
There are two ways to map nodes:
202221
1. When you have a collector just for the node, you can directly export it to Neo4j. For example `Product`. We've collected each product explicitly.
@@ -211,7 +230,7 @@ product_taxonomy.collect(id=cocoindex.GeneratedField.UUID, product_id=data["id"]
211230
This collects a relationship, and the taxonomy node is created from the relationship.
212231

213232

214-
#### Configure Neo4j connection:
233+
### Configure Neo4j connection
215234

216235
```python
217236
conn_spec = cocoindex.add_auth_entry(
@@ -223,7 +242,7 @@ conn_spec = cocoindex.add_auth_entry(
223242
))
224243
```
225244

226-
#### Export `Product` nodes to Neo4j
245+
### Export `Product` nodes to Neo4j
227246

228247
```python
229248
product_node.export(
@@ -235,13 +254,15 @@ product_node.export(
235254
primary_key_fields=["id"],
236255
)
237256
```
257+
![Export Product](/img/examples/product_recommendation/export_product.png)
258+
238259

239260
This exports Neo4j nodes with label `Product` from the `product_node` collector.
240261
- It declares Neo4j node label `Product`. It specifies `id` as the primary key field.
241262
- It carries all the fields from `product_node` collector to Neo4j nodes with label `Product`.
242263

243264

244-
#### Export `Taxonomy` nodes to Neo4j
265+
### Export `Taxonomy` nodes to Neo4j
245266

246267
We don't have an explicit collector for `Taxonomy` nodes.
247268
They are part of the `product_taxonomy` and `product_complementary_taxonomy` collectors and fields are collected during the taxonomy extraction.
@@ -258,6 +279,7 @@ flow_builder.declare(
258279
)
259280
```
260281

282+
261283
Next, export the `product_taxonomy` as a relationship to Neo4j.
262284

263285
```python
@@ -287,38 +309,38 @@ product_taxonomy.export(
287309
)
288310
```
289311

312+
![Export Taxonomy](/img/examples/product_recommendation/export_taxonomy.png)
313+
290314

291315

292316
Similarly, we can export the `product_complementary_taxonomy` as a relationship to Neo4j.
293317
```python
294-
product_complementary_taxonomy.export(
295-
"product_complementary_taxonomy",
296-
cocoindex.storages.Neo4j(
297-
connection=conn_spec,
298-
mapping=cocoindex.storages.Relationships(
299-
rel_type="PRODUCT_COMPLEMENTARY_TAXONOMY",
300-
source=cocoindex.storages.NodeFromFields(
301-
label="Product",
302-
fields=[
303-
cocoindex.storages.TargetFieldMapping(
304-
source="product_id", target="id"),
305-
]
306-
),
307-
target=cocoindex.storages.NodeFromFields(
308-
label="Taxonomy",
309-
fields=[
310-
cocoindex.storages.TargetFieldMapping(
311-
source="taxonomy", target="value"),
312-
]
313-
),
318+
product_complementary_taxonomy.export(
319+
"product_complementary_taxonomy",
320+
cocoindex.storages.Neo4j(
321+
connection=conn_spec,
322+
mapping=cocoindex.storages.Relationships(
323+
rel_type="PRODUCT_COMPLEMENTARY_TAXONOMY",
324+
source=cocoindex.storages.NodeFromFields(
325+
label="Product",
326+
fields=[
327+
cocoindex.storages.TargetFieldMapping(
328+
source="product_id", target="id"),
329+
]
330+
),
331+
target=cocoindex.storages.NodeFromFields(
332+
label="Taxonomy",
333+
fields=[
334+
cocoindex.storages.TargetFieldMapping(
335+
source="taxonomy", target="value"),
336+
]
314337
),
315338
),
316-
primary_key_fields=["id"],
317-
)
339+
),
340+
primary_key_fields=["id"],
341+
)
318342
```
319-
320-
321-
343+
![Export Complementary Taxonomy](/img/examples/product_recommendation/export_all.png)
322344

323345
The `cocoindex.storages.Relationships` declares how to map relationships in Neo4j.
324346

@@ -330,9 +352,7 @@ Note that different relationships may share the same source and target nodes.
330352
`NodeFromFields` takes the fields from the `entity_relationship` collector and creates `Taxonomy` nodes.
331353

332354

333-
## Query and test your index
334-
🎉 Now you are all set!
335-
355+
## Run the flow
336356
1. Install the dependencies:
337357

338358
```
@@ -350,28 +370,29 @@ Note that different relationships may share the same source and target nodes.
350370
documents: 9 added, 0 removed, 0 updated
351371
```
352372
353-
3. (Optional) I used CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline.
354-
It is in free beta now, you can give it a try. Run following command to start CocoInsight:
355-
356-
```
357-
cocoindex server -ci main.py
358-
```
359-
360-
And then open the url https://cocoindex.io/cocoinsight. It just connects to your local CocoIndex server, with Zero pipeline data retention.
361-
362-
363-
364-
365-
### Browse the knowledge graph
373+
## Browse the knowledge graph
366374
After the knowledge graph is built, you can explore it in Neo4j Browser.
367375
368376
For the dev environment, you can connect to Neo4j browser using credentials:
369377
- username: `neo4j`
370378
- password: `cocoindex`
379+
371380
which is pre-configured in our docker compose [config.yaml](https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/Neo4j.yaml).
372381
373382
You can open it at [http://localhost:7474](http://localhost:7474), and run the following Cypher query to get all relationships:
374383
375384
```cypher
376385
MATCH p=()-->() RETURN p
377-
```
386+
```
387+
388+
![Neo4j Browser](/img/examples/product_recommendation/neo4j.png)
389+
390+
## CocoInsight
391+
I used CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline. It is in free beta now, so you can give it a try. Run the following command to start CocoInsight:
392+
393+
```
394+
cocoindex server -ci main.py
395+
```
396+
397+
Then open the URL `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server with zero pipeline data retention.
398+