We will build a real-time product recommendation engine with an LLM and a graph database. In particular, we will:
- Use an LLM to understand the category (taxonomy) of a product.
- Use an LLM to enumerate complementary products that users are likely to buy together with the current product (e.g., pencil and notebook).
- Use a graph to explore the relationships between products, which can be further used for product recommendations or labeling.
Product taxonomy is a way to organize product catalogs in a logical and hierarchical structure; a great detailed explanation can be found [here](https://help.shopify.com/en/manual/products/details/product-category). In practice, it is a complicated problem: a product can be part of multiple categories, and a category can have multiple parents.
## Prerequisites
- [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
- [Install Neo4j](https://cocoindex.io/docs/ops/storages#Neo4j), a graph database.
- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Create a `.env` file from `.env.example`, and fill in `OPENAI_API_KEY`. Alternatively, CocoIndex has native support for Gemini, Ollama, and LiteLLM, so you can choose your favorite LLM provider and work completely on-premises.
## Documentation
You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/ops/storages#property-graph-targets).
The core flow is about [~100 lines of Python code](https://github.com/cocoindex-io/cocoindex/blob/1d42ab31692c73743425f7712c9af395ef98c80e/examples/product_taxonomy_knowledge_graph/main.py#L75-L177).
We are going to declare a data flow to:

1. ingest the product JSON files as the source
2. map each product to the fields we need
3. use an LLM to extract the taxonomy and complementary taxonomy
4. export the data to Neo4j
## Add source
```python
@cocoindex.flow_def(name="StoreProduct")
def store_product_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Ingest the local product JSON files as the "products" source.
    # (The path and file pattern here follow the example repo; adjust to your data.)
    data_scope["products"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="products", included_patterns=["*.json"])
    )
```

Here `flow_builder.add_source` creates a [KTable](https://cocoindex.io/docs/core/data_types#ktable); each row corresponds to one product JSON file.
## Add data collectors
Add collectors at the root scope to collect the product, taxonomy and complementary taxonomy.
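Here is a minimal sketch of what that looks like (the collector variable names below are assumptions; see the linked main.py for the exact names):

```python
# One collector per node/relationship type we plan to export to the graph.
product_node = data_scope.add_collector()
product_taxonomy = data_scope.add_collector()
product_complementary_taxonomy = data_scope.add_collector()
```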
We will parse the JSON file for each product, and transform the data to the format that we need for downstream processing.
### Data mapping
```python
@cocoindex.op.function(behavior_version=2)
def extract_product_info(product: cocoindex.Json, filename: str) -> ProductInfo:
    # A sketch of the mapping (ProductInfo is the dataclass defined in the
    # example's main.py; see the linked code for the full version):
    # derive an id from the file name, clean up the price string, and render
    # all fields as markdown so the LLM gets readable context.
    return ProductInfo(
        id=filename.removesuffix(".json"),
        title=product["title"],
        price=float(str(product["price"]).lstrip("$").replace(",", "")),
        detail="\n".join(f"**{key}**: {value}" for key, value in product.items()),
    )
```

Here we define a function for data mapping, e.g.,
- clean up the `price` field
- generate a markdown string for the product detail based on all the fields (for the LLM to extract taxonomy and complementary taxonomy; we find that markdown works best as LLM context).
### Process product JSON in the flow
Within the flow, we plug in the data mapping transformation to process each product JSON.
```python
# Inside the flow function defined above: parse each product JSON file and
# apply the data mapping (a sketch; see the linked main.py for the exact chain).
with data_scope["products"].row() as product:
    product["product_info"] = (
        product["content"]
        .transform(cocoindex.functions.ParseJson())
        .transform(extract_product_info, filename=product["filename"])
    )
```
Since we are using an LLM to extract the product taxonomy, we need to provide detailed instructions in the class-level docstring.
```python
@dataclasses.dataclass
class ProductTaxonomy:
    """
    Taxonomy for the product.

    (The detailed instruction for the LLM lives in this docstring; see the
    linked main.py for the full text.)
    """
    name: str
```
### Define Product Taxonomy Info
Basically we want to extract all possible taxonomies for a product, and think about what other products are likely to be bought together with the current product.
```python
@dataclasses.dataclass
class ProductTaxonomyInfo:
    """
    Taxonomy info for the product: the taxonomies the product belongs to, and
    the complementary taxonomies people are likely to buy together with it.
    """
    taxonomies: list[ProductTaxonomy]
    complementary_taxonomies: list[ProductTaxonomy]
```
For each product, we want some insight about its taxonomy and complementary taxonomy, which we can use as a bridge to find related products in the knowledge graph.
### LLM Extraction
Finally, we will use `cocoindex.functions.ExtractByLlm` to extract the taxonomy and complementary taxonomy from the product detail.
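A sketch of wiring the extraction into the flow (the `product_info` field name and the model choice are assumptions carried over from the earlier sketches, not necessarily the exact code in main.py):

```python
with data_scope["products"].row() as product:
    # Run LLM extraction over the markdown detail; the result is a
    # ProductTaxonomyInfo value held in a local DataSlice.
    taxonomy = product["product_info"]["detail"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"),
            output_type=ProductTaxonomyInfo,
            instruction="Extract the taxonomy and complementary taxonomies of the product."))
```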
For example, the LLM takes the description of a *gel pen* and extracts its taxonomy as *gel pen*.
Meanwhile, it suggests that when people buy a *gel pen*, they may also be interested in a *notebook*, etc., as complementary taxonomy.
### Collect taxonomy and complementary taxonomy
And then we will collect the taxonomy and complementary taxonomy into the collectors.
```python
# Collector and field names follow the sketches above; see the linked main.py
# for the exact code.
with taxonomy["taxonomies"].row() as t:
    product_taxonomy.collect(
        id=cocoindex.GeneratedField.UUID,
        product_id=product["product_info"]["id"],
        taxonomy=t["name"])
with taxonomy["complementary_taxonomies"].row() as t:
    product_complementary_taxonomy.collect(
        id=cocoindex.GeneratedField.UUID,
        product_id=product["product_info"]["id"],
        taxonomy=t["name"])
```
## Build knowledge graph
### Basic concepts
All nodes for Neo4j need two things:
1. Label: The type of the node. E.g., `Product`, `Taxonomy`.
2. Primary key field: The field that uniquely identifies the node. E.g., `id` for `Product` nodes.
CocoIndex uses the primary key field to match the nodes and deduplicate them. If you have multiple nodes with the same primary key, CocoIndex keeps only one of them.
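As a rough sketch of declaring a node target (class and connection names follow the Property Graph Targets docs linked above; the URI, credentials, and label here are assumptions, not the exact code from the example):

```python
# Register the Neo4j connection once, then export a collector as Product nodes,
# keyed by the `id` field so CocoIndex can match and deduplicate them.
conn_spec = cocoindex.add_auth_entry(
    "Neo4jConnection",
    cocoindex.storages.Neo4jConnection(
        uri="bolt://localhost:7687", user="neo4j", password="cocoindex"))

product_node.export(
    "product_node",
    cocoindex.storages.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.storages.Nodes(label="Product")),
    primary_key_fields=["id"])
```

Relationship collectors are exported the same way, with a `Relationships` mapping that tells CocoIndex which collected fields identify the source and target nodes.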
I used CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline. It is in free beta now; you can give it a try. Run the following command to start CocoInsight:
```
cocoindex server -ci main.py
```
Then open the URL `https://cocoindex.io/cocoinsight`. It simply connects to your local CocoIndex server, with zero pipeline data retention.