Commit 4745dda

Update knowledge extraction documentation
1 parent f7d7f96 commit 4745dda

11 files changed

+42
-30
lines changed


coverage-badge.svg

Lines changed: 1 addition & 1 deletion

docs/Code Examples/Advanced/Knowledge Augmented Generation.md

Lines changed: 11 additions & 11 deletions
@@ -107,7 +107,7 @@ if __name__ == "__main__":
 "name": "Vatican City",
 "label": "Country"
 },
-"combined_score": 0.8975120754014027
+"score": 0.8975120754014027
 },
 {
 "subj": {
@@ -119,7 +119,7 @@ if __name__ == "__main__":
 "name": "Vatican City",
 "label": "Country"
 },
-"combined_score": 0.8975120754014027
+"score": 0.8975120754014027
 },
 {
 "subj": {
@@ -131,7 +131,7 @@ if __name__ == "__main__":
 "name": "United Kingdom",
 "label": "Country"
 },
-"combined_score": 0.9030690106522803
+"score": 0.9030690106522803
 },
 {
 "subj": {
@@ -143,7 +143,7 @@ if __name__ == "__main__":
 "name": "United Kingdom",
 "label": "Country"
 },
-"combined_score": 0.9030690106522803
+"score": 0.9030690106522803
 },
 {
 "subj": {
@@ -155,7 +155,7 @@ if __name__ == "__main__":
 "name": "Italy",
 "label": "Country"
 },
-"combined_score": 0.9245654142023485
+"score": 0.9245654142023485
 },
 {
 "subj": {
@@ -167,7 +167,7 @@ if __name__ == "__main__":
 "name": "Germany",
 "label": "Country"
 },
-"combined_score": 0.9306481791338741
+"score": 0.9306481791338741
 },
 {
 "subj": {
@@ -179,7 +179,7 @@ if __name__ == "__main__":
 "name": "Europe",
 "label": "Country"
 },
-"combined_score": 0.941624026613126
+"score": 0.941624026613126
 },
 {
 "subj": {
@@ -191,7 +191,7 @@ if __name__ == "__main__":
 "name": "Europe",
 "label": "Country"
 },
-"combined_score": 0.941624026613126
+"score": 0.941624026613126
 },
 {
 "subj": {
@@ -203,7 +203,7 @@ if __name__ == "__main__":
 "name": "Europe",
 "label": "Country"
 },
-"combined_score": 0.941624026613126
+"score": 0.941624026613126
 },
 {
 "subj": {
@@ -215,7 +215,7 @@ if __name__ == "__main__":
 "name": "France",
 "label": "Country"
 },
-"combined_score": 0.9998149700645786
+"score": 0.9998149700645786
 }
 ],
 "answer": "The capital of France is Paris."
@@ -230,7 +230,7 @@ The relationship models (IsCapitalOf, IsLocatedIn, IsCityOf, TookPlaceIn) define
 The `return_inputs=True` parameter in both retriever and generator components ensures that information flows through your pipeline without loss. This allows downstream components to access both the original query and any intermediate results, enabling more sophisticated processing strategies.
 The instruction set for the generator provides crucial guidance for response generation. The instruction to acknowledge when search results aren't relevant prevents hallucination and maintains system reliability. You can customize these instructions based on your specific use case requirements.

-Don't forget that these instructions can be optimized to enhance the reasoning capabilities of your RAGs.
+Don't forget that these instructions can be optimized to enhance the reasoning capabilities of your KAGs.

 ## Key Takeaways

docs/Code Examples/Advanced/Knowledge Extraction.md

Lines changed: 20 additions & 3 deletions
@@ -1,10 +1,12 @@
 # Knowledge Extraction

-Knowledge extraction from unstructured data is a cornerstone of neuro-symbolic AI applications, enabling systems to transform raw text into structured, logically queryable information. Synalinks provides a sophisticated framework that supports constrained property graph extraction and querying, offering unprecedented flexibility in how you architect your knowledge extraction pipelines.
+Knowledge extraction from unstructured data is a cornerstone of neuro-symbolic AI applications, enabling systems to transform raw text into structured, queryable information. Synalinks provides a sophisticated framework that supports constrained property graph extraction and querying, offering unprecedented flexibility in how you architect your knowledge extraction pipelines.

 Synalinks leverages constrained property graphs as its foundation, where the schema is rigorously enforced through constrained JSON decoding. This approach ensures data integrity while maintaining the flexibility to store extracted knowledge in dedicated graph databases for efficient querying and retrieval.
 The framework's modular design allows you to compose extraction pipelines from discrete, reusable components. Each component can be optimized independently, tested in isolation, and combined with others to create sophisticated data processing workflows.

+To illustrate our approach, we are going to use the same small language model with different architectures, so you can understand the pros and cons of each approach.
+
 ```python
 import synalinks
 import asyncio
@@ -77,6 +79,10 @@ async def one_stage_program(

 The one-stage approach minimizes latency and reduces the complexity of pipeline orchestration. However, it demands models with substantial reasoning capabilities and may not be effective for scenarios involving smaller, specialized models.

+#### Resulting Graph
+
+![one_stage_graph](../../assets/one_stage_graph.png)
+
 ### Two-Stage Extraction

 The two-stage approach represents a strategic decomposition of the extraction process, separating entity identification from relationship inference. This separation allows for specialized optimization at each stage and provides greater control.
@@ -101,7 +107,7 @@ async def two_stage_program(
 )(inputs)

 # inputs_with_entities = inputs AND entities (See Control Flow tutorial)
-inputs_with_entities = inputs & entities
+inputs_with_entities = inputs & entities
 relations = await synalinks.Generator(
 data_model=MapRelations,
 language_model=language_model,
@@ -132,7 +138,6 @@ async def two_stage_program(
 to_folder="examples/knowledge_extraction",
 show_trainable=True,
 )
-
 return program

 ```
@@ -141,6 +146,10 @@ async def two_stage_program(

 This staged approach offers several advantages: entities can be extracted using lightweight models optimized for named entity recognition, while relationship inference can leverage more sophisticated reasoning models.

+#### Resulting Graph
+
+![two_stage_graph](../../assets/two_stage_graph.png)
+
 ### Multi-Stage Extraction

 If you have heterogeneous data models, or if you are using small language models (SLMs), you might want to consider using a separate generator for each entity or relation to extract. This approach enhances the predictions of LMs by making one call per entity or relation type, thereby reducing the scope of the task for each call and enhancing accuracy. You can then combine the results of your extraction using logical operators (`And` or `Or`), depending on whether you want your aggregation to be robust to failures from the LMs.
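
The trade-off between the two aggregation strategies can be sketched in plain Python. This is an illustration under assumptions, not the Synalinks implementation: `combine_and` and `combine_or` are hypothetical helpers, a failed LM call is modeled as `None`, and it is assumed that the strict variant fails as soon as one input fails while the lenient variant keeps whatever succeeded.

```python
from typing import Optional


def combine_and(*results: Optional[dict]) -> Optional[dict]:
    """Strict aggregation: a single failed extraction fails the whole result."""
    if any(result is None for result in results):
        return None
    merged: dict = {}
    for result in results:
        merged.update(result)
    return merged


def combine_or(*results: Optional[dict]) -> Optional[dict]:
    """Lenient aggregation: keep what succeeded, fail only if everything failed."""
    succeeded = [result for result in results if result is not None]
    if not succeeded:
        return None
    merged: dict = {}
    for result in succeeded:
        merged.update(result)
    return merged


# Hypothetical per-type extraction outputs; None models a failed LM call.
cities = {"cities": ["Paris", "Berlin"]}
countries = None  # assumed failure of the country extractor

strict = combine_and(cities, countries)   # the failure propagates
lenient = combine_or(cities, countries)   # the city extraction survives
print(strict)
print(lenient)
```

Under these assumptions, the strict merge is the safer choice when a partial graph would be misleading, while the lenient merge suits pipelines where some extracted knowledge is better than none.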
@@ -309,6 +318,10 @@ if __name__ == "__main__":

 ![multi_stage_extraction](../../assets/multi_stage_extraction.png)

+#### Resulting Graph
+
+![multi_stage_graph](../../assets/multi_stage_graph.png)
+
 ### Dealing with Orphan Nodes

 In some cases, especially if you want to use the `KnowledgeRetriever`, you will have to extract nodes that are connected to each other. If intelligence is connecting the dots between your data, then orphan nodes are problematic.
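
To make the notion of an orphan node concrete, here is a minimal plain-Python sketch that flags entities appearing in no relation. The data layout (a list of entity names plus `subj`/`label`/`obj` dictionaries) and the helper `find_orphan_nodes` are assumptions made for illustration; they are not part of the Synalinks API.

```python
def find_orphan_nodes(entities, relations):
    """Return the entity names that never appear as a subject or an object."""
    connected = set()
    for relation in relations:
        connected.add(relation["subj"])
        connected.add(relation["obj"])
    # Preserve the original entity order in the result.
    return [name for name in entities if name not in connected]


# Hypothetical extraction output: entity names plus relation triplets.
entities = ["Paris", "France", "Berlin"]
relations = [{"subj": "Paris", "label": "IsCapitalOf", "obj": "France"}]

orphans = find_orphan_nodes(entities, relations)
print(orphans)  # ['Berlin'] has no relation attached to it
```

A check like this can be run on extraction results before storing them, to see how many nodes a relation-based retrieval step would be unable to reach.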
@@ -439,6 +452,10 @@ if __name__ == "__main__":

 ![relations_only_multi_stage_extraction](../../assets/relations_only_multi_stage_extraction.png)

+#### Resulting Graph
+
+![relations_only_multi_stage_graph](../../assets/relations_only_multi_stage_graph.png)
+
 ## Conclusion

 Synalinks represents a paradigm shift in knowledge extraction, moving beyond monolithic, inflexible approaches toward a modular, production-first framework that adapts to the complexities of real-world applications.

docs/assets/multi_stage_graph.png

69.8 KB

docs/assets/one_stage_graph.png

34.9 KB
124 KB
66.3 KB

docs/assets/two_stage_graph.png

42.7 KB

docs/index.md

Lines changed: 4 additions & 4 deletions
@@ -38,7 +38,7 @@ async def main():
 )

 language_model = synalinks.LanguageModel(
-model="ollama_chat/deepseek-r1",
+model="ollama/mistral",
 )

 x0 = synalinks.Input(data_model=Query)
@@ -136,7 +136,7 @@ async def main():
 )
 return cls(language_model=language_model, **config)

-language_model = synalinks.LanguageModel(model="ollama_chat/deepseek-r1")
+language_model = synalinks.LanguageModel(model="ollama/mistral")

 program = ChainOfThought(language_model=language_model)

@@ -203,7 +203,7 @@ async def main():
 )

 language_model = synalinks.LanguageModel(
-model="ollama_chat/deepseek-r1",
+model="ollama/mistral",
 )

 program = ChainOfThought(
@@ -241,7 +241,7 @@ async def main():
 )

 language_model = synalinks.LanguageModel(
-model="ollama_chat/deepseek-r1",
+model="ollama/mistral",
 )

 program = synalinks.Sequential(

synalinks/src/knowledge_bases/database_adapters/neo4j_adapter.py

Lines changed: 5 additions & 10 deletions
@@ -263,7 +263,6 @@ async def triplet_search(
 triplet_search,
 k=10,
 threshold=0.7,
-combined_threshold=None,
 ):
 if not is_triplet_search(triplet_search):
 raise ValueError(
@@ -287,13 +286,9 @@ async def triplet_search(
 )
 object_similarity_query = triplet_search.get("object_similarity_query")

-if combined_threshold is None:
-combined_threshold = threshold
-
 params = {
 "numberOfNearestNeighbours": k,
 "threshold": threshold,
-"combinedThreshold": combined_threshold,
 "k": k,
 }

@@ -480,15 +475,15 @@ async def triplet_search(
 ]
 )

-# Add geometric mean calculation for triplet returns
+# Add geometric mean score calculation for triplets
 query_lines.append(
 (
 "WITH subj, subj_score, relation, obj, obj_score, "
 "sqrt(subj_score * obj_score) "
-"AS combined_score"
+"AS score"
 )
 )
-where_conditions.append("combined_score >= $combinedThreshold")
+where_conditions.append("score >= $threshold")

 if where_conditions:
 query_lines.append(f"WHERE {' AND '.join(where_conditions)}")
@@ -499,8 +494,8 @@ async def triplet_search(
 "RETURN {name: subj.name, label: subj.label} AS subj,",
 " type(relation) AS relation,",
 " {name: obj.name, label: obj.label} AS obj,",
-" combined_score",
-"ORDER BY combined_score DESC",
+" score",
+"ORDER BY score DESC",
 ]
 )
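
The scoring that the Cypher fragments above express (`sqrt(subj_score * obj_score) AS score`, `score >= $threshold`, `ORDER BY score DESC`) can be mirrored in plain Python to see its effect. This is only a sketch: `score_triplets`, the candidate dictionaries, and the similarity values are hypothetical, and the real adapter performs this inside Neo4j, possibly alongside other `WHERE` conditions not shown in this diff.

```python
import math


def score_triplets(candidates, threshold=0.7):
    """Score each triplet by the geometric mean of its subject and object
    similarities, drop those below the threshold, and sort best-first."""
    scored = []
    for candidate in candidates:
        score = math.sqrt(candidate["subj_score"] * candidate["obj_score"])
        if score >= threshold:
            scored.append({**candidate, "score": score})
    return sorted(scored, key=lambda triplet: triplet["score"], reverse=True)


# Hypothetical similarity scores, chosen only to exercise the filter.
candidates = [
    {"subj": "Berlin", "relation": "IsCapitalOf", "obj": "Germany",
     "subj_score": 0.64, "obj_score": 1.0},
    {"subj": "Rome", "relation": "IsCapitalOf", "obj": "Italy",
     "subj_score": 1.0, "obj_score": 0.25},
    {"subj": "Paris", "relation": "IsCapitalOf", "obj": "France",
     "subj_score": 1.0, "obj_score": 0.81},
]

results = score_triplets(candidates)
for triplet in results:
    print(triplet["subj"], triplet["relation"], triplet["obj"], round(triplet["score"], 3))
```

With these inputs the Rome triplet is dropped: its geometric mean is 0.5, whereas an arithmetic mean of the same two scores would be 0.625. A weak match on either side therefore lowers the triplet score more than a plain average would, and a single `threshold` now governs this filter in place of the former `combined_threshold`.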
