The `return_inputs=True` parameter in both retriever and generator components ensures that information flows through your pipeline without loss. This allows downstream components to access both the original query and any intermediate results, enabling more sophisticated processing strategies.
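As a rough illustration of the idea, the plain-Python sketch below shows what "returning the inputs" means for a two-step pipeline. It does not use the Synalinks API: `run_component`, `retrieve`, and `generate` are hypothetical helpers, and merging dictionaries is only an assumption about how inputs and outputs might be combined.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


def run_component(
    fn: Callable[[Payload], Payload],
    inputs: Payload,
    return_inputs: bool = False,
) -> Payload:
    """Run a component and optionally carry its inputs forward with its outputs."""
    outputs = fn(inputs)
    if return_inputs:
        # Downstream components see both the original fields and the new ones.
        return {**inputs, **outputs}
    return outputs


def retrieve(inputs: Payload) -> Payload:
    # Hypothetical retriever: returns one fake document for the query.
    return {"documents": [f"doc about {inputs['query']}"]}


def generate(inputs: Payload) -> Payload:
    # Hypothetical generator: needs both the query and the retrieved documents.
    count = len(inputs["documents"])
    return {"answer": f"{count} document(s) found for '{inputs['query']}'"}


state = run_component(retrieve, {"query": "graphs"}, return_inputs=True)
result = run_component(generate, state, return_inputs=True)
```

Without `return_inputs=True` on the retriever step, `generate` would raise a `KeyError` because the original `query` field would no longer be present.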
The instruction set for the generator provides crucial guidance for response generation. The instruction to acknowledge when search results aren't relevant prevents hallucination and maintains system reliability. You can customize these instructions based on your specific use case requirements.

-Don't forget that these instructions can be optimized to enhance the reasoning capabilities of your RAGs.
+Don't forget that these instructions can be optimized to enhance the reasoning capabilities of your KAGs.
docs/Code Examples/Advanced/Knowledge Extraction.md (20 additions & 3 deletions)
@@ -1,10 +1,12 @@
 # Knowledge Extraction

-Knowledge extraction from unstructured data is a cornerstone of neuro-symbolic AI applications, enabling systems to transform raw text into structured, logically queryable information. Synalinks provides a sophisticated framework that supports constrained property graph extraction and querying, offering unprecedented flexibility in how you architect your knowledge extraction pipelines.
+Knowledge extraction from unstructured data is a cornerstone of neuro-symbolic AI applications, enabling systems to transform raw text into structured, queryable information. Synalinks provides a sophisticated framework that supports constrained property graph extraction and querying, offering unprecedented flexibility in how you architect your knowledge extraction pipelines.

 Synalinks leverages constrained property graphs as its foundation, where the schema is rigorously enforced through constrained JSON decoding. This approach ensures data integrity while maintaining the flexibility to store extracted knowledge in dedicated graph databases for efficient querying and retrieval.
 The framework's modular design allows you to compose extraction pipelines from discrete, reusable components. Each component can be optimized independently, tested in isolation, and combined with others to create sophisticated data processing workflows.

+To illustrate our approach, we are going to use the same small language model with different architectures, so that you can understand the pros and cons of each approach.
+
 ```python
 import synalinks
 import asyncio
@@ -77,6 +79,10 @@ async def one_stage_program(
The one-stage approach minimizes latency and reduces the complexity of pipeline orchestration. However, it demands models with substantial reasoning capabilities and may not be effective for scenarios involving smaller, specialized models.
The two-stage approach represents a strategic decomposition of the extraction process, separating entity identification from relationship inference. This separation allows for specialized optimization at each stage and provides greater control.
@@ -101,7 +107,7 @@ async def two_stage_program(
     )(inputs)

     # inputs_with_entities = inputs AND entities (See Control Flow tutorial)
-    inputs_with_entities = inputs & entities
+    inputs_with_entities = inputs & entities
     relations = await synalinks.Generator(
         data_model=MapRelations,
         language_model=language_model,
@@ -132,7 +138,6 @@ async def two_stage_program(
         to_folder="examples/knowledge_extraction",
         show_trainable=True,
     )
-
     return program

```
@@ -141,6 +146,10 @@ async def two_stage_program(
This staged approach offers several advantages: entities can be extracted using lightweight models optimized for named entity recognition, while relationship inference can leverage more sophisticated reasoning models.
If you have heterogeneous data models, or if you are using small language models (SLMs), you might want to consider using a separate generator for each entity or relation to extract. This approach improves LM predictions by making one call per entity or relation type, which narrows the task for each call. You can then combine the results of your extraction using logical operators (`And` or `Or`), depending on whether you want your aggregation to be robust to failures from the LMs.
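The difference between the two aggregation strategies can be sketched in plain Python. This is only an illustration under stated assumptions, not the Synalinks implementation: a failed LM call is assumed to be represented by `None`, results are assumed to be dictionaries, and `merge_and` / `merge_or` are hypothetical helper names.

```python
from typing import Any, Dict, Optional

Extraction = Optional[Dict[str, Any]]


def merge_and(left: Extraction, right: Extraction) -> Extraction:
    """Strict aggregation: the result is None if either extraction failed."""
    if left is None or right is None:
        return None
    return {**left, **right}


def merge_or(left: Extraction, right: Extraction) -> Extraction:
    """Lenient aggregation: keep whatever succeeded, merge if both did."""
    if left is None:
        return right
    if right is None:
        return left
    return {**left, **right}


cities = {"cities": ["Paris"]}
events = None  # assumption: this stands in for a failed LM call

strict = merge_and(cities, events)   # the failure propagates
lenient = merge_or(cities, events)   # the successful part survives
```

In this sketch the strict variant suits pipelines where a partial graph is worse than no graph, while the lenient variant keeps partial results when one per-type call fails.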
In some cases, especially if you want to use the `KnowledgeRetriever`, you will have to extract nodes that are connected to each other. If intelligence is connecting the dots between your data, then orphan nodes are problematic.
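A quick way to check an extraction for orphan nodes is sketched below in plain Python. The representation is an assumption made for illustration (nodes as strings, relations as `(source, target)` pairs), and `find_orphans` is a hypothetical helper rather than part of Synalinks.

```python
from typing import Iterable, List, Tuple


def find_orphans(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the nodes that do not appear in any relation, in input order."""
    connected = set()
    for source, target in edges:
        connected.add(source)
        connected.add(target)
    return [node for node in nodes if node not in connected]


# Hypothetical extraction result: "Tokyo" was extracted but never linked.
nodes = ["Paris", "France", "Eiffel Tower", "Tokyo"]
edges = [("Paris", "France"), ("Eiffel Tower", "Paris")]

orphans = find_orphans(nodes, edges)
```

A non-empty result suggests that the relation stage missed a link or that the entity stage extracted something irrelevant.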
Synalinks represents a paradigm shift in knowledge extraction, moving beyond monolithic, inflexible approaches toward a modular, production-first framework that adapts to the complexities of real-world applications.