ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
## 💎 Why ContextGem?
Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
## 💡 What can you do with ContextGem?
With ContextGem, you can:

- **Extract structured data** from documents (text, images) with minimal code
- **Identify and analyze key aspects** (topics, themes, categories) within documents
- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
- **Build complex extraction workflows** through a simple, intuitive API
```python
# ...
        "The term of the agreement is 1 year from the Effective Date...\n"
        "The Supplier shall provide consultancy services as described in Annex 2...\n"
        "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
        "This agreement is governed by the laws of Norway...\n"
    ),
)

# ...

doc.concepts = [
    # ...
        reference_depth="sentences",
        add_justifications=True,
        justification_depth="brief",
    )
    # add more concepts to the document, if needed
    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
]
# Or use doc.add_concepts([...])

# ...

llm = DocumentLLM(
    model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
    api_key=os.environ.get(
        "CONTEXTGEM_OPENAI_API_KEY"
    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
    # see the docs for more configuration options
)

# Extract information from the document
doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)

# Access extracted information in the document object
print(
    doc.concepts[0].extracted_items
)  # extracted items with references & justifications
# or doc.get_concept_by_name("Anomalies").extracted_items
```
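To make the shape of the output concrete, here is a toy stand-in for a single extracted item. The `ExtractedItem` class and its attribute names are hypothetical illustrations only, not ContextGem's actual classes; consult the docs for the real item types.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an extracted item, for illustration only --
# ContextGem's actual item classes and attribute names may differ.
@dataclass
class ExtractedItem:
    value: str               # the extracted text, e.g. the anomalous sentence
    justification: str       # brief explanation of why this item was extracted
    references: list[str] = field(default_factory=list)  # source sentences

item = ExtractedItem(
    value="The purple elephant danced gracefully on the moon while eating ice cream.",
    justification="Unrelated to the contractual subject matter.",
    references=["The purple elephant danced gracefully on the moon while eating ice cream."],
)
print(item.value)
```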
Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
## 🤖 Supported LLMs
ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
- **Simple API**: Unified interface for all LLMs with easy provider switching
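The "easy provider switching" comes from LiteLLM-style `provider/model` identifiers: changing providers is a one-line change to the model string. The helper and provider names below are illustrative only, not part of ContextGem's or LiteLLM's API:

```python
# Illustrative only: shows the LiteLLM-style "provider/model" naming convention.
# The helper and the LOCAL_PROVIDERS set below are not part of ContextGem's API.
LOCAL_PROVIDERS = {"ollama", "lm_studio"}  # assumed example provider names

def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier into its two parts."""
    provider, _, model = model_id.partition("/")
    return provider, model

for model_id in ("openai/gpt-4o-mini", "ollama/llama3.3"):
    provider, model = split_model_id(model_id)
    kind = "local" if provider in LOCAL_PROVIDERS else "cloud"
    print(f"{model_id}: {kind} provider {provider!r}, model {model!r}")
```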
## ⚡ Optimizations
ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
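One common optimization is processing several documents concurrently via the async API (`extract_all_async`, shown in the quickstart). The sketch below mimics that pattern with a stub coroutine in place of a real LLM call, which would require a provider API key:

```python
import asyncio

# Stub standing in for llm.extract_all_async(doc); a real call would hit the
# LLM provider. Used here only to illustrate concurrent extraction.
async def extract_stub(doc_name: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O-bound LLM latency
    return f"{doc_name}: extracted"

async def main() -> list[str]:
    docs = ["contract_a", "contract_b", "contract_c"]
    # Run the extractions concurrently instead of one after another
    return await asyncio.gather(*(extract_stub(d) for d in docs))

results = asyncio.run(main())
print(results)
```

Because LLM extraction is I/O-bound, `asyncio.gather` lets the waiting time for each document overlap rather than accumulate.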
This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
## Security Policy

We maintain security practices for the latest release of this library. Older versions may not receive security updates.
## Security Testing
This project is regularly tested for security issues using both:
- [CodeQL](https://codeql.github.com/) static analysis (run via GitHub Actions)
- [Snyk](https://snyk.io) for continuous dependency vulnerability monitoring
All known transitive vulnerabilities have been manually triaged and either resolved or confirmed to be non-applicable based on how the library is used. See the repository's issue tracker or changelog for relevant audit notes when applicable.
## Data Privacy
This library uses LiteLLM as a local Python package to communicate with LLM providers through a unified interface. No data or telemetry is transmitted to LiteLLM's servers, as the SDK runs entirely within the user's environment. According to LiteLLM's documentation, self-hosted or local SDK use involves no data storage and no telemetry. For details, see [LiteLLM's documentation](https://docs.litellm.ai/docs/data_security).
## Reporting a Vulnerability
We value the security community's role in protecting our users. If you discover a potential security issue in this project, please report it as follows:
📧 **Email**: `sergii@shcherbak.ai`
When reporting, please include:
- A detailed description of the issue
- Steps to reproduce the vulnerability
- Any relevant logs, context, or configurations
We aim to respond promptly to all valid reports. Please note that we do not currently offer a bug bounty program.
## Questions?
If you’re unsure whether something is a vulnerability or just a bug, feel free to reach out via the email above before submitting a full report.