ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
## 💎 Why ContextGem?
ContextGem addresses this challenge by providing a flexible, intuitive framework for extracting structured data and insights from documents.
Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
## 💡 What can you do with ContextGem?
With ContextGem, you can:
- **Extract structured data** from documents (text, images) with minimal code
- **Identify and analyze key aspects** (topics, themes, categories) within documents
- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
- **Build complex extraction workflows** through a simple, intuitive API
"The term of the agreement is 1 year from the Effective Date...\n"
178
190
"The Supplier shall provide consultancy services as described in Annex 2...\n"
179
191
"The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
180
-
"The purple elephant danced gracefully on the moon while eating ice cream.\n"#out-of-context / anomaly
192
+
"The purple elephant danced gracefully on the moon while eating ice cream.\n"#💎 anomaly
181
193
"This agreement is governed by the laws of Norway...\n"
182
194
),
183
195
)
@@ -191,8 +203,9 @@ doc.concepts = [
191
203
reference_depth="sentences",
192
204
add_justifications=True,
193
205
justification_depth="brief",
194
-
# add more concepts to the document, if needed
195
206
)
207
+
# add more concepts to the document, if needed
208
+
# see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
196
209
]
197
210
# Or use doc.add_concepts([...])
198
211
@@ -201,15 +214,17 @@ llm = DocumentLLM(
201
214
model="openai/gpt-4o-mini", # or any other LLM from e.g. Anthropic, etc.
202
215
api_key=os.environ.get(
203
216
"CONTEXTGEM_OPENAI_API_KEY"
204
-
), # your API key for the LLM provider
217
+
), # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
205
218
# see the docs for more configuration options
206
219
)
207
220
208
221
# Extract information from the document
209
222
doc = llm.extract_all(doc) # or use async version llm.extract_all_async(doc)
210
223
211
224
# Access extracted information in the document object
212
-
print(doc.concepts[0].extracted_items) # extracted items with references justifications
225
+
print(
226
+
doc.concepts[0].extracted_items
227
+
) # extracted items with references & justifications
213
228
# or doc.get_concept_by_name("Anomalies").extracted_items
214
229
215
230
```
ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy.
Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
## 🤖 Supported LLMs
ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
- **Simple API**: Unified interface for all LLMs with easy provider switching
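Because every backend is addressed by a LiteLLM-style `provider/model` string, switching providers amounts to changing one string. A minimal illustrative sketch of how such identifiers encode the backend (the helper functions and provider names below are assumptions for illustration, not ContextGem or LiteLLM APIs):

```python
# Illustrative only: how a LiteLLM-style "provider/model" identifier
# encodes the backend. Not part of the ContextGem API.

LOCAL_PROVIDERS = {"ollama", "lm_studio"}  # assumed example provider prefixes


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model name)."""
    provider, _, name = model_id.partition("/")
    return provider, name


def is_local(model_id: str) -> bool:
    """Heuristic: does the identifier target a locally hosted provider?"""
    provider, _ = split_model_id(model_id)
    return provider in LOCAL_PROVIDERS


print(split_model_id("openai/gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
print(is_local("ollama/llama3"))             # True
```

In practice the same string is passed directly as the `model` argument of `DocumentLLM`, as in the quickstart example above.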
## ⚡ Optimizations
ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy.
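Model choice is typically one of the biggest cost levers. As a back-of-the-envelope sketch for comparing input costs across model tiers (the ~4-characters-per-token ratio and the per-token prices are illustrative assumptions, not ContextGem or provider figures):

```python
# Back-of-the-envelope input-cost comparison between two model tiers.
# The 4-characters-per-token ratio and the prices are assumptions
# for illustration, not ContextGem or provider figures.

def estimate_input_cost_usd(text: str, price_per_1m_tokens: float) -> float:
    """Approximate input cost assuming ~4 characters per token."""
    approx_tokens = len(text) / 4
    return approx_tokens / 1_000_000 * price_per_1m_tokens


contract = "The term of the agreement is 1 year from the Effective Date. " * 200

small_model = estimate_input_cost_usd(contract, price_per_1m_tokens=0.15)
large_model = estimate_input_cost_usd(contract, price_per_1m_tokens=2.50)

print(f"small tier: ${small_model:.6f}, large tier: ${large_model:.6f}")
```

Running the cheaper tier first and escalating only the hard documents is a common way to act on such estimates.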
ContextGem is at an early stage, and our development roadmap includes further enhancements.
We are committed to making ContextGem the most effective tool for extracting structured information from documents.
## 🔐 Security
This project is automatically scanned for security vulnerabilities using [CodeQL](https://codeql.github.com/). We also use [Snyk](https://snyk.io) as needed for supplementary dependency checks.
See the [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
## 📄 License & Contact
This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
# Security Policy

We maintain security practices for the latest release of this library. Older versions may not receive security updates.
## Security Testing
This project is automatically tested for security issues using [CodeQL](https://codeql.github.com/) static analysis (run via GitHub Actions).
We also use [Snyk](https://snyk.io) as needed for supplementary dependency vulnerability monitoring.
## Data Privacy
This library uses LiteLLM as a local Python package to communicate with LLM providers through a unified interface. No data or telemetry is transmitted to LiteLLM's servers, as the SDK runs entirely within the user's environment. According to LiteLLM's documentation, self-hosted or local SDK use involves no data storage and no telemetry. For details, see [LiteLLM's documentation](https://docs.litellm.ai/docs/data_security).
## Reporting a Vulnerability
We value the security community's role in protecting our users. If you discover a potential security issue in this project, please report it as follows:
📧 **Email**: `sergii@shcherbak.ai`
When reporting, please include:
- A detailed description of the issue
- Steps to reproduce the vulnerability
- Any relevant logs, context, or configurations
We aim to respond promptly to all valid reports. Please note that we do not currently offer a bug bounty program.
## Questions?
If you’re unsure whether something is a vulnerability or just a bug, feel free to reach out via the email above before submitting a full report.