
Commit 0ddb70a: Security & docs

Parent: b7f3ebe

6 files changed: 107 additions & 18 deletions

.github/workflows/codeql.yml (2 additions & 2 deletions)

@@ -26,7 +26,7 @@ jobs:
         uses: actions/checkout@v4
 
       - name: Initialize CodeQL
-        uses: github/codeql-action/init@v2
+        uses: github/codeql-action/init@v3
         with:
           languages: ${{ matrix.language }}
 
@@ -56,6 +56,6 @@ jobs:
         run: poetry install --no-interaction --with dev --no-root
 
       - name: Perform CodeQL Analysis
-        uses: github/codeql-action/analyze@v2
+        uses: github/codeql-action/analyze@v3
         with:
           category: "/language:${{matrix.language}}"

README.md (29 additions & 6 deletions)

@@ -15,7 +15,7 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
 
-ContextGem is an LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
+ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
 
 
 ## 💎 Why ContextGem?
 
@@ -27,6 +27,16 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
+## 💡 What can you do with ContextGem?
+
+With ContextGem, you can:
+- **Extract structured data** from documents (text, images) with minimal code
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
+
+
 ## ⭐ Key features
 
 <table>
 
@@ -178,7 +188,7 @@ doc = Document(
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -192,8 +202,9 @@ doc.concepts = [
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -202,15 +213,17 @@ llm = DocumentLLM(
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
 
 ```
 
@@ -237,6 +250,14 @@ ContextGem leverages LLMs' long context windows to deliver superior extraction a
 Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
 
 
+## 🤖 Supported LLMs
+
+ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
+- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
+- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
+- **Simple API**: Unified interface for all LLMs with easy provider switching
+
+
 ## ⚡ Optimizations
 
 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
 
@@ -280,7 +301,9 @@ We are committed to making ContextGem the most effective tool for extracting str
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
 
-Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai) - AI engineering company developing tools for AI/ML/NLP developers.
+Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai), an AI engineering company building tools for AI/ML/NLP developers.
+
+Shcherbak AI is now part of Microsoft for Startups.
 
 [Connect with us on LinkedIn](https://www.linkedin.com/in/sergii-shcherbak-10068866/) for questions or collaboration ideas.
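The new "Supported LLMs" section and the quickstart's `model="openai/gpt-4o-mini"` both rely on LiteLLM's `provider/model` identifier convention, where switching providers is just a change of string prefix. A minimal sketch of how such identifiers decompose; the `split_model_id` helper below is hypothetical, written only to illustrate the convention, and is not part of the ContextGem or LiteLLM API:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a LiteLLM-style "provider/model" identifier (hypothetical helper)."""
    provider, sep, name = model_id.partition("/")
    if not sep:
        # No "/" prefix: LiteLLM can often infer the provider
        # from well-known bare model names.
        return "", model_id
    return provider, name


print(split_model_id("openai/gpt-4o-mini"))  # cloud LLM -> ('openai', 'gpt-4o-mini')
print(split_model_id("ollama/llama3"))       # local LLM -> ('ollama', 'llama3')
```

Under this convention, moving from a cloud model to a local one (e.g. via Ollama) should only require a different `model` string, with the rest of the `DocumentLLM` configuration unchanged.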

SECURITY.md (new file: 40 additions)

# Security Policy

## Supported Versions

We maintain security practices for the latest release of this library. Older versions may not receive security updates.

## Security Testing

This project is regularly tested for security issues using both:

- [CodeQL](https://codeql.github.com/) static analysis (run via GitHub Actions)
- [Snyk](https://snyk.io) for continuous dependency vulnerability monitoring

All known transitive vulnerabilities have been manually triaged and either resolved or confirmed to be non-applicable based on how the library is used. See the repository's issue tracker or changelog for relevant audit notes when applicable.

## Data Privacy

This library uses LiteLLM as a local Python package to communicate with LLM providers through a unified interface. No data or telemetry is transmitted to LiteLLM servers, as the SDK runs entirely within the user's environment. According to LiteLLM's documentation, self-hosted or local SDK use involves no data storage and no telemetry. For details, see [LiteLLM's documentation](https://docs.litellm.ai/docs/data_security).

## Reporting a Vulnerability

We value the security community's role in protecting our users. If you discover a potential security issue in this project, please report it as follows:

📧 **Email**: `sergii@shcherbak.ai`

When reporting, please include:

- A detailed description of the issue
- Steps to reproduce the vulnerability
- Any relevant logs, context, or configurations

We aim to respond promptly to all valid reports. Please note that we do not currently offer a bug bounty program.

## Questions?

If you're unsure whether something is a vulnerability or just a bug, feel free to reach out via the email above before submitting a full report.

dev/readme.template.md (22 additions & 2 deletions)

@@ -15,7 +15,7 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
 
-ContextGem is an LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
+ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
 
 
 ## 💎 Why ContextGem?
 
@@ -27,6 +27,16 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
+## 💡 What can you do with ContextGem?
+
+With ContextGem, you can:
+- **Extract structured data** from documents (text, images) with minimal code
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
+
+
 ## ⭐ Key features
 
 {{FEATURE_TABLE}}
 
@@ -69,6 +79,14 @@ ContextGem leverages LLMs' long context windows to deliver superior extraction a
 Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
 
 
+## 🤖 Supported LLMs
+
+ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
+- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
+- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
+- **Simple API**: Unified interface for all LLMs with easy provider switching
+
+
 ## ⚡ Optimizations
 
 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
 
@@ -112,7 +130,9 @@ We are committed to making ContextGem the most effective tool for extracting str
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
 
-Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai) - AI engineering company developing tools for AI/ML/NLP developers.
+Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai), an AI engineering company building tools for AI/ML/NLP developers.
+
+Shcherbak AI is now part of Microsoft for Startups.
 
 [Connect with us on LinkedIn](https://www.linkedin.com/in/sergii-shcherbak-10068866/) for questions or collaboration ideas.
dev/usage_examples/readme/quickstart.py (7 additions & 4 deletions)

@@ -13,7 +13,7 @@
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -27,8 +27,9 @@
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -37,13 +38,15 @@
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
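The quickstart's commented alternative, `doc.get_concept_by_name("Anomalies")`, retrieves a concept by its `name` rather than by list position. A minimal sketch of that lookup pattern; `MiniConcept` and `MiniDocument` below are hypothetical stand-ins for illustration only, not ContextGem's actual classes:

```python
from dataclasses import dataclass, field


@dataclass
class MiniConcept:
    # Stand-in for ContextGem concept types (e.g. StringConcept);
    # only the fields needed for name-based lookup are modeled.
    name: str
    extracted_items: list = field(default_factory=list)


@dataclass
class MiniDocument:
    concepts: list = field(default_factory=list)

    def get_concept_by_name(self, name: str) -> MiniConcept:
        # Name-based lookup keeps call sites readable and robust
        # to reordering of the concepts list.
        for concept in self.concepts:
            if concept.name == name:
                return concept
        raise KeyError(f"no concept named {name!r}")


doc = MiniDocument(
    concepts=[MiniConcept(name="Anomalies", extracted_items=["anomalous sentence"])]
)
print(doc.get_concept_by_name("Anomalies").extracted_items)  # -> ['anomalous sentence']
```

The design choice mirrored here is that index-based access (`doc.concepts[0]`) depends on declaration order, while name-based access does not.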

docs/docs-raw-for-llm.txt (7 additions & 4 deletions)

@@ -270,7 +270,7 @@ Anomaly extraction example (ContextGem)
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -284,8 +284,9 @@ Anomaly extraction example (ContextGem)
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -294,15 +295,17 @@ Anomaly extraction example (ContextGem)
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
 
 -[ LangChain ]-
