Commit 52b13fa

Notebooks. README and docs update.
1 parent 827b104 commit 52b13fa

25 files changed: 2170 additions, 144 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ venv
 .coverage
 
 notebooks
+!dev/notebooks
 docs/build
 dist
 .DS_Store

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
@@ -58,3 +58,12 @@ repos:
         pass_filenames: false
         always_run: true
         stages: [pre-commit]
+
+      # Generate example notebooks
+      - id: generate-notebooks
+        name: Generate example notebooks
+        entry: python dev/generate_notebooks.py
+        language: system
+        pass_filenames: false
+        always_run: true
+        stages: [pre-commit]
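
The new hook runs `python dev/generate_notebooks.py` on every commit, but the script itself is not part of the diff shown here. As a rough illustration only, a generator of this kind could look like the sketch below. It assumes a standard-library-only approach that writes nbformat 4.4 JSON directly; the real script most likely relies on the `nbformat` package added to NOTICE in this commit. The names `build_notebook` and `write_notebook` and the output path are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_notebook(title: str, code: str) -> dict:
    """Return a minimal Jupyter notebook (nbformat 4.4) as a plain dict.

    Hypothetical helper: one markdown title cell followed by one code cell.
    """
    return {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": f"# {title}"},
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run
                "metadata": {},
                "outputs": [],
                "source": code,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # 4.4 does not require per-cell ids
    }


def write_notebook(notebook: dict, path: Path) -> None:
    """Serialize the notebook dict to an .ipynb file, creating parent dirs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Write into a temporary directory so the sketch has no side effects;
    # the "dev/notebooks" layout mirrors the .gitignore change but is assumed.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "dev" / "notebooks" / "quickstart.ipynb"
        write_notebook(build_notebook("Quick start", "print('hello')"), target)
        print(f"wrote {target.name} ({target.stat().st_size} bytes)")
```

Because the hook sets `always_run: true` and `pass_filenames: false`, a script like this would regenerate every notebook on each commit regardless of which files changed, so deterministic output helps avoid spurious diffs.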

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@ authors:
     given-names: Sergii
     email: sergii@shcherbak.ai
 title: "ContextGem: Easier and faster way to build LLM extraction workflows through powerful abstractions"
-date-released: 2024-04-02
+date-released: 2025-04-02
 url: "https://github.com/shcherbak-ai/contextgem"

NOTICE

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Development Dependencies:
 - black: Code formatting
 - coverage: Test coverage measurement
 - isort: Sorting imports
+- nbformat: Notebook format utilities
 - pip-tools: Dependency management
 - pre-commit: Pre-commit hooks
 - pytest: Testing framework

README.md

Lines changed: 81 additions & 8 deletions
@@ -28,9 +28,8 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
-## 💡 What can you do with ContextGem?
+## 💡 With ContextGem, you can:
 
-With ContextGem, you can:
 - **Extract structured data** from documents (text, images) with minimal code
 - **Identify and analyze key aspects** (topics, themes, categories) within documents
 - **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
@@ -173,15 +172,72 @@ pip install -U contextgem
 
 ## 🚀 Quick start
 
+### Aspect extraction
+
+Aspect is a defined area or topic within a document (or another aspect). Each aspect reflects a specific subject or theme.
+
+```python
+# Quick Start Example - Extracting payment terms from a document
+
+import os
+
+from contextgem import Aspect, Document, DocumentLLM
+
+# Sample document text (shortened for brevity)
+doc = Document(
+    raw_text=(
+        "SERVICE AGREEMENT\n"
+        "SERVICES. Provider agrees to provide the following services to Client: "
+        "Cloud-based data analytics platform access and maintenance...\n"
+        "PAYMENT. Client agrees to pay $5,000 per month for the services. "
+        "Payment is due on the 1st of each month. Late payments will incur a 2% fee per month...\n"
+        "CONFIDENTIALITY. Both parties agree to keep all proprietary information confidential "
+        "for a period of 5 years following termination of this Agreement..."
+    ),
+)
+
+# Define the aspects to extract
+doc.aspects = [
+    Aspect(
+        name="Payment Terms",
+        description="Payment terms and conditions in the contract",
+        # see the docs for more configuration options, e.g. sub-aspects, concepts, etc.
+    ),
+    # Add more aspects as needed
+]
+# Or use `doc.add_aspects([...])`
+
+# Define an LLM for extracting information from the document
+llm = DocumentLLM(
+    model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
+    api_key=os.environ.get(
+        "CONTEXTGEM_OPENAI_API_KEY"
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
+    # see the docs for more configuration options
+)
+
+# Extract information from the document
+doc = llm.extract_all(doc)  # or use async version `await llm.extract_all_async(doc)`
+
+# Access extracted information in the document object
+for item in doc.aspects[0].extracted_items:
+    print(f"{item.value}")
+# or `doc.get_aspect_by_name("Payment Terms").extracted_items`
+
+```
+
+### Concept extraction
+
+Concept is a unit of information or an entity, derived from an aspect or the broader document context.
+
 ```python
 # Quick Start Example - Extracting anomalies from a document, with source references and justifications
 
 import os
 
 from contextgem import Document, DocumentLLM, StringConcept
 
-# Example document instance
-# Document content is shortened for brevity
+# Sample document text (shortened for brevity)
 doc = Document(
     raw_text=(
         "Consultancy Agreement\n"
@@ -203,13 +259,14 @@ doc.concepts = [
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
+        # see the docs for more configuration options
     )
     # add more concepts to the document, if needed
     # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
-# Or use doc.add_concepts([...])
+# Or use `doc.add_concepts([...])`
 
-# Create an LLM for extracting data and insights from the document
+# Define an LLM for extracting information from the document
 llm = DocumentLLM(
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
@@ -219,16 +276,18 @@ llm = DocumentLLM(
 )
 
 # Extract information from the document
-doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
+doc = llm.extract_all(doc)  # or use async version `await llm.extract_all_async(doc)`
 
 # Access extracted information in the document object
 print(
     doc.concepts[0].extracted_items
 )  # extracted items with references & justifications
-# or doc.get_concept_by_name("Anomalies").extracted_items
+# or `doc.get_concept_by_name("Anomalies").extracted_items`
 
 ```
 
+---
+
 See more examples in the documentation:
 
 ### Basic usage examples
@@ -305,6 +364,20 @@ This project is automatically scanned for security vulnerabilities using [CodeQL
 See [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
 
 
+## 🙏 Acknowledgements
+
+ContextGem relies on these excellent open-source packages:
+
+- [pydantic](https://github.com/pydantic/pydantic): The gold standard for data validation
+- [Jinja2](https://github.com/pallets/jinja): Fast, expressive template engine that powers our dynamic prompt rendering
+- [litellm](https://github.com/BerriAI/litellm): Unified interface to multiple LLM providers with seamless provider switching
+- [wtpsplit](https://github.com/segment-any-text/wtpsplit): State-of-the-art text segmentation tool
+- [loguru](https://github.com/Delgan/loguru): Simple yet powerful logging that enhances debugging and observability
+- [python-ulid](https://github.com/mdomke/python-ulid): Efficient ULID generation
+- [PyTorch](https://github.com/pytorch/pytorch): Industry-standard machine learning framework
+- [aiolimiter](https://github.com/mjpieters/aiolimiter): Powerful rate limiting for async operations
+
+
 ## 📄 License & Contact
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
