Commit d659e98

Merge pull request #6 from shcherbak-ai/dev
docs: readme update
2 parents e867bb4 + 4939538 commit d659e98

7 files changed

Lines changed: 33 additions & 77 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ venv
 .venv
 .coverage
 .cz.msg
+~$*
+*.tmp

 notebooks
 !dev/notebooks
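As a rough illustration of what the two new `.gitignore` patterns cover, the sketch below approximates them with Python's `fnmatch` module. This is only an approximation: real gitignore matching has additional rules (slashes, negation, directory-only patterns), and the file names used here are hypothetical. `~$*` presumably targets temporary lock files that office applications create next to open documents.

```python
from fnmatch import fnmatchcase

# The two patterns added to .gitignore in this commit.
NEW_PATTERNS = ["~$*", "*.tmp"]


def is_ignored(filename: str) -> bool:
    """Approximate check of a bare file name against the new patterns.

    Assumption: fnmatch-style globbing stands in for gitignore matching,
    which holds only for simple patterns without slashes or negation.
    """
    return any(fnmatchcase(filename, pattern) for pattern in NEW_PATTERNS)


# Hypothetical file names, chosen only for illustration.
for name in ["~$report.docx", "notes.tmp", "report.docx"]:
    print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```

To confirm the behavior in an actual repository, `git check-ignore -v <path>` reports which pattern, if any, matches a given path.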

README.md

Lines changed: 13 additions & 36 deletions
@@ -16,6 +16,9 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)

+<img src="https://contextgem.dev/_static/tab_solid.png" alt="ContextGem: 2nd Product of the week" width="250">
+<br/><br/>
+
 ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.


@@ -28,17 +31,6 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.


-## 💡 With ContextGem, you can:
-
-- **Extract structured data** from documents (text, images) with minimal code
-- **Identify and analyze key aspects** (topics, themes, categories) within documents
-- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
-- **Build complex extraction workflows** through a simple, intuitive API
-- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
-
-![ContextGem extraction example](https://contextgem.dev/_static/readme_code_snippet.png "ContextGem extraction example")
-
-
 ## ⭐ Key features

 <table>
@@ -165,34 +157,17 @@ Read more on the project [motivation](https://contextgem.dev/motivation.html) in
 \* See [descriptions](https://contextgem.dev/motivation.html#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks.html) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.


-## 🧩 Core components
-
-ContextGem's document-specific LLM extraction is built upon the following core components:
-
-- 📄 **Document** model contains text and/or visual content representing a specific document. Examples:
+## 💡 With **minimal code**, you can:

-- _legal documents_: contracts, policies, terms of service
-- _financial documents_: invoices, receipts, bank statements, financial reports
-- _business documents_: proposals, business plans, marketing materials, presentations
-
-- 📚 **Aspect** model contains text representing a defined area or topic within a document. Examples:
-
-- _contract aspects_: payment terms, termination clauses
-- _invoice aspects_: line-item breakdowns, tax details
-- _CV aspects_: work experience, education, skills
-
-
-- 🧠 **Concept** model contains a unit of information or an entity, derived from an aspect or the broader document context. Examples:
-
-- _factual extractions_: a termination date in a contract, a total amount due in an invoice, or a certification in a CV
-- _analytical insights_: risk assessments, compliance evaluations
-- _reasoned conclusions_: determining whether a document meets specific criteria or answers particular questions
-
-See other industry-specific examples in the table below:
+- **Extract structured data** from documents (text, images)
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)

-![ContextGem component examples](https://contextgem.dev/_static/contextgem_component_examples.png "ContextGem component examples")
+<br/>

-Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
+![ContextGem extraction example](https://contextgem.dev/_static/readme_code_snippet.png "ContextGem extraction example")


 ## 📦 Installation
@@ -342,6 +317,8 @@ See more examples in the documentation:

 ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy from individual documents. Unlike RAG approaches that often [struggle with complex concepts and nuanced insights](https://www.linkedin.com/pulse/raging-contracts-pitfalls-rag-contract-review-shcherbak-ai-ptg3f), ContextGem capitalizes on [continuously expanding context capacity](https://arxiv.org/abs/2502.12962), evolving LLM capabilities, and decreasing costs. This focused approach enables direct information extraction from complete documents, eliminating retrieval inconsistencies while optimizing for in-depth single-document analysis. While this delivers higher accuracy for individual documents, ContextGem does not currently support cross-document querying or corpus-wide retrieval - for these use cases, modern RAG systems (e.g., LlamaIndex, Haystack) remain more appropriate.

+Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
+

 ## 🤖 Supported LLMs

dev/readme.template.md

Lines changed: 13 additions & 36 deletions
@@ -16,6 +16,9 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)

+<img src="https://contextgem.dev/_static/tab_solid.png" alt="ContextGem: 2nd Product of the week" width="250">
+<br/><br/>
+
 ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.


@@ -28,52 +31,24 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.


-## 💡 With ContextGem, you can:
-
-- **Extract structured data** from documents (text, images) with minimal code
-- **Identify and analyze key aspects** (topics, themes, categories) within documents
-- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
-- **Build complex extraction workflows** through a simple, intuitive API
-- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
-
-![ContextGem extraction example](https://contextgem.dev/_static/readme_code_snippet.png "ContextGem extraction example")
-
-
 ## ⭐ Key features

 {{FEATURE_TABLE}}

 \* See [descriptions](https://contextgem.dev/motivation.html#the-contextgem-solution) of ContextGem abstractions and [comparisons](https://contextgem.dev/vs_other_frameworks.html) of specific implementation examples using ContextGem and other popular open-source LLM frameworks.


-## 🧩 Core components
-
-ContextGem's document-specific LLM extraction is built upon the following core components:
-
-- 📄 **Document** model contains text and/or visual content representing a specific document. Examples:
+## 💡 With **minimal code**, you can:

-- _legal documents_: contracts, policies, terms of service
-- _financial documents_: invoices, receipts, bank statements, financial reports
-- _business documents_: proposals, business plans, marketing materials, presentations
-
-- 📚 **Aspect** model contains text representing a defined area or topic within a document. Examples:
-
-- _contract aspects_: payment terms, termination clauses
-- _invoice aspects_: line-item breakdowns, tax details
-- _CV aspects_: work experience, education, skills
-
-
-- 🧠 **Concept** model contains a unit of information or an entity, derived from an aspect or the broader document context. Examples:
-
-- _factual extractions_: a termination date in a contract, a total amount due in an invoice, or a certification in a CV
-- _analytical insights_: risk assessments, compliance evaluations
-- _reasoned conclusions_: determining whether a document meets specific criteria or answers particular questions
-
-See other industry-specific examples in the table below:
+- **Extract structured data** from documents (text, images)
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)

-![ContextGem component examples](https://contextgem.dev/_static/contextgem_component_examples.png "ContextGem component examples")
+<br/>

-Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
+![ContextGem extraction example](https://contextgem.dev/_static/readme_code_snippet.png "ContextGem extraction example")


 ## 📦 Installation
@@ -122,6 +97,8 @@ See more examples in the documentation:

 ContextGem leverages LLMs' long context windows to deliver superior extraction accuracy from individual documents. Unlike RAG approaches that often [struggle with complex concepts and nuanced insights](https://www.linkedin.com/pulse/raging-contracts-pitfalls-rag-contract-review-shcherbak-ai-ptg3f), ContextGem capitalizes on [continuously expanding context capacity](https://arxiv.org/abs/2502.12962), evolving LLM capabilities, and decreasing costs. This focused approach enables direct information extraction from complete documents, eliminating retrieval inconsistencies while optimizing for in-depth single-document analysis. While this delivers higher accuracy for individual documents, ContextGem does not currently support cross-document querying or corpus-wide retrieval - for these use cases, modern RAG systems (e.g., LlamaIndex, Haystack) remain more appropriate.

+Read more on [how ContextGem works](https://contextgem.dev/how_it_works.html) in the documentation.
+

 ## 🤖 Supported LLMs

dev/requirements/requirements.dev.txt

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ nvidia-cusparselt-cu12==0.6.2 ; python_version >= "3.10" and python_version < "3
 nvidia-nccl-cu12==2.21.5 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
 nvidia-nvjitlink-cu12==12.4.127 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
 nvidia-nvtx-cu12==12.4.127 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
-openai==1.73.0 ; python_version >= "3.10" and python_version < "3.14"
+openai==1.74.0 ; python_version >= "3.10" and python_version < "3.14"
 openfile==0.0.7 ; python_version >= "3.10" and python_version < "3.14"
 packaging==24.2 ; python_version >= "3.10" and python_version < "3.14"
 pandas==2.2.3 ; python_version >= "3.10" and python_version < "3.14"

dev/requirements/requirements.main.txt

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ nvidia-cusparselt-cu12==0.6.2 ; python_version >= "3.10" and python_version < "3
 nvidia-nccl-cu12==2.21.5 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
 nvidia-nvjitlink-cu12==12.4.127 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
 nvidia-nvtx-cu12==12.4.127 ; python_version >= "3.10" and python_version < "3.14" and platform_system == "Linux" and platform_machine == "x86_64"
-openai==1.73.0 ; python_version >= "3.10" and python_version < "3.14"
+openai==1.74.0 ; python_version >= "3.10" and python_version < "3.14"
 openfile==0.0.7 ; python_version >= "3.10" and python_version < "3.14"
 packaging==24.2 ; python_version >= "3.10" and python_version < "3.14"
 pandas==2.2.3 ; python_version >= "3.10" and python_version < "3.14"

docs/source/_static/tab_solid.png

155 KB

poetry.lock

Lines changed: 3 additions & 3 deletions
Generated file; diff not rendered by default.
