@@ -28,9 +28,8 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
2828Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
2929
3030
31- ## 💡 What can you do with ContextGem?
31+ ## 💡 With ContextGem, you can:
3232
33- With ContextGem, you can:
3433- **Extract structured data** from documents (text, images) with minimal code
3534- **Identify and analyze key aspects** (topics, themes, categories) within documents
3635- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
@@ -173,15 +172,72 @@ pip install -U contextgem
173172
174173## 🚀 Quick start
175174
175+ ### Aspect extraction
176+
177+ An aspect is a defined area or topic within a document (or within another aspect). Each aspect reflects a specific subject or theme.
178+
179+ ```python
180+ # Quick Start Example - Extracting payment terms from a document
181+
182+ import os
183+
184+ from contextgem import Aspect, Document, DocumentLLM
185+
186+ # Sample document text (shortened for brevity)
187+ doc = Document(
188+     raw_text=(
189+         "SERVICE AGREEMENT\n"
190+         "SERVICES. Provider agrees to provide the following services to Client: "
191+         "Cloud-based data analytics platform access and maintenance...\n"
192+         "PAYMENT. Client agrees to pay $5,000 per month for the services. "
193+         "Payment is due on the 1st of each month. Late payments will incur a 2% fee per month...\n"
194+         "CONFIDENTIALITY. Both parties agree to keep all proprietary information confidential "
195+         "for a period of 5 years following termination of this Agreement..."
196+     ),
197+ )
198+
199+ # Define the aspects to extract
200+ doc.aspects = [
201+     Aspect(
202+         name="Payment Terms",
203+         description="Payment terms and conditions in the contract",
204+         # see the docs for more configuration options, e.g. sub-aspects, concepts, etc.
205+     ),
206+     # Add more aspects as needed
207+ ]
208+ # Or use `doc.add_aspects([...])`
209+
210+ # Define an LLM for extracting information from the document
211+ llm = DocumentLLM(
212+     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
213+     api_key=os.environ.get(
214+         "CONTEXTGEM_OPENAI_API_KEY"
215+     ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
216+     # see the docs for more configuration options
217+ )
218+
219+ # Extract information from the document
220+ doc = llm.extract_all(doc)  # or use async version `await llm.extract_all_async(doc)`
221+
222+ # Access extracted information in the document object
223+ for item in doc.aspects[0].extracted_items:
224+     print(f"• {item.value}")
225+ # or `doc.get_aspect_by_name("Payment Terms").extracted_items`
226+
227+ ```
228+
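The quick start reads the API key with `os.environ.get(...)`, which returns `None` when the variable is unset. The following is a minimal sketch of a fail-fast check that could run before constructing the LLM. `require_env` is a hypothetical helper introduced here for illustration; it is not part of ContextGem.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or empty, so a missing
    key is reported before any request to the LLM provider is attempted.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the variable name from the quick start:
# api_key = require_env("CONTEXTGEM_OPENAI_API_KEY")
```

The returned string can then be passed as `api_key` in place of the direct `os.environ.get(...)` call.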
229+ ### Concept extraction
230+
231+ A concept is a unit of information or an entity derived from an aspect or the broader document context.
232+
176233```python
177234# Quick Start Example - Extracting anomalies from a document, with source references and justifications
178235
179236import os
180237
181238from contextgem import Document, DocumentLLM, StringConcept
182239
183- # Example document instance
184- # Document content is shortened for brevity
240+ # Sample document text (shortened for brevity)
185241doc = Document(
186242    raw_text=(
187243        "Consultancy Agreement\n"
@@ -203,13 +259,14 @@ doc.concepts = [
203259        reference_depth="sentences",
204260        add_justifications=True,
205261        justification_depth="brief",
262+         # see the docs for more configuration options
206263    )
207264    # add more concepts to the document, if needed
208265    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
209266]
210- # Or use doc.add_concepts([...])
267+ # Or use `doc.add_concepts([...])`
211268
212- # Create an LLM for extracting data and insights from the document
269+ # Define an LLM for extracting information from the document
213270llm = DocumentLLM(
214271    model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
215272    api_key=os.environ.get(
@@ -219,16 +276,18 @@ llm = DocumentLLM(
219276)
220277
221278# Extract information from the document
222- doc = llm.extract_all(doc) # or use async version llm.extract_all_async(doc)
279+ doc = llm.extract_all(doc) # or use async version `await llm.extract_all_async(doc)`
223280
224281# Access extracted information in the document object
225282print(
226283    doc.concepts[0].extracted_items
227284)  # extracted items with references & justifications
228- # or doc.get_concept_by_name("Anomalies").extracted_items
285+ # or `doc.get_concept_by_name("Anomalies").extracted_items`
229286
230287```
231288
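The code comments mention an async variant, `await llm.extract_all_async(doc)`. Since `await` is only valid inside a coroutine, a script has to wrap the call, for example with `asyncio.run`. The sketch below shows that calling pattern only: `StubLLM` is a hypothetical offline stand-in rather than ContextGem's `DocumentLLM`, and the dictionary is a placeholder for a real `Document`.

```python
import asyncio


class StubLLM:
    """Hypothetical stand-in for DocumentLLM so the sketch runs offline."""

    async def extract_all_async(self, doc):
        # Simulate a non-blocking call to an LLM provider.
        await asyncio.sleep(0)
        return doc


async def main():
    llm = StubLLM()
    doc = {"raw_text": "Consultancy Agreement ..."}  # placeholder document
    # `await` is only valid inside a coroutine such as this one.
    return await llm.extract_all_async(doc)


# Entry point for a plain script: start an event loop and run the coroutine.
result = asyncio.run(main())
```

With the real library, the assumption is that `StubLLM()` would be replaced by the configured `DocumentLLM` instance and the placeholder by the `Document` from the quick start.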
289+ ---
290+
232291See more examples in the documentation:
233292
234293### Basic usage examples
@@ -305,6 +364,20 @@ This project is automatically scanned for security vulnerabilities using [CodeQL
305364See [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
306365
307366
367+ ## 🙏 Acknowledgements
368+
369+ ContextGem relies on these excellent open-source packages:
370+
371+ - [pydantic](https://github.com/pydantic/pydantic): The gold standard for data validation
372+ - [Jinja2](https://github.com/pallets/jinja): Fast, expressive template engine that powers our dynamic prompt rendering
373+ - [litellm](https://github.com/BerriAI/litellm): Unified interface to multiple LLM providers with seamless provider switching
374+ - [wtpsplit](https://github.com/segment-any-text/wtpsplit): State-of-the-art text segmentation tool
375+ - [loguru](https://github.com/Delgan/loguru): Simple yet powerful logging that enhances debugging and observability
376+ - [python-ulid](https://github.com/mdomke/python-ulid): Efficient ULID generation
377+ - [PyTorch](https://github.com/pytorch/pytorch): Industry-standard machine learning framework
378+ - [aiolimiter](https://github.com/mjpieters/aiolimiter): Powerful rate limiting for async operations
379+
380+
308381## 📄 License & Contact
309382
310383This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.