
Commit 0ddb70a: Security & docs

Parent: b7f3ebe

6 files changed: 107 additions & 18 deletions

.github/workflows/codeql.yml (2 additions & 2 deletions)

@@ -26,7 +26,7 @@ jobs:
         uses: actions/checkout@v4
 
       - name: Initialize CodeQL
-        uses: github/codeql-action/init@v2
+        uses: github/codeql-action/init@v3
         with:
           languages: ${{ matrix.language }}
 
@@ -56,6 +56,6 @@ jobs:
         run: poetry install --no-interaction --with dev --no-root
 
       - name: Perform CodeQL Analysis
-        uses: github/codeql-action/analyze@v2
+        uses: github/codeql-action/analyze@v3
         with:
           category: "/language:${{matrix.language}}"

README.md (29 additions & 6 deletions)

@@ -15,7 +15,7 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
 
-ContextGem is an LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
+ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
 
 
 ## 💎 Why ContextGem?
 
@@ -27,6 +27,16 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
+## 💡 What can you do with ContextGem?
+
+With ContextGem, you can:
+- **Extract structured data** from documents (text, images) with minimal code
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
+
+
 ## ⭐ Key features
 
 <table>
 
@@ -178,7 +188,7 @@ doc = Document(
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -192,8 +202,9 @@ doc.concepts = [
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -202,15 +213,17 @@ llm = DocumentLLM(
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
 
 ```
 
@@ -237,6 +250,14 @@ ContextGem leverages LLMs' long context windows to deliver superior extraction a
 Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
 
 
+## 🤖 Supported LLMs
+
+ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
+- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
+- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
+- **Simple API**: Unified interface for all LLMs with easy provider switching
+
+
 ## ⚡ Optimizations
 
 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
 
@@ -280,7 +301,9 @@ We are committed to making ContextGem the most effective tool for extracting str
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
 
-Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai) - AI engineering company developing tools for AI/ML/NLP developers.
+Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai), an AI engineering company building tools for AI/ML/NLP developers.
+
+Shcherbak AI is now part of Microsoft for Startups.
 
 [Connect with us on LinkedIn](https://www.linkedin.com/in/sergii-shcherbak-10068866/) for questions or collaboration ideas.
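The new "Supported LLMs" section and the quickstart's `model="openai/gpt-4o-mini"` both rely on LiteLLM's `provider/model` identifier convention, where switching providers is just a change of string prefix. A minimal sketch of how such identifiers decompose; the `split_model_id` helper below is hypothetical, written only to illustrate the convention, and is not part of the ContextGem or LiteLLM API:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a LiteLLM-style "provider/model" identifier (hypothetical helper)."""
    provider, sep, name = model_id.partition("/")
    if not sep:
        # No "/" prefix: LiteLLM can often infer the provider
        # from well-known bare model names.
        return "", model_id
    return provider, name


print(split_model_id("openai/gpt-4o-mini"))  # cloud LLM -> ('openai', 'gpt-4o-mini')
print(split_model_id("ollama/llama3"))       # local LLM -> ('ollama', 'llama3')
```

Under this convention, moving from a cloud model to a local one (e.g. via Ollama) should only require a different `model` string, with the rest of the `DocumentLLM` configuration unchanged.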

SECURITY.md (new file: 40 additions)

# Security Policy

## Supported Versions

We maintain security practices for the latest release of this library. Older versions may not receive security updates.

## Security Testing

This project is regularly tested for security issues using both:

- [CodeQL](https://codeql.github.com/) static analysis (run via GitHub Actions)
- [Snyk](https://snyk.io) for continuous dependency vulnerability monitoring

All known transitive vulnerabilities have been manually triaged and either resolved or confirmed to be non-applicable based on how the library is used. See the repository's issue tracker or changelog for relevant audit notes when applicable.

## Data Privacy

This library uses LiteLLM as a local Python package to communicate with LLM providers through a unified interface. No data or telemetry is transmitted to LiteLLM servers, as the SDK runs entirely within the user's environment. According to LiteLLM's documentation, self-hosted or local SDK use involves no data storage and no telemetry. For details, see [LiteLLM's documentation](https://docs.litellm.ai/docs/data_security).

## Reporting a Vulnerability

We value the security community's role in protecting our users. If you discover a potential security issue in this project, please report it as follows:

📧 **Email**: `sergii@shcherbak.ai`

When reporting, please include:

- A detailed description of the issue
- Steps to reproduce the vulnerability
- Any relevant logs, context, or configurations

We aim to respond promptly to all valid reports. Please note that we do not currently offer a bug bounty program.

## Questions?

If you're unsure whether something is a vulnerability or just a bug, feel free to reach out via the email above before submitting a full report.

dev/readme.template.md (22 additions & 2 deletions)

@@ -15,7 +15,7 @@
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
 
-ContextGem is an LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
+ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
 
 
 ## 💎 Why ContextGem?
 
@@ -27,6 +27,16 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
+## 💡 What can you do with ContextGem?
+
+With ContextGem, you can:
+- **Extract structured data** from documents (text, images) with minimal code
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
+
+
 ## ⭐ Key features
 
 {{FEATURE_TABLE}}
 
@@ -69,6 +79,14 @@ ContextGem leverages LLMs' long context windows to deliver superior extraction a
 Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
 
 
+## 🤖 Supported LLMs
+
+ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
+- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
+- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
+- **Simple API**: Unified interface for all LLMs with easy provider switching
+
+
 ## ⚡ Optimizations
 
 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
 
@@ -112,7 +130,9 @@ We are committed to making ContextGem the most effective tool for extracting str
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
 
-Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai) - AI engineering company developing tools for AI/ML/NLP developers.
+Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai), an AI engineering company building tools for AI/ML/NLP developers.
+
+Shcherbak AI is now part of Microsoft for Startups.
 
 [Connect with us on LinkedIn](https://www.linkedin.com/in/sergii-shcherbak-10068866/) for questions or collaboration ideas.
dev/usage_examples/readme/quickstart.py (7 additions & 4 deletions)

@@ -13,7 +13,7 @@
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -27,8 +27,9 @@
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -37,13 +38,15 @@
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
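The quickstart's commented alternative, `doc.get_concept_by_name("Anomalies")`, retrieves a concept by its `name` rather than by list position. A minimal sketch of that lookup pattern; `MiniConcept` and `MiniDocument` below are hypothetical stand-ins for illustration only, not ContextGem's actual classes:

```python
from dataclasses import dataclass, field


@dataclass
class MiniConcept:
    # Stand-in for ContextGem concept types (e.g. StringConcept);
    # only the fields needed for name-based lookup are modeled.
    name: str
    extracted_items: list = field(default_factory=list)


@dataclass
class MiniDocument:
    concepts: list = field(default_factory=list)

    def get_concept_by_name(self, name: str) -> MiniConcept:
        # Name-based lookup keeps call sites readable and robust
        # to reordering of the concepts list.
        for concept in self.concepts:
            if concept.name == name:
                return concept
        raise KeyError(f"no concept named {name!r}")


doc = MiniDocument(
    concepts=[MiniConcept(name="Anomalies", extracted_items=["anomalous sentence"])]
)
print(doc.get_concept_by_name("Anomalies").extracted_items)  # -> ['anomalous sentence']
```

The design choice mirrored here is that index-based access (`doc.concepts[0]`) depends on declaration order, while name-based access does not.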

docs/docs-raw-for-llm.txt (7 additions & 4 deletions)

@@ -270,7 +270,7 @@ Anomaly extraction example (ContextGem)
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
 
@@ -284,8 +284,9 @@ Anomaly extraction example (ContextGem)
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -294,15 +295,17 @@ Anomaly extraction example (ContextGem)
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
 
 -[ LangChain ]-
