shcherbak-ai
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
Lines changed: 31 additions & 0 deletions
diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
Lines changed: 3 additions & 2 deletions
diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
Lines changed: 63 additions & 0 deletions
diff --git a/.github/workflows/contributor-agreement-check.yml b/.github/workflows/contributor-agreement-check.yml
Lines changed: 34 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 38 additions & 6 deletions
diff --git a/SECURITY.md b/SECURITY.md
Lines changed: 37 additions & 0 deletions
@@ -2,6 +2,7 @@
 
 This directory contains GitHub Actions workflow configurations for continuous integration (CI) of the ContextGem project.
 
+
 ## Available Workflows
 
 ### tests (`ci-tests.yml`)
@@ -22,6 +23,33 @@ This directory contains GitHub Actions workflow configurations for continuous in
     - `CONTEXTGEM_OPENAI_API_KEY`: Secret OpenAI API key
     - `GIST_SECRET`: Secret token to upload coverage results to a gist for badge generation
 
+### CodeQL Analysis (`codeql.yml`)
+
+This workflow performs code security scanning using GitHub's CodeQL analysis engine.
+
+**Features:**
+- Scans Python codebase for security vulnerabilities and coding errors
+- Analyzes code quality and identifies potential issues
+- Results are available in the Security tab of the repository
+
+**Trigger:**
+- Automatically runs on push and pull request events on the main and dev branches
+- Scheduled to run weekly (Sundays at 00:00 UTC)
+- Can be triggered manually through the GitHub Actions UI
+
+### Documentation Build (`docs.yml`)
+
+This workflow builds and deploys the project documentation to GitHub Pages.
+
+**Features:**
+- Builds documentation using Sphinx
+- Deploys documentation to GitHub Pages when merged to main
+- Creates preview builds on pull requests
+
+**Trigger:**
+- Automatically runs on push and pull request events on the main branch
+- Can be triggered manually through the GitHub Actions UI
+
 ### Check Contributor Agreement (`contributor-agreement-check.yml`)
 
 This workflow ensures all contributors have signed the Contributor Agreement by checking for properly filled agreement files.
@@ -35,7 +63,10 @@ This workflow ensures all contributors have signed the Contributor Agreement by
 **Trigger:**
 - Automatically runs on all pull request events (opened, synchronized, reopened)
 
+
 ## Running Workflows
 
 - **tests:** These run automatically on push/PR to the main branch
+- **CodeQL Analysis:** Runs automatically on push/PR to main/dev, weekly, and manually
+- **Documentation Build:** Runs automatically on push/PR to main and manually
 - **Check Contributor Agreement:** Runs automatically on all PRs
@@ -2,9 +2,9 @@ name: tests
 
 on:
   push:
-    branches: [ main ]
+    branches: [ main, dev ]
   pull_request:
-    branches: [ main ]
+    branches: [ main, dev ]
   workflow_dispatch:
 
 jobs:
@@ -92,6 +92,7 @@ jobs:
   update-badge:
     needs: tests-with-vcr
     runs-on: ubuntu-latest
+    if: github.ref == 'refs/heads/main'
     steps:
       - name: Download coverage artifact
         uses: actions/download-artifact@v4
 
@@ -0,0 +1,63 @@
+name: "CodeQL"
+
+on:
+  push:
+    branches: [ main, dev ]
+  pull_request:
+    branches: [ main, dev ]
+  schedule:
+    - cron: '0 0 * * 0'  # Run once per week at midnight UTC on Sunday
+  workflow_dispatch:
+
+jobs:
+  analyze:
+    name: Analyze
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'python' ]
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v4
+
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v3
+      with:
+        languages: ${{ matrix.language }}
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.13'
+
+    - name: Install Poetry
+      uses: snok/install-poetry@v1
+      with:
+        virtualenvs-create: true
+        virtualenvs-in-project: true
+        installer-parallel: true
+
+    - name: Load cached pip wheels
+      id: cached-pip-wheels
+      uses: actions/cache@v4
+      with:
+        path: |
+          ~/.cache/pip
+          ~/Library/Caches/pip
+          ~\AppData\Local\pip\Cache
+        key: pip-${{ runner.os }}-python-${{ hashFiles('**/poetry.lock') }}
+
+    - name: Install dependencies
+      run: poetry install --no-interaction --with dev --no-root
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v3
+      with:
+        category: "/language:${{matrix.language}}"
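
The `schedule` trigger in `codeql.yml` uses the cron expression `0 0 * * 0`, which GitHub Actions evaluates in UTC. As a rough illustration only, the sketch below computes when such a weekly run would next be due. The helper name `next_weekly_run` is hypothetical and is not part of the workflow; real scheduled runs can also start later than the nominal time when GitHub is under load.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now: datetime) -> datetime:
    """Return the next Sunday 00:00 UTC strictly after `now`.

    Hypothetical helper mirroring the cron expression '0 0 * * 0'.
    `now` must be timezone-aware.
    """
    if now.tzinfo is None:
        raise ValueError("now must be a timezone-aware datetime")

    now_utc = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (6 - now_utc.weekday()) % 7
    candidate = (now_utc + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # If it is already Sunday at or past midnight, move to the following week.
    if candidate <= now_utc:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # Wednesday, 9 April 2025, 12:00 UTC
    print(next_weekly_run(datetime(2025, 4, 9, 12, 0, tzinfo=timezone.utc)))
    # prints 2025-04-13 00:00:00+00:00
```
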
@@ -12,13 +12,44 @@ jobs:
   check-contributor-agreement:
     runs-on: ubuntu-latest
     steps:
+      - name: Check if user is a maintainer
+        id: check-maintainer
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const { owner, repo } = context.repo;
+            const username = context.payload.pull_request.user.login;
+            
+            try {
+              const { data: permission } = await github.rest.repos.getCollaboratorPermissionLevel({
+                owner,
+                repo,
+                username,
+              });
+              
+              // Skip check for users with admin or write permissions
+              if (['admin', 'write'].includes(permission.permission)) {
+                console.log(`User ${username} is a maintainer with ${permission.permission} permissions. Skipping check.`);
+                return true;
+              }
+              
+              console.log(`User ${username} has ${permission.permission} permissions. Continuing with check.`);
+              return false;
+            } catch (error) {
+              console.log(`Error checking permissions: ${error}`);
+              return false;
+            }
+
       - name: Checkout code
+        if: steps.check-maintainer.outputs.result != 'true'
         uses: actions/checkout@v4
         with:
           ref: ${{ github.event.pull_request.head.sha }}
           fetch-depth: 0
 
       - name: Check for contributor agreement
+        if: steps.check-maintainer.outputs.result != 'true'
         id: check-agreement
         run: |
           # Get the PR author's username
@@ -50,6 +81,7 @@ jobs:
           fi
 
       - name: Check for deleted contributor agreements
+        if: steps.check-maintainer.outputs.result != 'true'
         id: check-deleted
         run: |
           # Set proper base ref
@@ -68,8 +100,8 @@ jobs:
           fi
 
       - name: Comment on PR if checks fail
-        if: ${{ failure() }}
-        uses: actions/github-script@v6
+        if: ${{ failure() && steps.check-maintainer.outputs.result != 'true' }}
+        uses: actions/github-script@v7
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
 
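
The `check-maintainer` step added in `contributor-agreement-check.yml` reduces to a small decision rule: skip the agreement check for `admin` or `write` permissions, and fall back to running it when the permission lookup fails. The Python sketch below restates that rule so it can be reasoned about in isolation. It is an illustration, not code from the workflow: `is_maintainer` and the `fetch_permission` callable are hypothetical stand-ins for the `github-script` step and the GitHub API call.

```python
from typing import Callable

# Permission levels that the workflow treats as "maintainer".
MAINTAINER_PERMISSIONS = frozenset({"admin", "write"})


def is_maintainer(username: str, fetch_permission: Callable[[str], str]) -> bool:
    """Return True if the agreement check can be skipped for `username`.

    `fetch_permission` is a hypothetical stand-in for the API lookup: it takes
    a username and returns a permission string such as "admin" or "read".
    """
    try:
        permission = fetch_permission(username)
    except Exception as error:
        # Same fallback as the workflow: on any lookup error, run the check.
        print(f"Error checking permissions: {error}")
        return False
    return permission in MAINTAINER_PERMISSIONS


if __name__ == "__main__":
    fake_permissions = {"alice": "admin", "bob": "read"}
    print(is_maintainer("alice", fake_permissions.__getitem__))  # True
    print(is_maintainer("bob", fake_permissions.__getitem__))    # False
    # An unknown user raises KeyError in the fake lookup, so the check still runs.
    print(is_maintainer("carol", fake_permissions.__getitem__))  # False
```

Failing closed in the `except` branch means a transient API error can never exempt an external contributor from the agreement check; the cost is that a maintainer's PR may occasionally be checked unnecessarily.
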
@@ -7,14 +7,16 @@
 [![docs](https://github.com/shcherbak-ai/contextgem/actions/workflows/docs.yml/badge.svg?branch=main)](https://github.com/shcherbak-ai/contextgem/actions/workflows/docs.yml)
 [![documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://shcherbak-ai.github.io/contextgem/)
 [![License](https://img.shields.io/badge/License-Apache_2.0-bright.svg)](https://opensource.org/licenses/Apache-2.0)
+![PyPI](https://img.shields.io/pypi/v/contextgem)
 [![Python Versions](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)](https://www.python.org/downloads/)
+[![Code Security](https://github.com/shcherbak-ai/contextgem/actions/workflows/codeql.yml/badge.svg?branch=main)](https://github.com/shcherbak-ai/contextgem/actions/workflows/codeql.yml)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat)](https://pycqa.github.io/isort/)
 [![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
 [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-blue?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](CODE_OF_CONDUCT.md)
 
-ContextGem is an LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
+ContextGem is a free, open-source LLM framework for easier, faster extraction of structured data and insights from documents through powerful abstractions.
 
 
 ## 💎 Why ContextGem?
@@ -26,6 +28,16 @@ ContextGem addresses this challenge by providing a flexible, intuitive framework
 Read more on the project [motivation](https://contextgem.dev/motivation.html) in the documentation.
 
 
+## 💡 What can you do with ContextGem?
+
+With ContextGem, you can:
+- **Extract structured data** from documents (text, images) with minimal code
+- **Identify and analyze key aspects** (topics, themes, categories) within documents
+- **Extract specific concepts** (entities, facts, conclusions, assessments) from documents
+- **Build complex extraction workflows** through a simple, intuitive API
+- **Create multi-level extraction pipelines** (aspects containing concepts, hierarchical aspects)
+
+
 ## ⭐ Key features
 
 <table>
@@ -177,7 +189,7 @@ doc = Document(
         "The term of the agreement is 1 year from the Effective Date...\n"
         "The Supplier shall provide consultancy services as described in Annex 2...\n"
         "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
-        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # out-of-context / anomaly
+        "The purple elephant danced gracefully on the moon while eating ice cream.\n"  # 💎 anomaly
         "This agreement is governed by the laws of Norway...\n"
     ),
 )
@@ -191,8 +203,9 @@ doc.concepts = [
         reference_depth="sentences",
         add_justifications=True,
         justification_depth="brief",
-        # add more concepts to the document, if needed
     )
+    # add more concepts to the document, if needed
+    # see the docs for available concepts: StringConcept, JsonObjectConcept, etc.
 ]
 # Or use doc.add_concepts([...])
 
@@ -201,15 +214,17 @@ llm = DocumentLLM(
     model="openai/gpt-4o-mini",  # or any other LLM from e.g. Anthropic, etc.
     api_key=os.environ.get(
         "CONTEXTGEM_OPENAI_API_KEY"
-    ),  # your API key for the LLM provider
+    ),  # your API key for the LLM provider, e.g. OpenAI, Anthropic, etc.
     # see the docs for more configuration options
 )
 
 # Extract information from the document
 doc = llm.extract_all(doc)  # or use async version llm.extract_all_async(doc)
 
 # Access extracted information in the document object
-print(doc.concepts[0].extracted_items)  # extracted items with references justifications
+print(
+    doc.concepts[0].extracted_items
+)  # extracted items with references & justifications
 # or doc.get_concept_by_name("Anomalies").extracted_items
 
 ```
@@ -236,6 +251,14 @@ ContextGem leverages LLMs' long context windows to deliver superior extraction a
 Read more on [how it works](https://contextgem.dev/how_it_works.html) in the documentation.
 
 
+## 🤖 Supported LLMs
+
+ContextGem supports both cloud-based and local LLMs through [LiteLLM](https://github.com/BerriAI/litellm) integration:
+- **Cloud LLMs**: OpenAI, Anthropic, Google, Azure OpenAI, and more
+- **Local LLMs**: Run models locally using providers like Ollama, LM Studio, etc.
+- **Simple API**: Unified interface for all LLMs with easy provider switching
+
+
 ## ⚡ Optimizations
 
 ContextGem documentation offers guidance on optimization strategies to maximize performance, minimize costs, and enhance extraction accuracy:
@@ -275,11 +298,20 @@ ContextGem is at an early stage. Our development roadmap includes:
 We are committed to making ContextGem the most effective tool for extracting structured information from documents.
 
 
+## 🔐 Security
+
+This project is automatically scanned for security vulnerabilities using [CodeQL](https://codeql.github.com/). We also use [Snyk](https://snyk.io) as needed for supplementary dependency checks.
+
+See the [SECURITY](https://github.com/shcherbak-ai/contextgem/blob/main/SECURITY.md) file for details.
+
+
 ## 📄 License & Contact
 
 This project is licensed under the Apache 2.0 License - see the [LICENSE](https://github.com/shcherbak-ai/contextgem/blob/main/LICENSE) and [NOTICE](https://github.com/shcherbak-ai/contextgem/blob/main/NOTICE) files for details.
 
-Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai) - AI engineering company developing tools for AI/ML/NLP developers.
+Copyright © 2025 [Shcherbak AI AS](https://shcherbak.ai), an AI engineering company building tools for AI/ML/NLP developers.
+
+Shcherbak AI is now part of Microsoft for Startups.
 
 [Connect with us on LinkedIn](https://www.linkedin.com/in/sergii-shcherbak-10068866/) for questions or collaboration ideas.
 
 
@@ -0,0 +1,37 @@
+# Security Policy
+
+
+## Supported Versions
+
+Security fixes are applied to the latest release of this library. Older versions may not receive security updates.
+
+
+## Security Testing
+
+This project is automatically tested for security issues using [CodeQL](https://codeql.github.com/) static analysis (run via GitHub Actions).
+
+We also use [Snyk](https://snyk.io) as needed for supplementary dependency vulnerability monitoring.
+
+
+## Data Privacy
+
+This library uses LiteLLM as a local Python package to communicate with LLM providers through a unified interface. No data or telemetry is transmitted to LiteLLM servers, as the SDK runs entirely within the user's environment. According to LiteLLM's documentation, self-hosted or local SDK use involves no data storage and no telemetry. For details, see [LiteLLM's documentation](https://docs.litellm.ai/docs/data_security).
+
+
+## Reporting a Vulnerability
+
+We value the security community's role in protecting our users. If you discover a potential security issue in this project, please report it as follows:
+
+📧 **Email**: `sergii@shcherbak.ai`
+
+When reporting, please include:
+- A detailed description of the issue
+- Steps to reproduce the vulnerability
+- Any relevant logs, context, or configurations
+
+We aim to respond promptly to all valid reports. Please note that we do not currently offer a bug bounty program.
+
+
+## Questions?
+
+If you’re unsure whether something is a vulnerability or just a bug, feel free to reach out via the email above before submitting a full report.