IPCEI Next Generation Cloud Infrastructure and Services

This repository contains work developed within the RESCUE (RESilient Cloud for EUropE) project, part of IPCEI-CIS (IPCEI Next Generation Cloud Infrastructure and Services), a key digital policy project aimed at strengthening Europe's digital and technological sovereignty.

GASPAR

GenAI-powered System for Privacy incident Analysis and Recovery

GASPAR is a comprehensive system designed to:

  • Extract privacy-related fields from assessment documents
  • Model data distributions and sample input data based on extracted fields
  • Detect anomalous values in sampled data batches
  • Create code filters to exclude anomalous data
  • Safely deploy filters to quarantine problematic data
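
The detection and filtering steps above can be illustrated with a small standalone sketch. It is not GASPAR's implementation: it assumes a single numeric field, uses Tukey's IQR fences as the anomaly rule, and all names (`iqr_bounds`, `make_filter`, `split_batch`) and the sample data are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences for a batch of numeric values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def make_filter(low, high):
    """Build a predicate that accepts only values inside [low, high]."""
    def accept(value):
        return low <= value <= high
    return accept


def split_batch(batch, accept):
    """Separate a batch into clean values and quarantined values."""
    clean = [v for v in batch if accept(v)]
    quarantined = [v for v in batch if not accept(v)]
    return clean, quarantined


# Hypothetical batch of ages containing one implausible entry.
batch = [34, 29, 41, 38, 45, 31, 36, 440, 27, 39]
low, high = iqr_bounds(batch)
clean, quarantined = split_batch(batch, make_filter(low, high))
print(quarantined)
```

Quarantining the rejected values rather than dropping them keeps them available for later review, which matches the "quarantine problematic data" step in the list.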

Features

  • Privacy rule extraction from IPA documents
  • Adaptive data sampling based on distribution modeling
  • Statistical and distribution-based anomaly detection
  • Automatic filter generation for anomalous data
  • Safe deployment with monitoring capabilities
  • Support for multiple LLM providers (OpenAI, Anthropic, Mistral)
  • Azure integration for cloud deployment
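
As a rough illustration of sampling based on distribution modeling, the sketch below fits a normal distribution to observed values and draws a synthetic batch from it. It assumes a single numeric field that is approximately normal and uses only the Python standard library; the helper names and data are hypothetical and do not reflect GASPAR's internal API.

```python
from statistics import NormalDist


def fit_normal(observed):
    """Estimate a normal distribution (mean, stdev) from observed values."""
    return NormalDist.from_samples(observed)


def sample_batch(dist, size, seed=None):
    """Draw `size` synthetic values from the fitted distribution."""
    return dist.samples(size, seed=seed)


# Hypothetical observed values for one numeric field.
observed = [34.0, 29.0, 41.0, 38.0, 45.0, 31.0, 36.0, 27.0, 39.0]
model = fit_normal(observed)
batch = sample_batch(model, size=100, seed=42)
print(f"mean={model.mean:.2f} stdev={model.stdev:.2f} batch={len(batch)}")
```

A fixed seed makes the sampled batch reproducible, which is convenient when comparing anomaly detection runs.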

Installation

Install locally with uv:

$ uv python install 3.10
$ uv python pin 3.10
$ uv venv -p python3.10
$ source .venv/bin/activate
$ uv pip install -e .

For development mode:

$ pip install -e ".[dev]"

For Databricks deployment:

$ pip install -e ".[databricks]"

Usage

Basic Usage

import asyncio

from gaspar.config import load_config
from gaspar.pipeline.executor import PipelineExecutor

async def main():
    # Load configuration from file
    config = load_config("config.yaml")
    
    # Initialize pipeline
    executor = PipelineExecutor(config)
    
    # Process IPA document
    result = await executor.execute("documents/privacy_assessment.txt")
    
    if result.success:
        print("Privacy rules extracted successfully")
        print(f"Generated {len(result.outputs['filters'])} filters")

if __name__ == "__main__":
    asyncio.run(main())

Command Line Usage

After installation, the gaspar command-line tool is available:

$ gaspar document.txt
# Process IPA document and start monitoring

$ gaspar -c config.yaml document.txt
# Use custom configuration

$ gaspar -v document.txt
# Enable verbose logging

Configuration

GASPAR can be configured via YAML files. Example configuration:

# LLM Model Configuration
model:
  provider: "openai"
  model_name: "gpt-4"
  api_key: ${OPENAI_API_KEY}

# Storage Configuration
storage:
  type: "local"
  local_path: "./data"

# Pipeline Configuration
pipeline:
  batch_size: 100
  max_retries: 3
  temp_directory: "./temp"
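
The `${OPENAI_API_KEY}` placeholder in the example suggests that secrets are taken from environment variables. Whether `load_config` performs this substitution itself is an assumption; the following sketch shows one way such placeholders could be expanded in the raw configuration text before it is parsed. The `expand_env` helper is hypothetical and not part of GASPAR.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace ${NAME} placeholders with values from env; fail if one is missing."""
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_substitute, text)


# A dictionary stands in for the real environment here.
raw = "api_key: ${OPENAI_API_KEY}"
expanded = expand_env(raw, env={"OPENAI_API_KEY": "sk-example"})
print(expanded)
```

Raising on a missing variable surfaces a misconfigured environment at load time instead of sending an empty API key to the provider.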

Note

The configuration used in this repository was adapted to work with a locally deployed version of GPT-4. To use the OpenAI API or any other LLM API provider, update openai_model.py or create a new equivalent file.

Development

Testing

Run the tests with tox (the -p flag runs the environments in parallel):

$ tox -p

Building Packages

Build the packages and list the resulting artifacts:

$ tox -e packages
$ ls dist/

License

Apache-2.0