IPCEI Next Generation Cloud Infrastructure and Services

This repository contains work developed within the RESCUE (RESilient Cloud for EUropE) project, part of IPCEI-CIS (IPCEI Next Generation Cloud Infrastructure and Services), a key digital policy project aimed at strengthening Europe's digital and technological sovereignty.

GASPAR

GenAI-powered System for Privacy incident Analysis and Recovery

GASPAR is a comprehensive system designed to:

Extract privacy-related fields from assessment documents
Model data distributions and sample input data based on extracted fields
Detect anomalous values in sampled data batches
Create code filters to exclude anomalous data
Safely deploy filters to quarantine problematic data
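
The detection, filtering, and quarantine steps above can be illustrated with a small standalone sketch. It is not GASPAR's implementation: the function names (`detect_anomalies`, `build_filter`, `quarantine`), the use of a modified z-score, and the 3.5 threshold are all assumptions chosen for illustration, and the sample values are made up.

```python
from statistics import median


def detect_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper: uses the median absolute deviation (MAD), which is
    less sensitive to the outliers themselves than mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def build_filter(values, anomalous_indices):
    """Build a predicate that accepts values inside the range of normal data."""
    flagged = set(anomalous_indices)
    normal = [v for i, v in enumerate(values) if i not in flagged]
    low, high = min(normal), max(normal)
    return lambda v: low <= v <= high


def quarantine(batch, keep):
    """Split a batch into (accepted, quarantined) using the predicate `keep`."""
    accepted = [v for v in batch if keep(v)]
    quarantined = [v for v in batch if not keep(v)]
    return accepted, quarantined


# Illustrative sample of an "age" field; 412 is an obviously invalid entry.
sampled_ages = [34, 29, 41, 38, 30, 36, 412, 33]
anomalies = detect_anomalies(sampled_ages)
keep = build_filter(sampled_ages, anomalies)
accepted, quarantined = quarantine([31, 500, 40, -5], keep)
print(anomalies, accepted, quarantined)
```

Keeping the filter as a plain predicate means rejected records are set aside rather than deleted, which matches the quarantine idea described above.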

Features

Privacy rule extraction from IPA documents
Adaptive data sampling based on distribution modeling
Statistical and distribution-based anomaly detection
Automatic filter generation for anomalous data
Safe deployment with monitoring capabilities
Support for multiple LLM providers (OpenAI, Anthropic, Mistral)
Azure integration for cloud deployment

Installation

Local installation can be done using uv:

$ uv python install 3.10
$ uv python pin 3.10
$ uv venv -p python3.10
$ source .venv/bin/activate
$ uv pip install -e .

For development mode:

$ pip install -e ".[dev]"

For Databricks deployment:

$ pip install -e ".[databricks]"

Usage

Basic Usage

from gaspar.config import load_config
from gaspar.pipeline.executor import PipelineExecutor
import asyncio

async def main():
    # Load configuration from file
    config = load_config("config.yaml")
    
    # Initialize pipeline
    executor = PipelineExecutor(config)
    
    # Process IPA document
    result = await executor.execute("documents/privacy_assessment.txt")
    
    if result.success:
        print("Privacy rules extracted successfully")
        print(f"Generated {len(result.outputs['filters'])} filters")

if __name__ == "__main__":
    asyncio.run(main())

Command Line Usage

After installation, the gaspar command-line tool is available:

$ gaspar document.txt
# Process IPA document and start monitoring

$ gaspar -c config.yaml document.txt
# Use custom configuration

$ gaspar -v document.txt
# Enable verbose logging
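
For readers who want to script around these flags, the following sketch shows how a parser accepting the same invocations could be written with `argparse`. It is an illustration only, not the actual `gaspar` entry point: the long option names `--config` and `--verbose`, the help texts, and the `build_parser` function are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented invocations (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="gaspar",
        description="Process an IPA document (illustrative sketch).",
    )
    # Positional argument: the document to process.
    parser.add_argument("document", help="path to the IPA document")
    # -c takes a path to a YAML configuration file.
    parser.add_argument("-c", "--config", default=None,
                        help="path to a custom configuration file")
    # -v is a boolean switch for verbose logging.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-c", "config.yaml", "-v", "document.txt"])
print(args.document, args.config, args.verbose)
```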

Configuration

GASPAR can be configured via YAML files. Example configuration:

# LLM Model Configuration
model:
  provider: "openai"
  model_name: "gpt-4"
  api_key: ${OPENAI_API_KEY}

# Storage Configuration
storage:
  type: "local"
  local_path: "./data"

# Pipeline Configuration
pipeline:
  batch_size: 100
  max_retries: 3
  temp_directory: "./temp"
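
The `${OPENAI_API_KEY}` entry above implies that the key is read from the environment rather than stored in the file. Whether `load_config` performs this substitution itself is not stated here, so the following is only a sketch of one way such placeholders could be expanded before the YAML is parsed; `expand_env_placeholders` is a hypothetical helper, not part of GASPAR.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(text, env=None):
    """Replace every ${NAME} in `text` with the value of NAME from `env`.

    Raises KeyError when a referenced variable is missing, so that a missing
    API key fails early instead of being passed on as a literal placeholder.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"Environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


# Demonstration with an explicit mapping instead of the real environment.
raw = 'model:\n  provider: "openai"\n  api_key: ${OPENAI_API_KEY}\n'
print(expand_env_placeholders(raw, {"OPENAI_API_KEY": "example-key"}))
```

Failing on an unset variable is a design choice made for this sketch; a real loader might instead fall back to a default or leave the placeholder untouched.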

Note

The configuration used in this repository was adapted to work with a locally deployed version of GPT-4. To use the OpenAI API or any other LLM API provider, update openai_model.py or create a new model file.

Development

Testing

Running the tests can be done using tox:

$ tox -p

Building Packages

Building the packages:

$ tox -e packages
$ ls dist/

License

Apache-2.0
