The content of this repository comprises work developed within the scope of the RESCUE (RESilient Cloud for EUropE) project, part of IPCEI-CIS (IPCEI Next Generation Cloud Infrastructure and Services), a key digital policy initiative aimed at strengthening Europe's digital and technological sovereignty.
## GASPAR: GenAI-powered System for Privacy incident Analysis and Recovery

GASPAR is a comprehensive system designed to:
- Extract privacy-related fields from assessment documents
- Model data distributions and sample input data based on extracted fields
- Detect anomalous values in sampled data batches
- Create code filters to exclude anomalous data
- Safely deploy filters to quarantine problematic data
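The anomaly-detection and filter-generation steps above can be illustrated with a minimal, hypothetical sketch. This is not GASPAR's actual API; the function names and the simple z-score rule are assumptions chosen for illustration:

```python
import statistics

def detect_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold}

def make_filter(anomalous_values):
    """Build a predicate that excludes known-anomalous values (the 'code filter' step)."""
    bad = set(anomalous_values)
    return lambda v: v not in bad

# A sampled batch with one obviously anomalous value
batch = [10.1, 9.8, 10.3, 250.0, 10.0, 9.9, 10.2, 10.1]
anomalies = detect_anomalies(batch)
flt = make_filter(batch[i] for i in anomalies)

clean = [v for v in batch if flt(v)]          # data passed downstream
quarantine = [v for v in batch if not flt(v)] # data held back for review
```

In GASPAR the generated filters are deployed with monitoring rather than applied inline, but the quarantine split shown here captures the core idea.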
## Features

- Privacy rule extraction from IPA documents
- Adaptive data sampling based on distribution modeling
- Statistical and distribution-based anomaly detection
- Automatic filter generation for anomalous data
- Safe deployment with monitoring capabilities
- Support for multiple LLM providers (OpenAI, Anthropic, Mistral)
- Azure integration for cloud deployment
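Distribution-based sampling can be sketched as follows. This is a simplified, hypothetical illustration using only the standard library (GASPAR's actual modeling is richer); `model_field` and `sample_field` are made-up names:

```python
import random
import statistics

def model_field(observed):
    """Fit a simple Gaussian model (mean, stdev) to observed values of a field."""
    return statistics.fmean(observed), statistics.pstdev(observed)

def sample_field(mean, stdev, n, seed=None):
    """Draw n synthetic samples from the fitted model."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]

# Example: model an extracted numeric field and sample a test batch from it
observed_ages = [34, 29, 41, 38, 30, 36]
mu, sigma = model_field(observed_ages)
synthetic_batch = sample_field(mu, sigma, 100, seed=0)
```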
## Installation

Local installation can be done using uv:

```shell
$ uv python install 3.10
$ uv python pin 3.10
$ uv venv -p python3.10
$ source .venv/bin/activate
$ uv pip install -e .
```

For development mode:

```shell
$ pip install -e ".[dev]"
```

For Databricks deployment:

```shell
$ pip install -e ".[databricks]"
```

## Usage

```python
import asyncio

from gaspar.config import load_config
from gaspar.pipeline.executor import PipelineExecutor

async def main():
    # Load configuration from file
    config = load_config("config.yaml")

    # Initialize pipeline
    executor = PipelineExecutor(config)

    # Process IPA document
    result = await executor.execute("documents/privacy_assessment.txt")

    if result.success:
        print("Privacy rules extracted successfully")
        print(f"Generated {len(result.outputs['filters'])} filters")

if __name__ == "__main__":
    asyncio.run(main())
```

After installation, the `gaspar` command-line tool is available:
```shell
# Process IPA document and start monitoring
$ gaspar document.txt

# Use custom configuration
$ gaspar -c config.yaml document.txt

# Enable verbose logging
$ gaspar -v document.txt
```

## Configuration

GASPAR can be configured via YAML files. Example configuration:
```yaml
# LLM Model Configuration
model:
  provider: "openai"
  model_name: "gpt-4"
  api_key: ${OPENAI_API_KEY}

# Storage Configuration
storage:
  type: "local"
  local_path: "./data"

# Pipeline Configuration
pipeline:
  batch_size: 100
  max_retries: 3
  temp_directory: "./temp"
```

The configuration used in this repository was adapted to target a locally deployed version of GPT-4. To use the OpenAI API or any other LLM API provider, update `openai_model.py` or create a new model module.
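A new provider module could follow a shape like the sketch below. The class and method names are assumptions for illustration, not GASPAR's actual interface, and the stub echoes the prompt instead of calling a real API:

```python
# Hypothetical provider module sketch; names are illustrative, not GASPAR's API.
from dataclasses import dataclass

@dataclass
class MyProviderModel:
    model_name: str
    api_key: str
    base_url: str = "http://localhost:8000/v1"  # e.g. a locally deployed model endpoint

    def complete(self, prompt: str) -> str:
        """Send `prompt` to the provider and return the completion text.

        A real implementation would call the provider's HTTP API here;
        this stub just echoes, so the shape can be exercised offline.
        """
        return f"[{self.model_name}] {prompt}"

model = MyProviderModel(model_name="gpt-4", api_key="dummy")
reply = model.complete("Extract privacy rules from this IPA document.")
```

Keeping the provider behind a small class like this is what lets the `model.provider` setting in the YAML configuration select between OpenAI, Anthropic, Mistral, or a local deployment.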
## Testing

Running the tests can be done using tox:

```shell
$ tox -p
```

Building the packages:

```shell
$ tox -e packages
$ ls dist/
```

## License

Apache-2.0