teseo-data

Repository for managing and analyzing the data of the Teseo project (PA digitale 2026).

Requirements

Bun (for the TypeScript scripts)
Python 3.11+ (for the anonymizer)

bun install
pip install -r scripts/anonymizer/requirements.txt
python -m spacy download it_core_news_lg
python -m spacy download en_core_web_lg

Scripts

1. Anonymizer (`scripts/anonymizer/presidio.py`)

Anonymizes support cases by removing personal data (PII) and filtering out content that is not useful for the knowledge base.

Pipeline:

Presidio - Detects and replaces PII (names, emails, Italian tax codes (CF), IBANs, phone numbers, addresses)
Denylist - Filters out replies that contain generic phrases (e.g. "attendere" ("please wait"), "in lavorazione" ("in progress"))
AI (OVH) - Validates whether the content is useful and finds PII that the earlier steps missed
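The denylist step can be illustrated with a minimal sketch. It assumes a plain case-insensitive substring match; the phrase list and the function names `is_generic` and `filter_replies` are hypothetical and are not taken from `presidio.py`, which may match phrases differently.

```python
# Hypothetical denylist: only the two phrases quoted in this README.
# The real list used by the anonymizer is not shown here.
DENYLIST = ("attendere", "in lavorazione")


def is_generic(reply: str, denylist=DENYLIST) -> bool:
    """Return True when the reply contains any denylisted phrase (case-insensitive)."""
    text = reply.casefold()
    return any(phrase in text for phrase in denylist)


def filter_replies(replies):
    """Keep only the replies that do not match the denylist."""
    return [reply for reply in replies if not is_generic(reply)]
```

A substring match is the simplest possible choice and will also drop replies that mention a phrase in passing, so a real filter may need stricter rules.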

Input:

scripts/anonymizer/input/case.csv

Output:

data/anonymized/case_anonymized_YYYY_MM_DD.csv  # Daily file
data/anonymized/output_case.csv                  # Master file (append, no duplicates)
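As an illustration of the two outputs, the sketch below builds the dated file name and appends rows to a master CSV while skipping duplicates. It assumes each row has a unique identifier column, here called `case_id`; that column name and the helpers `daily_filename` and `append_unique` are hypothetical, and the actual script may detect duplicates in another way.

```python
import csv
from datetime import date
from pathlib import Path


def daily_filename(day: date) -> str:
    """Build the dated file name following the YYYY_MM_DD pattern."""
    return f"case_anonymized_{day.strftime('%Y_%m_%d')}.csv"


def append_unique(master: Path, rows: list[dict], key: str = "case_id") -> int:
    """Append rows whose key is not yet in the master file; return how many were added."""
    seen = set()
    if master.exists():
        with master.open(newline="", encoding="utf-8") as f:
            seen = {row[key] for row in csv.DictReader(f)}

    new_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])  # also guards against duplicates inside the same batch
            new_rows.append(row)
    if not new_rows:
        return 0

    # Write the header only when the master file is new or empty.
    write_header = not master.exists() or master.stat().st_size == 0
    with master.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(new_rows[0].keys()))
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Calling `append_unique(Path("data/anonymized/output_case.csv"), rows)` twice with the same rows would add them only the first time.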

Usage:

# Process everything
python3 scripts/anonymizer/presidio.py

# Test with N rows
python3 scripts/anonymizer/presidio.py --test 50
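A flag like `--test N` could be wired up roughly as follows. This is a sketch using the standard `argparse` module, not the code in `presidio.py`; it assumes that N means "the first N rows", and the helpers `parse_args` and `limit_rows` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --test is optional and takes a row count."""
    parser = argparse.ArgumentParser(description="Anonymize support cases")
    parser.add_argument(
        "--test",
        type=int,
        default=None,
        metavar="N",
        help="process only N rows instead of the whole input",
    )
    return parser.parse_args(argv)


def limit_rows(rows, n=None):
    """Return the first n rows (an assumption), or all rows when n is None."""
    return rows if n is None else rows[:n]
```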

GitHub Action: .github/workflows/anonymizer.yml (manual trigger)

2. Fetch Langfuse (`scripts/fetch-langfuse/`)

Retrieves the questions asked to the chatbot (via Langfuse) and classifies them into predefined categories.

Categories: defined in scripts/fetch-langfuse/config.ts

Output:

data/categories_stats_YYYY_MM_DD.csv  # Daily stats
data/categories_stats.csv             # Overall aggregate
data/uncategorized_questions.csv      # Questions classified as "Altro" (other)

Usage:
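The relationship between the stats files and the uncategorized file can be sketched as follows. The example is written in Python to keep it short, although the script itself is TypeScript; it assumes the questions have already been labelled, and the function `summarize` and the sample category `Accesso` are hypothetical. Only the `Altro` label comes from this README.

```python
from collections import Counter

FALLBACK = "Altro"  # catch-all label for questions that fit no predefined category


def summarize(labelled):
    """Take (question, category) pairs and return per-category counts
    together with the questions that landed in the fallback category."""
    labelled = list(labelled)
    counts = Counter(category for _, category in labelled)
    uncategorized = [question for question, category in labelled if category == FALLBACK]
    return counts, uncategorized
```

In this picture the counts would feed the stats files, while the second return value corresponds to the rows of `uncategorized_questions.csv`.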

bun run scripts/fetch-langfuse/index.ts

GitHub Action: .github/workflows/langfuse.yml

3. Fetch Langfuse Experimental (`scripts/fetch-langfuse-exp.ts`)

Like fetch-langfuse, but the AI generates the categories on its own through semantic analysis of the questions.

Output:

data/exp/categories_stats_YYYY_MM_DD.csv  # Daily stats (auto-generated categories)
data/exp/categories_stats.csv             # Top 10 aggregated categories

Usage:

bun run scripts/fetch-langfuse-exp.ts
