Welcome to the protegrity-developer-edition repository, part of the Protegrity AI Developer Edition suite. This repository provides a self-contained experimentation platform for discovering and protecting sensitive data using Protegrity’s Data Discovery, Semantic Guardrail, and Protection APIs.
This repository enables developers to:
- Rapidly set up a local environment using Docker Compose.
- Experiment with unstructured text classification, PII discovery, redaction, masking, and tokenization-like protection.
- Experiment with semantic guardrails to secure GenAI applications using messaging risk scoring, conversation risk scoring, and PII scanning.
- Integrate Protegrity APIs into GenAI and traditional applications.
- Use sample applications and data to understand integration workflows.
Why This Matters
AI is transforming every industry, but privacy can’t be an afterthought. Protegrity AI Developer Edition 1.1.0 makes enterprise-grade data discovery and data protection developer-friendly, so you can build secure, privacy-first solutions for both AI pipelines and traditional data workflows. Whether you’re protecting sensitive information in analytics pipelines, business applications, or next-generation AI, Protegrity AI Developer Edition empowers you to innovate confidently while keeping data safe.
Protegrity AI Developer Edition enables secure data and AI pipelines, including:
- Privacy in conversational AI: Sensitive chatbot inputs are protected before they reach generative AI models.
- Prompt sanitization for LLMs: Automated PII masking reduces risk during large language model prompt engineering and inference.
- Experimentation with Jupyter notebooks: Data scientists can prototype directly in Jupyter notebooks for agile experimentation.
- Output redaction and leakage prevention: Detect and protect sensitive data in model outputs before returning them to end users.
- Privacy-enhanced AI training: Sensitive fields in training datasets are de-identified to support compliant and secure AI development.
- Synthetic data generation for privacy-preserving AI: Automatically create realistic, anonymized datasets that mimic production data without exposing sensitive information, enabling safe model training and testing.
- Prerequisites
- Additional prerequisites for MacOS
- Preparing the system
- If your setup is ready, run the samples
.
├── CHANGELOG
├── CONTRIBUTIONS.md
├── LICENSE
├── README.md
├── docker-compose.yml # Orchestrates data discovery + semantic guardrail services
├── data-discovery/ # Low-level classification examples
│ ├── sample-classification-bash-text.sh
│ ├── sample-classification-bash-tabular.sh
│ ├── sample-classification-python-text.py
│ └── sample-classification-python-tabular.py
├── semantic-guardrail/ # GenAI security risk & PII multi-turn scanning examples
│ └── sample-guardrail-python.py
└── samples/ # High-level SDK samples (Python & Java)
├── python/
│ ├── sample-app-semantic-guardrails/ # Semantic Guardrail Jupyter Notebook samples
│ │ ├── Sample Application.ipynb
│ ├── sample-app-synthetic-data/ # Synthetic Data Jupyter Notebook samples
│ │ ├── synthetic_data.ipynb
│ ├── sample-app-find.py # Discover and list PII entities
│ ├── sample-app-find-and-redact.py # Discover + redact or mask entities
│ ├── sample-app-find-and-protect.py # Discover + protect entities (tokenize style)
│ ├── sample-app-find-and-unprotect.py # Unprotect protected entities
│ └── sample-app-protection.py # Direct protect / unprotect (CLI style)
├── java/ # Java SDK samples
│ ├── sample-app-find.sh # Discover and list PII entities
│ ├── sample-app-protection.sh # Direct protect / unprotect (CLI style)
│ ├── sample-app-find-and-protect.sh # Discover + protect entities
│ ├── sample-app-find-and-unprotect.sh # Unprotect protected entities
│ └── sample-app-find-and-redact.sh # Discover + redact entities
├── config.json
└── sample-data/
├── input.txt
├── output-redact.txt # Produced by redact workflow
├── output-protect.txt # Produced by protect workflow
└── (generated files ...)
- Data Discovery: REST-based classification and entity detection of unstructured text.
- PII Discovery: Enumerate detected entities with confidence scores.
- Redaction / Masking: Replace detected entities (configurable masking char, mapping).
- Protection (Tokenization-like): Protect and unprotect specific data elements using `sample-app-protection.py` and the combined find + protect sample.
- Semantic Guardrail: Message- and conversation-level risk scoring plus PII scanning for GenAI flows.
- Synthetic Data: A privacy-enhancing technology that generates artificial data from real datasets while preserving statistical properties and relationships, without exposing actual personal information.
- Multi-turn Examples: Use the curl and Python samples from the semantic guardrail directory.
- Configurable Samples: Adjust behavior through `samples/config.json`.
- Cross-platform: Works on Linux, MacOS, and Windows.
- Python >= 3.12.11 (for Python samples)
Note: Ensure that the `python` command on your system points to a supported python3 version, for example, Python 3.12.11. You can verify this by running `python --version`.
- pip (for Python samples)
- Python Virtual Environment (for Python samples)
- Java 11 or later (for Java samples)
- Maven 3.6+ or use the included Maven wrapper (for Java samples)
- Container management software:
- For Linux/Windows: Docker
- For MacOS: Docker Desktop or Colima
- Docker Compose > 2.30
- Git
- For more information about the minimum requirements, refer to Prerequisites.
- Optional: If the AI Developer Edition is already installed, then complete the following tasks:
- Back up any customized files.
- Stop any AI Developer Edition containers that are running using the `docker compose down --remove-orphans` command.
- Remove the `protegrity-developer-python` module using the `pip uninstall protegrity-developer-python` command.
Linux and Windows users can proceed to Preparing the system.
MacOS requires additional steps for Docker and for systems with Apple Silicon chips. Complete the following steps before using AI Developer Edition.
- Complete one of the following options to apply the settings.
  - For Colima:
    - Open a command prompt.
    - Run the following command.
      colima start --vm-type vz --vz-rosetta --memory 8
  - For Docker Desktop:
    - Open Docker Desktop.
    - Go to Settings > General.
    - Enable the following check boxes:
      - Use Virtualization framework
      - Use Rosetta for x86_64/amd64 emulation on Apple Silicon
    - Click Apply & restart.
- Update one of the following options to resolve certificate-related errors.
  - For Colima:
    - Open a command prompt.
    - Navigate to and open the following file.
      ~/.colima/default/colima.yaml
    - Update the following configuration in `colima.yaml` to add the registry for obtaining the required images.
      Before update:
      docker: {}
      After update:
      docker:
        insecure-registries:
          - ghcr.io
    - Save and close the file.
    - Stop Colima.
      colima stop
    - Close and restart the command prompt.
    - Start Colima.
      colima start --vm-type vz --vz-rosetta --memory 8
  - For Docker Desktop:
    - Open Docker Desktop.
    - Click the gear or settings icon.
    - Click Docker Engine from the sidebar. The editor with your current Docker daemon configuration (`daemon.json`) opens.
    - Locate and add the `insecure-registries` key in the root JSON object. Ensure that you add a comma after the last value in the existing configuration.
      After update:
      {
        ...
        <existing configuration>,
        "insecure-registries": [
          "ghcr.io",
          "githubusercontent.com"
        ]
      }
    - Click Apply & Restart to save the changes and restart Docker Desktop.
    - Verify: After Docker restarts, run `docker info` in your terminal and confirm that the required registry is listed under Insecure Registries.
- Optional: If the "The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested" error is displayed, complete the following steps.
  - Start a command prompt.
  - Navigate to and open the following file.
    ~/.docker/config.json
  - Add the following parameter.
    "default-platform": "linux/amd64"
  - Save and close the file.
Some services are profile enabled; ensure that you use the `--profile` flag when starting the services.
- Run `docker compose up -d` from the `protegrity-developer-edition` directory to start the default services.
- Run `docker compose --profile synthetic up -d` from the `protegrity-developer-edition` directory to start the `synthetic` profiled services.
Complete the steps provided here to use the samples provided with AI Developer Edition.
For MacOS, ensure that the Additional prerequisites for MacOS steps are complete.
- Open a command prompt.
- Clone the git repository.
  git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-edition.git
- Navigate to the `protegrity-developer-edition` directory in the cloned location.
- Start the services (classification + semantic guardrail + Synthetic Data [with profile]) in the background. The dependent containers are large; downloads may take time.
  - To start the Data Discovery and Semantic Guardrail services, run:
    docker compose up -d
  - To start the Data Discovery, Semantic Guardrail, and Synthetic Data services, run:
    docker compose --profile synthetic up -d
  Based on your configuration, use the appropriate `docker compose up -d` command.
- Install the `protegrity-developer-python` module.
  Note: It is recommended to install and activate the Python virtual environment before installing the module.
  pip install protegrity-developer-python
  The installation completes and a success message is displayed. Alternatively, to build the module from source, refer to Building the Python module from source.
- For Java samples, the `protegrity-developer-java` module is automatically downloaded from Maven Central when you run a sample for the first time. Alternatively, to build the Java library from source, refer to Building the Java module from source.
- Install Jupyter Lab to run the notebook samples provided for Semantic Guardrail and Synthetic Data.
  Note: It is recommended to install and activate the Python virtual environment.
  pip install -r samples/python/requirements.txt
Quick runs for each sample are provided here. Open a command prompt and run the command from the repository root. Ensure the Getting Started steps are completed first. For more information about running the application, refer to the Running the sample application section.
Note: Both Python and Java samples are available. Python samples are located in `samples/python/` and Java samples in `samples/java/`. Choose the language that best fits your project needs.
List the PII entities.
Python:
python samples/python/sample-app-find.py
Java:
bash samples/java/sample-app-find.sh
The logs list discovered entities as JSON. No modification of file contents is performed.
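For context, both samples call the Data Discovery REST API listed under API endpoints. The sketch below posts text to the classification endpoint directly with the `requests` library; the request and response shapes shown are assumptions for illustration, so refer to the Data Discovery documentation for the exact schema.

```python
# Hypothetical sketch: call the Data Discovery classification endpoint directly.
# The payload and response fields are assumptions; consult the Data Discovery
# documentation for the actual request/response schema.
import json
import requests

CLASSIFY_URL = "http://localhost:8580/pty/data-discovery/v1.1/classify"

sample_text = "John Smith lives in Stockholm and his phone number is 555-0123."

# Assumed request shape: plain text posted to the classify endpoint.
response = requests.post(
    CLASSIFY_URL,
    data=sample_text.encode("utf-8"),
    headers={"Content-Type": "text/plain"},
    timeout=30,
)
response.raise_for_status()

# Print whatever entity JSON the service returns.
print(json.dumps(response.json(), indent=2))
```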
Find and redact or mask information using the default settings. Redaction and masking are controlled by the `method` key (`redact` or `mask`) and the `masking_char` key in the `samples/config.json` file.
Python:
python samples/python/sample-app-find-and-redact.py
Java:
bash samples/java/sample-app-find-and-redact.sh
This produces the `samples/sample-data/output-redact.txt` file with entities redacted (removed) or masked.
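To switch from redaction to masking, set `method` to `mask` in `samples/config.json` and choose a `masking_char`; for example, based on the keys documented in the Configuration section:

```json
{
  "masking_char": "#",
  "named_entity_map": {
    "PERSON": "PERSON",
    "LOCATION": "LOCATION",
    "SOCIAL_SECURITY_ID": "SSN",
    "PHONE_NUMBER": "PHONE",
    "AGE": "AGE",
    "USERNAME": "USERNAME"
  },
  "method": "mask"
}
```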
The `sample-classification-python-tabular.py` sample analyzes text from the `data-discovery/input.csv` file.
cd data-discovery
python sample-classification-python-tabular.py
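As a rough illustration of what a tabular pass involves (this is not the sample's actual implementation), the sketch below reads the CSV and submits each non-empty cell to the classification endpoint; the payload shape and per-cell handling are assumptions.

```python
# Hypothetical sketch of a tabular classification pass; not the sample's code.
# Assumes the classification endpoint accepts plain text (see API endpoints).
import csv
import requests

CLASSIFY_URL = "http://localhost:8580/pty/data-discovery/v1.1/classify"

with open("data-discovery/input.csv", newline="", encoding="utf-8") as handle:
    for row in csv.reader(handle):
        for cell in row:
            if not cell.strip():
                continue  # skip empty cells
            response = requests.post(
                CLASSIFY_URL,
                data=cell.encode("utf-8"),
                headers={"Content-Type": "text/plain"},
                timeout=30,
            )
            response.raise_for_status()
            print(f"{cell!r} -> {response.json()}")
```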
Run the sample using Python.
python semantic-guardrail/sample-guardrail-python.py
This submits a multi-turn conversation for semantic risk scoring and performs PII processing.
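For a sense of what such a flow sends, the sketch below posts a small multi-turn conversation to the Semantic Guardrail scan endpoint listed under API endpoints. The message structure is an assumed shape for illustration; refer to the Semantic Guardrails documentation for the exact schema.

```python
# Hypothetical sketch: score a multi-turn conversation against the Semantic
# Guardrail scan endpoint. The JSON body shown is an assumed shape, not the
# documented contract.
import json
import requests

SCAN_URL = (
    "http://localhost:8581/pty/semantic-guardrail/v1.1/conversations/messages/scan"
)

# Assumed message structure for a two-turn conversation.
payload = {
    "messages": [
        {"role": "user", "content": "My SSN is 123-45-6789, can you remember it?"},
        {"role": "assistant", "content": "I cannot store personal identifiers."},
    ]
}

response = requests.post(SCAN_URL, json=payload, timeout=30)
response.raise_for_status()

# Risk scores and PII findings (exact fields depend on the service schema).
print(json.dumps(response.json(), indent=2))
```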
Note: It is recommended to install and activate the Python virtual environment.
- Run the following command to start Jupyter Lab for running Semantic Guardrail.
  jupyter lab
- Copy the URL displayed and navigate to the site from a web browser. Ensure that localhost is replaced with the IP address of the system where the AI Developer Edition is set up.
- In the left pane of the notebook, navigate to `samples/python/sample-app-semantic-guardrails`.
- Open the `Sample Application.ipynb` file.
- Click the Play icon and follow the prompts in the notebook.
A Jupyter Notebook is provided for using Protegrity Synthetic Data.
Note: It is recommended to install and activate the Python virtual environment.
- Start Jupyter Lab using the following command. Jupyter Lab starts and a URL with a token is displayed.
  jupyter lab
- Copy the URL displayed and navigate to the site from a web browser. Ensure that localhost is replaced with the IP address of the system where the AI Developer Edition is set up.
- In the left pane of the notebook, navigate to `samples/python/sample-app-synthetic-data`.
- Open the `synthetic_data.ipynb` file.
- Click the Play icon and follow the steps in the notebook to explore the synthetic data capabilities.
The next steps include samples that demonstrate how to protect and unprotect data using the Protection APIs. The Protection APIs rely on authenticated access to the AI Developer Edition API Service.
- `samples/python/sample-app-find-and-protect.py`
- `samples/python/sample-app-protection.py`
- `samples/python/sample-app-find-and-unprotect.py`
- `samples/java/sample-app-find-and-protect.sh`
- `samples/java/sample-app-protection.sh`
- `samples/java/sample-app-find-and-unprotect.sh`
Perform the steps from Additional settings for using the AI Developer Edition API Service to obtain the API key and password for setting the environment variables. If you already have the API key and password, then proceed to export the environment variables.
- For Linux and MacOS:
  export DEV_EDITION_EMAIL='<Email_used_for_registration>'
  export DEV_EDITION_PASSWORD='<Password_provided_in_email>'
  export DEV_EDITION_API_KEY='<API_key_provided_in_email>'
  Verify that the variables are set.
  test -n "$DEV_EDITION_EMAIL" && echo "EMAIL $DEV_EDITION_EMAIL set" || echo "EMAIL missing"
  test -n "$DEV_EDITION_PASSWORD" && echo "PASSWORD $DEV_EDITION_PASSWORD set" || echo "PASSWORD missing"
  test -n "$DEV_EDITION_API_KEY" && echo "API KEY $DEV_EDITION_API_KEY set" || echo "API KEY missing"
- For Windows PowerShell:
  $env:DEV_EDITION_EMAIL = '<Email_used_for_registration>'
  $env:DEV_EDITION_PASSWORD = '<Password_provided_in_email>'
  $env:DEV_EDITION_API_KEY = '<API_key_provided_in_email>'
  Verify that the variables are set.
  if ($env:DEV_EDITION_EMAIL) { Write-Output "EMAIL $env:DEV_EDITION_EMAIL set" } else { Write-Output "EMAIL missing" }
  if ($env:DEV_EDITION_PASSWORD) { Write-Output "PASSWORD $env:DEV_EDITION_PASSWORD set" } else { Write-Output "PASSWORD missing" }
  if ($env:DEV_EDITION_API_KEY) { Write-Output "API KEY $env:DEV_EDITION_API_KEY set" } else { Write-Output "API KEY missing" }
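If you prefer checking from Python (for example, inside a notebook), a minimal equivalent of the shell checks above reads the same variables from the process environment:

```python
# Minimal check that the Protection API credentials are present in the
# environment before running the protect/unprotect samples.
import os

REQUIRED = ("DEV_EDITION_EMAIL", "DEV_EDITION_PASSWORD", "DEV_EDITION_API_KEY")

for name in REQUIRED:
    status = "set" if os.environ.get(name) else "missing"
    print(f"{name}: {status}")
```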
Ensure that the environment variables are exported and then run the sample code.
Python:
python samples/python/sample-app-find-and-protect.py
Java:
bash samples/java/sample-app-find-and-protect.sh
This produces the `samples/sample-data/output-protect.txt` file with protected (tokenized-like) values.
To recover the original data, run:
Python:
python samples/python/sample-app-find-and-unprotect.py
Java:
bash samples/java/sample-app-find-and-unprotect.sh
This reads the samples/sample-data/output-protect.txt file and produces the samples/sample-data/output-unprotect.txt file with original values.
Use the sample commands below to protect and unprotect data. Ensure that the environment variables are exported and then run the sample code.
For information about the users, roles, and data elements, refer to Understanding Users and Roles and Understanding the Data Elements.
Python:
# protect
python samples/python/sample-app-protection.py --input_data "John Smith" --policy_user superuser --data_element name --protect
# unprotect
python samples/python/sample-app-protection.py --input_data "<protected_data>" --policy_user superuser --data_element name --unprotect
Java:
# protect
bash samples/java/sample-app-protection.sh --input_data "John Smith" --policy_user superuser --data_element name --protect
# unprotect
bash samples/java/sample-app-protection.sh --input_data "<protected_data>" --policy_user superuser --data_element name --unprotect
The <protected_data> value is obtained from the output of the protect command.
Similarly, to encrypt and decrypt data, run the following commands:
Python:
# encrypt
python samples/python/sample-app-protection.py --input_data "John Smith" --policy_user superuser --data_element text --enc
# decrypt
python samples/python/sample-app-protection.py --input_data "<encrypted_data>" --policy_user superuser --data_element text --dec
Java:
# encrypt
bash samples/java/sample-app-protection.sh --input_data "John Smith" --policy_user superuser --data_element text --enc
# decrypt
bash samples/java/sample-app-protection.sh --input_data "<encrypted_data>" --policy_user superuser --data_element text --dec
The <encrypted_data> value is obtained from the output of the encrypt command.
For more information, run the help command:
Python:
python samples/python/sample-app-protection.py --help
Java:
bash samples/java/sample-app-protection.sh --help
Prior registration is required to obtain credentials for accessing the AI Developer Edition API Service. The following samples demonstrate how to protect and unprotect data using the Protection APIs. The Protection APIs rely on authenticated access to the AI Developer Edition API Service.
- `samples/python/sample-app-find-and-protect.py`
- `samples/python/sample-app-protection.py`
- `samples/python/sample-app-find-and-unprotect.py`
- `samples/java/sample-app-find-and-protect.sh`
- `samples/java/sample-app-protection.sh`
- `samples/java/sample-app-find-and-unprotect.sh`
- Open a web browser.
- Navigate to https://www.protegrity.com/developers/get-api-credentials.
- Specify the following details:
  - First Name
  - Last Name
  - Work Email
  - Job Title
  - Company Name
  - Country
- Click the Terms & Conditions link and read the terms and conditions.
- Select the check box to accept the terms and conditions. The request is analyzed. After the request is approved, an API key and password to access the AI Developer Edition API Service are sent to the Work Email specified. Keep the API key and password safe. You need to export them to environment variables for using the AI Developer Edition API Service.
Note: After completing registration, allow 1-2 minutes for the confirmation email to arrive. If you do not see it in your inbox, check your spam or junk folder before retrying.
Edit samples/config.json to customize SDK behavior.
Keys:
- `named_entity_map`: Optional mappings (friendly labels) used during redact/mask. See Supported Classification Entities.
- `method`: `redact` (remove) or `mask` (replace with masking char).
- `masking_char`: Character for masking (when `method` = mask).
- `classification_score_threshold`: Minimum confidence (default 0.6 if omitted).
- `endpoint_url`: Override classification endpoint (defaults internally to the docker compose service http://localhost:8580/...).
- `enable_logging`, `log_level`.
Current example:
{
"masking_char": "#",
"named_entity_map": {
"PERSON": "PERSON",
"LOCATION": "LOCATION",
"SOCIAL_SECURITY_ID": "SSN",
"PHONE_NUMBER": "PHONE",
"AGE": "AGE",
"USERNAME": "USERNAME"
},
"method": "redact"
}
- Classification API: http://localhost:${CLASSIFICATION_PORT:-8580}/pty/data-discovery/v1.1/classify
- Semantic Guardrail API: http://localhost:${SGR_PORT:-8581}/pty/semantic-guardrail/v1.1/conversations/messages/scan
- Synthetic Data API: http://localhost:${SYNTHETIC_DATA_PORT:-8095}/pty/synthetic-data/v1
If you change published ports in docker-compose.yml, update endpoint_url. Also, if required, update the semantic guardrail URL in the scripts.
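For example, if you publish the classification service on port 9580 instead of 8580 (an illustrative value), point `endpoint_url` at the new port in `samples/config.json`:

```json
{
  "endpoint_url": "http://localhost:9580/pty/data-discovery/v1.1/classify",
  "method": "redact",
  "masking_char": "#"
}
```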
docker-compose.yml provisions:
- `pattern-provider-service` and `context-provider-service`: ML provider backends.
- `classification-service`: Exposes the Data Discovery REST API. Uses port 8580 by default.
- `semantic-guardrail-service`: Conversation risk and PII scanning; depends on classification. Uses port 8581 by default.
- `synthetic-data-service`: Synthetic Data service (`--profile synthetic`). Uses port 8095 by default.
Restart the stack after changes to the docker-compose.yml file, from the protegrity-developer-edition directory:
docker compose down && docker compose up -d
Check service logs for any errors from the protegrity-developer-edition directory:
docker compose logs
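If a sample cannot connect, a quick way to confirm that the published ports are reachable (8580 classification, 8581 semantic guardrail, 8095 synthetic data with the `synthetic` profile) is a simple socket check, for example:

```python
# Quick reachability check for the default published ports. Adjust the port
# numbers if you changed them in docker-compose.yml.
import socket

SERVICES = {
    "classification-service": 8580,
    "semantic-guardrail-service": 8581,
    "synthetic-data-service": 8095,
}

for name, port in SERVICES.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(2)
        reachable = sock.connect_ex(("localhost", port)) == 0
        print(f"{name} on port {port}: {'reachable' if reachable else 'not reachable'}")
```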
- The Protegrity AI Developer Edition documentation is available at https://developer.docs.protegrity.com/.
- For more API reference and tutorials, refer to the Developer Portal at https://www.protegrity.com/developers.
- For more information about Data Discovery, refer to the Data Discovery documentation.
- For more information about Semantic Guardrails, refer to the Semantic Guardrails documentation.
- For more information about Synthetic Data, refer to the Synthetic Data documentation.
- For more information about Application Protector Python, refer to the Application Protector Python documentation.
- For more information about Application Protector Java, refer to the Application Protector Java documentation.
- Join the discussion on https://github.com/orgs/Protegrity-Developer-Edition/discussions.
- Anonymous downloads supported; registration required for participation.
- Issues / feature requests: please include sample script name & log snippet.
See LICENSE for terms and conditions.