Most of Presidio's services are written in Go. The presidio-analyzer module, in charge of detecting entities in text, is written in Python. This document details what is required to develop for Presidio.
- Install Go 1.11 and Python 3.7.

- Install the Go packages via `dep`:

  ```sh
  dep ensure
  ```
- Install the tesseract OCR framework (optional, only needed for image anonymization).
- Build and install re2 (optional; Presidio will use `regex` instead of `pyre2` if re2 is not installed):

  ```sh
  re2_version="2018-12-01"
  wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
  mkdir re2
  tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
  cd re2 && make install
  ```
- Install pipenv.

  Pipenv is a Python workflow manager that handles dependencies and virtual environments for Python packages. It is used as the dependency manager in Presidio's analyzer project.

  Using pip:

  ```sh
  pip3 install --user pipenv
  ```

  Using Homebrew:

  ```sh
  brew install pipenv
  ```

  Additional installation instructions: https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv
- Create a virtualenv for the project and install all requirements in the Pipfile, including dev requirements. In the `presidio-analyzer` folder, run:

  ```sh
  pipenv install --dev --sequential --skip-lock
  ```
- Download the spaCy model:

  ```sh
  pipenv run python -m spacy download en_core_web_lg
  ```
- Run all tests:

  ```sh
  pipenv run pytest
  ```
- To run arbitrary scripts within the virtual env, start the command with `pipenv run`. For example:

  ```sh
  pipenv run flake8 analyzer --exclude "*pb2*.py"
  pipenv run pylint analyzer
  pipenv run pip freeze
  ```
- Start a shell:

  ```sh
  pipenv shell
  ```
- Run commands in the shell:

  ```sh
  pytest
  pylint analyzer
  pip freeze
  ```
- To use presidio-analyzer as a python library, see Installing presidio-analyzer as a standalone Python package
- To add new recognizers in order to support new entities, see Adding new custom recognizers
- Installing and building the entire Presidio solution is currently not supported on Windows. However, installing and building the different docker images, or the Python package for detecting entities (presidio-analyzer) is possible on Windows. See here
- Build the bins with:

  ```sh
  make build
  ```

- Build the base containers with:

  ```sh
  make docker-build-deps DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL}
  ```

  (If you do not specify a valid, logged-in registry, a warning will be echoed to the standard output.)

- Build the Docker images with:

  ```sh
  make docker-build DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL} PRESIDIO_LABEL=${PRESIDIO_LABEL}
  ```

- Push the Docker images with:

  ```sh
  make docker-push DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL}
  ```

- Run the tests with:

  ```sh
  make test
  ```

- Adding a Go file requires running the `make go-format` command before running and building the service.

- Run functional tests with:

  ```sh
  make test-functional
  ```

- Updating Python dependencies instructions
- These steps are verified on every pull request validation to a Presidio branch. Do not alter this document without referring to the implemented steps in the pipeline.
- `GRPC_PORT`: `3001`, GRPC listen port (analyzer)
- `GRPC_PORT`: `3002`, GRPC listen port (anonymizer)
- `WEB_PORT`: `8080`, HTTP listen port
- `REDIS_URL`: `localhost:6379`, optional, Redis address
- `ANALYZER_SVC_ADDRESS`: `localhost:3001`, analyzer address
- `ANONYMIZER_SVC_ADDRESS`: `localhost:3002`, anonymizer address
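To make the variable names and defaults above concrete, here is a hypothetical sketch in Python of a service reading this configuration. This is purely illustrative (Presidio's services are written in Go); only the variable names and default values come from the list above:

```python
import os

# Hypothetical illustration of reading the environment variables listed
# above, with their documented defaults. Not Presidio's actual source.
def read_service_config(env=os.environ):
    return {
        "grpc_port": int(env.get("GRPC_PORT", "3001")),
        "web_port": int(env.get("WEB_PORT", "8080")),
        "redis_url": env.get("REDIS_URL", "localhost:6379"),  # optional
        "analyzer_svc_address": env.get("ANALYZER_SVC_ADDRESS", "localhost:3001"),
        "anonymizer_svc_address": env.get("ANONYMIZER_SVC_ADDRESS", "localhost:3002"),
    }

# Example: override only the GRPC port, keep the rest at defaults.
config = read_service_config({"GRPC_PORT": "3002"})
```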
Developing presidio as a whole on Windows is currently not supported. However, it is possible to run and test the presidio-analyzer module, in charge of detecting entities in text, on Windows using Docker:
- Run locally the core services Presidio needs to operate:

  ```sh
  # the containers below share a user-defined network; create it first
  docker network create testnetwork

  docker run --rm --name test-redis --network testnetwork -d -p 6379:6379 redis
  docker run --rm --name test-presidio-anonymizer --network testnetwork -d -p 3001:3001 -e GRPC_PORT=3001 mcr.microsoft.com/presidio-anonymizer:latest
  docker run --rm --name test-presidio-recognizers-store --network testnetwork -d -p 3004:3004 -e GRPC_PORT=3004 -e REDIS_URL=test-redis:6379 mcr.microsoft.com/presidio-recognizers-store:latest
  ```
- Navigate to `<Presidio folder>/presidio-analyzer`.
- Install the Python packages if you haven't done so yet:

  ```sh
  pipenv install --dev --sequential
  ```

- If you want to experiment with `analyze` requests, navigate into the `analyzer` folder and start serving the analyzer service:

  ```sh
  pipenv run python app.py serve --grpc-port 3000
  ```

- In a new `pipenv shell` window you can run `analyze` requests, for example:

  ```sh
  pipenv run python app.py analyze --text "John Smith drivers license is AC432223" --fields "PERSON" "US_DRIVER_LICENSE" --grpc-port 3000
  ```
- Edit `post.lua`. Change the template name.

- Run wrk:

  ```sh
  wrk -t2 -c2 -d30s -s post.lua http://<api-service-address>/api/v1/projects/<my-project>/analyze
  ```
- If deploying from a private registry, verify that Kubernetes has access to the Docker registry.

- If using a Kubernetes secret to manage the registry authentication, make sure it is registered under the `presidio` namespace.
Edit `charts/presidio/values.yaml` to:

- Set the secret name (for private registries)
- Change the Presidio services version
- Change the default scale
- The NLP engines deployed are set on startup based on the yaml configuration files in `presidio-analyzer/conf/`. The default NLP engine is the large English spaCy model (`en_core_web_lg`), set in `default.yaml`.
- The format of the yaml file is as follows:

  ```yaml
  nlp_engine_name: spacy # {spacy, stanza}
  models:
    -
      lang_code: en # code corresponds to `supported_language` in any custom recognizers
      model_name: en_core_web_lg # the name of the spaCy or Stanza model
    -
      lang_code: de # more than one model is optional, just add more items
      model_name: de
  ```
- By default, the `load_predefined_recognizers` method of the `RecognizerRegistry` class is called to load language-specific and language-agnostic recognizers.

- Downloading additional engines:

  - spaCy NLP models: models download page
  - Stanza NLP models: models download page
  ```sh
  # download models - tl;dr
  # spacy
  python -m spacy download en_core_web_lg
  # stanza
  python -c 'import stanza; stanza.download("en")'
  ```
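To make the yaml mapping above concrete, the sketch below builds a `lang_code -> model_name` lookup from the equivalent parsed structure. The dict literal stands in for the result of loading `default.yaml`; the lookup logic is an illustration, not Presidio's actual loading code:

```python
# Parsed equivalent of the yaml configuration shown above.
# Illustrative only; not Presidio's implementation.
conf = {
    "nlp_engine_name": "spacy",  # {spacy, stanza}
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_lg"},
        {"lang_code": "de", "model_name": "de"},
    ],
}

# lang_code corresponds to `supported_language` in custom recognizers,
# so a per-language model lookup is the natural shape:
models_by_lang = {m["lang_code"]: m["model_name"] for m in conf["models"]}

print(models_by_lang["en"])  # en_core_web_lg
```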
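The recognizer-loading behavior described above (language-specific vs. language-agnostic recognizers) can be sketched as follows. The classes and recognizer names here are simplified, hypothetical stand-ins, not Presidio's actual `RecognizerRegistry` API:

```python
# Simplified stand-ins illustrating how language-specific and
# language-agnostic recognizers coexist in a registry.
class Recognizer:
    def __init__(self, name, supported_language=None):
        self.name = name
        # None means language-agnostic; otherwise the recognizer
        # applies only to text in that language.
        self.supported_language = supported_language

class RecognizerRegistry:
    def __init__(self):
        self._recognizers = []

    def load_predefined_recognizers(self, recognizers):
        # In Presidio, the predefined recognizers are loaded by default.
        self._recognizers.extend(recognizers)

    def get_recognizers(self, language):
        # A recognizer matches if it is agnostic or supports the language.
        return [r for r in self._recognizers
                if r.supported_language in (None, language)]

registry = RecognizerRegistry()
registry.load_predefined_recognizers([
    Recognizer("CreditCardRecognizer"),             # language-agnostic
    Recognizer("UsDriverLicenseRecognizer", "en"),  # English only
    Recognizer("DePhoneRecognizer", "de"),          # German only (hypothetical)
])
print([r.name for r in registry.get_recognizers("en")])
```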