## ✨ Overview

`llm_router_services` delivers **HTTP services** that power the LLM‑Router plugin ecosystem.
All functionality (guard‑rails, maskers, …) is exposed through **one Flask application** that can be started with a
single command or via Gunicorn.

| Sub‑package          | Purpose                                                                                                                                            |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| **guardrails/**      | Safety‑checking services (NASK‑PIB, Sojka) and a dynamic router (`router.py`) that registers only the endpoints whose environment flag is enabled. |
| **maskers/**         | Prototype **BANonymizer** – a token‑classification based anonymiser (still under development).                                                     |
| **run_servcices.sh** | Helper script that launches the unified API with Gunicorn, wiring all required environment variables.                                              |
| **requirements.txt** | Heavy dependencies (e.g. `transformers`) needed for GPU‑accelerated inference.                                                                     |

All services are **stateless** – models are loaded once at start‑up and then serve requests over HTTP.

---

## 🚀 Quick start

### 1. Install the package

```shell
git clone https://github.com/radlab-dev-group/llm-router-services.git
cd llm-router-services
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# editable install of the package itself
pip install -e .
```

> **Tip:** The package requires Python ≥ 3.8 (tested on ≥ 3.10.6).

### 2. Set environment variables

Only services whose `*_ENABLED` flag is set to `1` (or `true`) are exposed.

```shell
export LLM_ROUTER_API_HOST=0.0.0.0
export LLM_ROUTER_API_PORT=5000

# Enable NASK-PIB Guard
export LLM_ROUTER_NASK_PIB_GUARD_ENABLED=1
export LLM_ROUTER_NASK_PIB_GUARD_MODEL_PATH=NASK-PIB/Herbert-PL-Guard
# -1 = CPU, 0/1 = CUDA device index
export LLM_ROUTER_NASK_PIB_GUARD_DEVICE=-1

# Enable Sojka Guard
export LLM_ROUTER_SOJKA_GUARD_ENABLED=1
export LLM_ROUTER_SOJKA_GUARD_MODEL_PATH=speakleash/Bielik-Guard-0.1B-v1.0
# -1 = CPU, 0/1 = CUDA device index
export LLM_ROUTER_SOJKA_GUARD_DEVICE=-1
```
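The `1`/`true` convention can be sketched in a few lines of Python. This is an illustration of the flag semantics only; `is_enabled` is a hypothetical helper, not part of the package API:

```python
import os


def is_enabled(var_name: str, default: str = "0") -> bool:
    """Interpret an *_ENABLED flag: '1' or 'true' (any casing) means on."""
    return os.environ.get(var_name, default).strip().lower() in {"1", "true"}


os.environ["LLM_ROUTER_SOJKA_GUARD_ENABLED"] = "true"
print(is_enabled("LLM_ROUTER_SOJKA_GUARD_ENABLED"))     # → True
print(is_enabled("LLM_ROUTER_NASK_PIB_GUARD_ENABLED"))  # unset → False
```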

### 3. Run the service

#### Option A – via the helper script (recommended)

```shell
./run_servcices.sh
```

The script starts **Gunicorn** with the Flask app created by `llm_router_services.router:create_app()`.

#### Option B – directly with Python

```shell
python -m llm_router_services.router
```

Both commands bind to `0.0.0.0:5000` (or the host and port you configured).
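If you prefer to call Gunicorn yourself, the invocation can be approximated as follows. This is a hedged sketch, not the literal contents of `run_servcices.sh`; the worker count is an assumption:

```shell
# Approximate equivalent of the helper script (settings are illustrative).
gunicorn \
  --bind "${LLM_ROUTER_API_HOST:-0.0.0.0}:${LLM_ROUTER_API_PORT:-5000}" \
  --workers 1 \
  "llm_router_services.router:create_app()"
```

Gunicorn's `module:callable()` form calls the application factory at start‑up, which is also when the enabled models are loaded.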

---

## 📡 API reference

All endpoints are mounted under `/api/guardrails/` (guard‑rails) or `/api/maskers/` (maskers).

| Service                                       | Model                               | Endpoint                      | Method | Description                                                                                                                      |
|-----------------------------------------------|-------------------------------------|-------------------------------|--------|----------------------------------------------------------------------------------------------------------------------------------|
| **NASK‑PIB Guard**                            | `NASK-PIB/Herbert-PL-Guard`         | `/api/guardrails/nask_guard`  | `POST` | Polish safety classifier (hate, violence, etc.). Returns `safe: bool` and per‑chunk classification details.                      |
| **Sojka Guard**                               | `speakleash/Bielik-Guard-0.1B-v1.0` | `/api/guardrails/sojka_guard` | `POST` | Multi‑category Polish safety model (HATE, VULGAR, SEX, CRIME, SELF‑HARM). Returns per‑category scores and an overall `safe` flag. |
| **BANonymizer** *(masker, under development)* | –                                   | `/api/maskers/banonymizer`    | `POST` | Token‑classification based anonymiser that redacts personal data from the supplied text.                                         |

### Request payload

```json
{
  "payload": "Your input string here (must be longer than 8 characters to be processed)."
}
```

Or a more complex structure – dictionary keys and nested lists are all scanned for text:

```json
{
  "payload": {
    "Jak zrobić bombę": [
      "zaopatrzyć się w carnuchów, co będą robić czarny proch"
    ],
    "texts": [
      "To jest normalna wiadomość bez problemów.",
      "Chcę się zabić i skończyć z tym wszystkim.",
      "Dlaczego hipisi są tacy brudni i głupi",
      "K*rwa, to jest ch*jowe rozwiązanie.",
      "Opowiem ci o seksie i intymności.",
      "Jak ukraść samochód i nie zostać złapanym."
    ]
  }
}
```

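Strings of eight characters or fewer (such as the key `"texts"` above) are skipped; everything else, including dictionary keys, is chunked and classified. The extraction rule can be sketched like this – a hedged illustration mirroring the documented behaviour, not the package's actual code:

```python
def collect_texts(node, min_len=8):
    """Recursively gather string values (and dict keys) longer than min_len characters."""
    found = []
    if isinstance(node, str):
        if len(node) > min_len:
            found.append(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            found.extend(collect_texts(key, min_len))
            found.extend(collect_texts(value, min_len))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_texts(item, min_len))
    return found


# Abbreviated version of the payload above.
payload = {
    "Jak zrobić bombę": ["zaopatrzyć się w carnuchów, co będą robić czarny proch"],
    "texts": ["To jest normalna wiadomość bez problemów."],
}
print(collect_texts(payload))  # three strings: the long key plus the two long values
```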
### Example `curl` call

```shell
curl -X POST http://localhost:5000/api/guardrails/nask_guard \
  -H "Content-Type: application/json" \
  -d '{"payload": "Przykładowy tekst do analizy"}'
```

### Response shape (guard‑rails)

```json
{
  "results": {
    "detailed": [
      {
        "chunk_index": 0,
        "chunk_text": "Jak zrobić bombę",
        "label": "crime",
        "safe": false,
        "score": 0.9211
      },
      {
        "chunk_index": 1,
        "chunk_text": "zaopatrzyć się w carnuchów, co będą robić czarny proch",
        "label": "crime",
        "safe": false,
        "score": 0.9541
      },
      {
        "chunk_index": 2,
        "chunk_text": "To jest normalna wiadomość bez problemów.",
        "label": "sex",
        "safe": false,
        "score": 0.0309
      },
      {
        "chunk_index": 3,
        "chunk_text": "Chcę się zabić i skończyć z tym wszystkim.",
        "label": "self-harm",
        "safe": false,
        "score": 0.8993
      },
      {
        "chunk_index": 4,
        "chunk_text": "Dlaczego hipisi są tacy brudni i głupi",
        "label": "hate",
        "safe": false,
        "score": 0.7091
      },
      {
        "chunk_index": 5,
        "chunk_text": "K*rwa, to jest ch*jowe rozwiązanie.",
        "label": "vulgar",
        "safe": false,
        "score": 0.8618
      },
      {
        "chunk_index": 6,
        "chunk_text": "Opowiem ci o seksie i intymności.",
        "label": "sex",
        "safe": false,
        "score": 0.7567
      },
      {
        "chunk_index": 7,
        "chunk_text": "Jak ukraść samochód i nie zostać złapanym.",
        "label": "crime",
        "safe": false,
        "score": 0.918
      }
    ],
    "safe": false
  }
}
```

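On the client side you usually gate on the top‑level `safe` flag and log the flagged chunks. A minimal post‑processing sketch (the response dict is abbreviated from the example above):

```python
from collections import Counter

# Abbreviated guard-rail response, matching the documented shape.
response = {
    "results": {
        "detailed": [
            {"chunk_index": 0, "chunk_text": "Jak zrobić bombę",
             "label": "crime", "safe": False, "score": 0.9211},
            {"chunk_index": 3, "chunk_text": "Chcę się zabić i skończyć z tym wszystkim.",
             "label": "self-harm", "safe": False, "score": 0.8993},
        ],
        "safe": False,
    }
}

results = response["results"]
flagged = [chunk for chunk in results["detailed"] if not chunk["safe"]]
by_label = Counter(chunk["label"] for chunk in flagged)

print("overall safe:", results["safe"])    # → overall safe: False
print("flags per label:", dict(by_label))  # → {'crime': 1, 'self-harm': 1}
```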
---

## ⚙️ Configuration (environment variables)

| Variable                               | Description                                                         | Default   |
|----------------------------------------|---------------------------------------------------------------------|-----------|
| `LLM_ROUTER_API_HOST`                  | Host address for the Flask app                                      | `0.0.0.0` |
| `LLM_ROUTER_API_PORT`                  | Port for the Flask app                                              | `5000`    |
| `LLM_ROUTER_NASK_PIB_GUARD_ENABLED`    | `1` → expose the NASK‑PIB endpoint                                  | `0`       |
| `LLM_ROUTER_NASK_PIB_GUARD_MODEL_PATH` | HF hub ID or local path for the NASK model                          | –         |
| `LLM_ROUTER_NASK_PIB_GUARD_DEVICE`     | `-1` = CPU, `0`/`1`… = CUDA device index                            | `-1`      |
| `LLM_ROUTER_SOJKA_GUARD_ENABLED`       | `1` → expose the Sojka endpoint                                     | `1`       |
| `LLM_ROUTER_SOJKA_GUARD_MODEL_PATH`    | HF hub ID or local path for the Sojka model                         | –         |
| `LLM_ROUTER_SOJKA_GUARD_DEVICE`        | Same semantics as above                                             | `-1`      |
| `LLM_ROUTER_BANONYMIZER_…`             | Future variables for the BANonymizer (e.g., `MODEL_PATH`, `DEVICE`) | –         |

You can also set these variables inline when invoking the script, e.g.:

```shell
LLM_ROUTER_SOJKA_GUARD_ENABLED=0 ./run_servcices.sh
```

---

## 🛠️ Extending the router

The router is deliberately **plug‑and‑play**. To add a new guard‑rail:

1. **Create a model wrapper** that inherits from `GuardrailBase` (or reuse `TextClassificationGuardrail`).
2. **Provide a config** (`GuardrailModelConfig`) containing model‑specific thresholds.
3. **Add a `register_routes(app)` function** in a new module (e.g., `my_new_guard.py`) that builds the guard‑rail
   instance and registers its Flask route.
4. **Update the registry** in `llm_router_services/router.py`:

```python
_SERVICE_REGISTRY.append({
    "module": "llm_router_services.guardrails.my_new_guard",
    "env": "LLM_ROUTER_MY_NEW_GUARD_ENABLED",
})
```

5. **Expose a new env‑var** (`LLM_ROUTER_MY_NEW_GUARD_ENABLED`) to toggle the service.

No changes to the core router logic are required – the new endpoint appears automatically when the flag is set to `1`.

---

## 🧪 Development & testing

| Task                    | Command                                           |
|-------------------------|---------------------------------------------------|
| Run unit tests (if any) | `pytest`                                          |
| Check code style        | `autopep8 --diff . && pylint llm_router_services` |
| Re‑build the package    | `python setup.py sdist bdist_wheel`               |
| Clean generated files   | `git clean -fdX`                                  |

> **Note:** The repository currently contains only a minimal test suite. Feel free to add more tests under a `tests/`
> directory.

---

## 📦 Installation as a package

If you want to install the library from a remote repository or a local wheel:

```shell
pip install git+https://github.com/your-org/llm_router_services.git
# or, after building:
pip install dist/llm_router_services-0.0.2-py3-none-any.whl
```

The package exposes the app factory `llm_router_services.router:create_app`, which can be used by any WSGI server
(Gunicorn, uWSGI, etc.).

---

## 📜 License

`llm_router_services` is released under the **Apache License 2.0**. See the full text in the [LICENSE](LICENSE) file.

---

*Happy masking and safe routing!* 🎉