Large Language Models (LLMs) can be powerful but often come with high costs and are closed-source, making deployment and iteration challenging. An open-source, Small Language Model (SLM) is being considered as an alternative for a customer service chatbot in the e-commerce/retail space. As a consulting AI engineer, your task is to take over the initial project, debug and improve the application’s safety guardrails, and evaluate the new LLM’s effectiveness and reliability using the provided codebase and setup.
The application provides a customer service chatbot powered by a locally-hosted SLM (SmolLM2 135M) with safety guardrails to prevent:
- Prompt injection attacks
- Policy-violating content (harmful/illegal, personal data requests, off-topic queries)
- Unsafe outputs
- Docker and Docker Compose
- Docker Desktop or colima
- Python 3.11 or higher
- uv package manager (or pip)
- At least 8GB RAM available for running Ollama models
Start the docker deamon:
- Open Docker Desktop
or
colima startRun the automated setup script to configure everything in one go:
./setup.shNote: Initial model downloads may take 3-7 minutes depending on your connection.
Run the test suite:
pytest tests/test_app.py -vImplement the similarity score calculation logic.
Run the evaluation script:
python evaluation/evaluate.py --sample-size 20After fixing the bugs, discuss and optionally implement:
-
Defense-in-depth strategies:
- How would you improve the pre-filtering?
- What additional output moderation would you add?
- How would you handle edge cases?
-
Monitoring and logging:
- What metrics would you track?
- How would you detect new attack patterns?
- What alerts would you set up?
-
Trade-offs:
- False positives vs false negatives
- Latency vs thoroughness
- Explainability vs accuracy
Environment variables (see app/config.py):
OLLAMA_BASE_URL: Ollama API endpoint (default:http://localhost:11434)LLM_MODEL: LLM model name (default:smollm2:135m)EMBEDDING_MODEL: Embedding model name (default:nomic-embed-text)APP_HOST: Application host (default:0.0.0.0)APP_PORT: Application port (default:8000)PHOENIX_ENDPOINT: Phoenix observability endpoint (default:http://localhost:6006)PHOENIX_ENABLED: Enable Phoenix tracing (default:false)
Stop all services:
docker-compose downRemove volumes (delete downloaded models):
docker-compose down -vStop docker deamon:
colima stop