Catch emerging research trends in seconds. Type a question, NicheExplorer fetches the latest papers & discussions, clusters them into semantic topics and presents an interactive report.
🌐 Demo Video: https://www.youtube.com/watch?v=JZuyDbpnB-A
The GIF below shows a complete front-end flow: entering a query, running the pipeline, and browsing the discovered topics.
git clone https://github.com/your-org/team-dev_ops.git
cd team-dev_ops
Configure environment
cp .env.example .envbash api/scripts/gen-all.sh docker compose --env-file ./.env -f infra/docker-compose.yml -f infra/docker-compose.override.yml up --build -ddocker compose --env-file ./.env -f infra/docker-compose.yml up --build -dNicheExplorer is a microservices-based application that leverages machine learning and natural language processing to identify emerging trends in research domains. The system automatically fetches content from multiple sources, performs semantic analysis, and generates insightful reports.
| Service | Technology Stack | Port | Purpose |
|---|---|---|---|
| client | React + Vite + Nginx | 80 | User interface and interaction [docs] |
| api-server | Spring Boot + Java | 8080 | Request orchestration and business logic [docs] |
| genai | FastAPI + Python | 8000 | AI/ML processing and embeddings [docs] |
| topic-discovery | FastAPI + Python | 8100 | Content clustering and topic analysis [docs] |
| article-fetcher | FastAPI + Python | 8200 | Content retrieval and RSS processing [docs] |
| db | PostgreSQL + pgvector | 5432 | Data persistence and vector search |
For a up-to-date list of user stories and their implementation status, see the Product Backlog.
The complete REST API specification is available in api/openapi.yaml and published automatically via GitHub Pages:
| Format | Live link |
|---|---|
| ReDoc | https://AET-DevOps25.github.io/team-dev_ops/api.html |
| Swagger UI | https://AET-DevOps25.github.io/team-dev_ops/swagger/ |
The docs GitHub Pages site is built by the workflow in .github/workflows/docs.yml every time api/openapi.yaml changes (or when you trigger the workflow manually).
# Install once
npm i -g redoc-cli swagger-ui-dist http-server
# Generate docs
redoc-cli bundle api/openapi.yaml -o docs/api.html
# Serve locally at http://localhost:8088
http-server docs -p 8088The following diagram illustrates the system architecture and the distribution of responsibilities among team members.

team-dev_ops/
├── api/ # OpenAPI specification and generators
├── services/ # Microservices source code
│ ├── py-genai/ # AI/ML service
│ ├── py-topics/ # Topic discovery service
│ ├── py-fetcher/ # Content fetching service
│ └── spring-api/ # Main API orchestrator
├── web-client/ # React frontend application
├── infra/ # Infrastructure as code
│ ├── docker-compose.yml # Local development
│ ├── helm/ # Kubernetes charts
│ ├── grafana/ # Grafana configuration
│ ├── prometheus/ # Prometheus configuration
│ ├── traefik/ # Traefik configuration
│ ├── terraform/ # Cloud infrastructure
│ └── ansible/ # Configuration management
└── docs/ # Documentation and architecture
The application is set up to be deployable to a Kubernetes cluster via Helm.
The Helm folder structure is organized as follows:
team-dev_ops/infra/helm/
├── monitoring-stack/ # Monitoring deployment
│ ├── grafana/ # All Grafana deployment files
│ ├── prometheus/ # All Prometheus deployment files
│ ├── Chart.yml # Default Helm chart for deployment
├── niche-explorer/ # Microservices source code
│ ├── templates/ # Contains all Kubernetes resource templates
│ │ ├── deployments/ # Helm templates for Deployments
│ │ ├── services/ # Helm templates for Services
│ │ └── ... # Other resource templates (e.g., ingresses, configmaps)
│ ├── Chart.yaml # Helm chart definition for niche-explorer
│ └── values.yaml # Configurable values for niche-explorer charts
├── deploy-monitoring-stack.sh # Script to deploy the monitoring stack
Ensure the following are installed and configured:
- kubectl: To connected to the target Kubernetes cluster.
- Helm: For efficient Kubernetes deployment.
- [.kube] The kubeconfig for the cluster has to be stored as
~/.kube/config
The following has to be manually configured before running the deployment. If any steps fail, consider contacting the cluster admin, as priviledges might be missing.
Two namespaces are required to isolate the main application from the monitoring and them both from other workloads running in the cluster. Create these either manually via the Kubernetes Cluster UI or by running the following commands:
# Namespace for the main application
kubectl create namespace niche-explorer
# Namespace for Prometheus and Grafana
kubectl create namespace monitoringTo securely provide sensitive configuration values to the application, create a Kubernetes secret in the niche-explorer namespace.
Using kubectl:
kubectl create secret generic my-app-credentials \
--from-literal=CHAIR_API_KEY="<YOUR_ACTUAL_CHAIR_API_KEY>" \
--from-literal=GOOGLE_API_KEY="<YOUR_ACTUAL_GOOGLE_API_KEY>" \
--from-literal=POSTGRES_DB="<YOUR_ACTUAL_POSTGRES_DB_NAME>" \
--from-literal=POSTGRES_PASSWORD="<YOUR_ACTUAL_POSTGRES_PASSWORD>" \
--from-literal=POSTGRES_USER="<YOUR_ACTUAL_POSTGRES_USER>" \
--namespace niche-explorerWhere:
CHAIR_API_KEY– API key for The Chair's LLMGOOGLE_API_KEY– API key for Google AI StudioPOSTGRES_DB– Name of the PostgreSQL database to be usedPOSTGRES_PASSWORD– Password for the PostgreSQL userPOSTGRES_USER– Username for the PostgreSQL database
Note: The database credentials are created and configured here. This is the only place where they need to be defined for the Kubernetes deployment - no further configuration of them is required elsewhere.
Environment-specific parameters for the deployment are managed in the Helm values file located at infra/helm/customization.values.yaml. This file must be configured before deployment.
Update the following key parameters:
| Parameter | Description | Example |
|---|---|---|
base_imagedomain |
The domain of the container registry where the application images are stored. | ghcr.io/deployer-organization |
ingress.host |
The public-facing domain name where the application will be accessible. | niche-explorer.deployer-company.com |
ingress.tls.issuer.acme.email |
The email address for obtaining a TLS certificate from Let's Encrypt (ACME). | deployer-email@example.com |
For CI/CD Deployments: If deploying via a pipeline (e.g., a GitHub Action), the configured customization.values.yaml file must be committed and pushed to the repository. The automation pipeline relies on this file to configure the release.
Deployment To deploy both the application and the monitoring stack in the correct order, run the following command from the project's root directory:
bash infra/helm/deploy.shNote Wait around 5 minutes before accessing the application to ensure all services are up and running.#
Uninstal To uninstall / undeploy these two, run bash infra/helm/undeploy.sh from the project's root directory.
Prerequisite This requires creating a GitHub secret called KUBECONFIG, containing the content of the kuberconfig obtained by the cluster.
The Kubernetes lifecycle is automated using two dedicated GitHub Actions:
Deploy to Kubernetes Cluster: Automatically handles the deployment of the application to.Undeploy Kubernetes Cluster Deployment: Manages the complete uninstallation of the application from the cluster. (Given it was deployed using the GitHub action)
Before beginning, ensure the following are installed and configured:
- Terraform
- Ansible Alternative apt Ansible install
- AWS CLI
- An AWS account with permissions to create EC2 instances and related resources.
- A GitHub Personal Access Token (PAT) with
read:packagesscope to pull images from the GitHub Container Registry (ghcr.io). - If Docker images are not pushed to the standard project ghcr.io, update the paths in
infra/compose.aws.yml.
Manually create EC2 Instance:
- Log in to the AWS Management Console and navigate to the EC2 service.
- Click Launch instances.
- Configure the new instance with the following settings:
- Name: Choose a descriptive name (e.g.,
my-app-server). - Application and OS Images (AMI): Select Ubuntu.
- Instance type Use an instance with at least 8 GiB RAM, like
t3.large - Key pair (login): Choose
vockeyas key pair. The corresponding.pemfile will be needed later. - Network settings: Ensure HTTPS and HTTP are enabled.
- Configure storage: Increase the root volume size to at least 25 GiB.
- Name: Choose a descriptive name (e.g.,
- Click Launch instance.
- Once the instance is running, copy its Public IPv4 address and Instance ID, as they will be needed later.
-
Configure the local AWS credentials so Terraform can authenticate. They can typically be found in the AWS account details for the AWS CLI. Copy them into the local credentials file:
# Location of the AWS credentials file ~/.aws/credentials
-
Navigate to the Terraform directory (from project root folder):
cd infra/terraform/ -
Create or update the
terraform.tfvarsfile with the details from the created EC2 instance:# infra/terraform/terraform.tfvars instance_id = "i-xxx" # <-- Replace with your Instance ID public_ip = "xxx.xxx.xxx.xxx" # <-- Replace with your Public IPv4 address
-
Initialize, plan, and apply the Terraform configuration:
# Initialize the Terraform workspace terraform init # (Optional) Preview the changes that will be made terraform plan # Apply the changes to create the Elastic IP terraform apply -auto-approve
This installs Docker, logs in to the container registry, and starts the application.
-
Update the
DOMAINvariable in .env to the obtained public IP and append.nip.io. This allows to use tls, but also to route to prometheus and grafana -
Place the private key file (
labsuser.pem, obtained earlier) into~/.ssh/.1.1. If the directory does not exist, create it first
mkdir ~/.ssh chmod 700 ~/.ssh
-
Set the correct permissions for the private key file.
chmod 400 ~/.ssh/labsuser.pem -
Navigate to the Ansible directory:
cd infra/ansible/ # From project root folder cd ../ansible # From the terraform folder
-
Open the
inventory.ymlfile and update theansible_hostwith the instance's public IP address.# infra/ansible/inventory.yml all: hosts: aws_instance: ansible_host: xxx.xxx.xxx.xxx # <-- Replace with Public IPv4 address # ...
-
Run the Ansible playbook to deploy the application, providing the GitHub username and Personal Access Token as extra variables (May take 5-10 minutes).
ansible-playbook playbook.yml --extra-vars "ghcr_username=<GITHUB_USERNAME> ghcr_token=<GITHUB_TOKEN>"
In rare cases, the AWS public IP address changes seemingly at random. Hence, if the above fails with an SSH error, consider updating the IP address.
The repository is equipped with a Deploy to AWS EC2 GitHub Actions workflow (deploy_to_aws.yml) . For this, specific credentials must be securely stored as GitHub Secrets in the repository (Settings > Secrets and variables > Actions).
The following secrets are required:
| Secret Name | Description |
|---|---|
AWS_ACCESS_KEY_ID |
The access key ID for the AWS account - Contained in the AWS CLI credentials. |
AWS_SECRET_ACCESS_KEY |
The secret access key corresponding to the access key ID - Contained in the AWS CLI credentials. |
AWS_SESSION_TOKEN |
The session token for the AWS credentials - Contained in the AWS CLI credentials. |
AWS_SSH_PRIVATE_KEY |
The full content of the .pem private key file (e.g., labsuser.pem) . |
The workflow is designed to be triggered manually from the Actions tab in the GitHub repository.
When initiated, it will prompt for the Public IPv4 address and the Instance ID of the target EC2 instance. After these details are provided, the action will automatically connect to the instance and deploy the application, mirroring the steps of the manual Ansible deployment.
The monitoring infrastructure leverages Prometheus for robust data collection and alerting, and Grafana for intuitive data visualization.
Prometheus serves as the primary time-series monitoring system. It is responsible for scraping metrics from services, processing them with recording rules, and evaluating alerting rules to identify and notify of issues.
The Prometheus UI can be accessed at prometheus.${DOMAIN}. No authentication is required for basic viewing of metrics and alert states.
Prometheus collects the following metrics, which are defined via infra/prometheus/metrics.rules.yml
| Metric Name | Description | Recording Interval |
|---|---|---|
| job:http_requests_total | Total count of filtered HTTP API requests per job | 1m |
| job:http_requests_total:classify_and_embeddings | Total count of GenAI Classify and Embeddings API requests (last 1 day) | 1m |
| job:http_requests_total:classify_and_embeddings:1h | Total count of GenAI Classify and Embeddings API requests (last 1 hour) | 1m |
| job:http_requests_total:classify_and_embeddings:1m | Total count of GenAI Classify and Embeddings API requests (last 1 minute) | 1m |
| job:http_requests_total:article_fetcher_articles | Total count of Article Fetcher API requests to /api/v1/articles (last 1 day) |
1m |
| job:http_requests_total:article_fetcher_articles:1h | Total count of Article Fetcher API requests to /api/v1/articles (last 1 hour) |
1m |
| job:http_requests_total:article_fetcher_articles:1m | Total count of Article Fetcher API requests to /api/v1/articles (last 1 minute) |
1m |
| job:arxiv_fetch_total:1d | Total count of internal ArXiv fetcher API calls (last 1 day) | 1m |
| job:arxiv_fetch_total:1h | Total count of internal ArXiv fetcher API calls (last 1 hour) | 1m |
| job:arxiv_fetch_total:1m | Total count of internal ArXiv fetcher API calls (last 1 minute) | 1m |
| job:reddit_fetch_total:1d | Total count of internal Reddit fetcher API calls (last 1 day) | 1m |
| job:reddit_fetch_total:1h | Total count of internal Reddit fetcher API calls (last 1 hour) | 1m |
| job:reddit_fetch_total:1m | Total count of internal Reddit fetcher API calls (last 1 minute) | 1m |
| job:http_requests_total:rate5m | 5-minute average rate of filtered HTTP API requests per job | 1m |
| job:http_requests_error:rate5m | 5-minute HTTP error (4xx/5xx) rate (ratio) of filtered API requests per job | 1m |
| job:http_requests_errors | Total count of HTTP errors (4xx/5xx) from filtered API requests per job | 1m |
| job:http_request_duration_seconds:avg1m | 1-minute average HTTP request duration of filtered API calls per job | 1m |
Prometheus continuously evaluates a set of alerting rules, defined in infra/prometheus/alert.rules.yml. When the conditions for an alert are met, Prometheus will fire the alert.
| Alert Name | Service | Severity | Summary of Trigger Condition |
|---|---|---|---|
| ServiceDown | General | critical |
Service up metric is 0 for 2 minute (service is down). |
| HighRequestRateGenAI_Classify | genai |
warning |
Request rate to /api/v1/classify endpoint > 4 RPM for 1 minute. |
| HighTotalRequestsGenAI_Classify | genai |
warning |
Total requests to /api/v1/classify endpoint > 30 in 1 hour for 1 minute. |
| HighDailyRequestsGenAI_Classify | genai |
critical |
Total requests to /api/v1/classify endpoint > 90 in 1 day for 1 minute. |
| HighRequestRateGenAI_Embedding | genai |
warning |
Request rate to /api/v1/embeddings endpoint > 4 RPM for 1 minute. |
| HighTotalRequestsGenAI_Embedding | genai |
warning |
Total requests to /api/v1/embeddings endpoint > 30 in 1 hour for 1 minute. |
| HighDailyRequestsGenAI_Embedding | genai |
critical |
Total requests to /api/v1/embeddings endpoint > 90 in 1 day for 1 minute. |
| HighRequestRateArticleFetcher_ArxivInternal | article-fetcher |
warning |
Internal call rate to ArXiv service > 16 RPM for 1 minute. |
| HighTotalRequestsArticleFetcher_ArxivInternal | article-fetcher |
warning |
Total internal calls to ArXiv service > 500 in 1 hour for 1 minute. |
| HighDailyRequestsArticleFetcher_ArxivInternal | article-fetcher |
critical |
Total internal calls to ArXiv service > 2000 in 1 day for 1 minute. |
| HighRequestRateArticleFetcher_RedditInternal | article-fetcher |
warning |
Internal call rate to Reddit service > 16 RPM for 1 minute. |
| HighTotalRequestsArticleFetcher_RedditInternal | article-fetcher |
warning |
Total internal calls to Reddit service > 500 in 1 hour for 1 minute. |
| HighDailyRequestsArticleFetcher_RedditInternal | article-fetcher |
critical |
Total internal calls to Reddit service > 2000 in 1 day for 1 minute. |
Grafana visualizes the metrics and alerts of Prometheus and can be reached via grafana.${DOMAIN}. To login use the default credentials admin:admin and either chose to setup own credentials or skip the request to do so.
Grafana is pre-configured with the following dashboards to provide immediate insights into system:
Niche-Dashboard: This dashboard visualizes all the metrics defined in prometheus, offering a comprehensive overview of system performance.Finer Niche API Calls: This dashboard provides detailed insights into specific API call patterns for thegenaiandarticle-fetcherservices.System-Alerts: This dashboard is dedicated to displaying the current status of all alerts, making it easy to identify and track any firing alerts.
If any dashboard appears to show "No Data", it could either mean that no relevant data has been collected yet, or the selected time range is not appropriate. Try adjusting the time period in the dashboard settings.
The application uses PostgreSQL with pgvector. Class Diagram:


