This guide covers monitoring Source Code Portal using Spring Boot Actuator, Prometheus, and Grafana.
- Spring Boot Actuator
- Custom Health Indicators
- Prometheus Integration
- Grafana Dashboards
- Log Aggregation
- Alerting
- Metrics Reference
## Spring Boot Actuator

Spring Boot Actuator provides production-ready features for monitoring and managing the application.

Actuator is enabled by default in Spring Boot mode. Configure it in `application.yml`:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,caches,scheduledtasks
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
```

### Overall Health: `/actuator/health`
Returns overall application health status.
```bash
curl http://localhost:9090/actuator/health
```

Response:

```json
{
  "status": "UP",
  "components": {
    "github": {
      "status": "UP",
      "details": {
        "rateLimit": {
          "limit": 5000,
          "remaining": 4523,
          "reset": "2026-01-28T15:30:00Z"
        }
      }
    },
    "cache": {
      "status": "UP",
      "details": {
        "caches": {
          "repositories": {
            "size": 42,
            "hitRate": 0.87
          }
        }
      }
    },
    "executor": {
      "status": "UP",
      "details": {
        "activeThreads": 5,
        "queueSize": 0
      }
    }
  }
}
```

### GitHub Health: `/actuator/health/github`
Checks GitHub API connectivity and rate limits.
```bash
curl http://localhost:9090/actuator/health/github
```

Response:

```json
{
  "status": "UP",
  "details": {
    "rateLimit": {
      "limit": 5000,
      "remaining": 4523,
      "reset": "2026-01-28T15:30:00Z",
      "percentageRemaining": 90.46
    }
  }
}
```

Status levels:

- `UP`: healthy, more than 10% of the rate limit remaining
- `DEGRADED`: warning, between 5% and 10% remaining
- `DOWN`: critical, less than 5% remaining or API unreachable
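These thresholds can be sketched as a pure function. The following is illustrative only — the `RateLimitStatus` class and `statusFor` method are hypothetical names, not part of the portal's code:

```java
// Hypothetical helper mirroring the documented thresholds:
// DOWN below 5% remaining, DEGRADED between 5% and 10%, UP above 10%.
public class RateLimitStatus {

    /** Map remaining/limit to the documented health status string. */
    static String statusFor(long remaining, long limit) {
        double fraction = (double) remaining / limit;
        if (fraction < 0.05) {
            return "DOWN";      // critical: under 5% of the rate limit left
        } else if (fraction < 0.10) {
            return "DEGRADED";  // warning: between 5% and 10% left
        }
        return "UP";            // healthy: more than 10% left
    }

    public static void main(String[] args) {
        System.out.println(statusFor(4523, 5000)); // the sample response above -> UP
        System.out.println(statusFor(400, 5000));  // 8% remaining -> DEGRADED
        System.out.println(statusFor(100, 5000));  // 2% remaining -> DOWN
    }
}
```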
### Cache Health: `/actuator/health/cache`
Monitors cache health and statistics.
```bash
curl http://localhost:9090/actuator/health/cache
```

Response:

```json
{
  "status": "UP",
  "details": {
    "caches": {
      "repositories": {
        "size": 42,
        "hitRate": 0.87,
        "missRate": 0.13,
        "evictionCount": 0
      },
      "commits": {
        "size": 156,
        "hitRate": 0.92,
        "missRate": 0.08,
        "evictionCount": 2
      }
    }
  }
}
```

### Executor Health: `/actuator/health/executor`
Monitors thread pool health.
```bash
curl http://localhost:9090/actuator/health/executor
```

Response:

```json
{
  "status": "UP",
  "details": {
    "activeThreads": 5,
    "poolSize": 10,
    "queueSize": 0,
    "completedTaskCount": 1523
  }
}
```

### Liveness Probe: `/actuator/health/liveness`

Kubernetes liveness probe endpoint. Returns `UP` if the application is running.

```bash
curl http://localhost:9090/actuator/health/liveness
```

### Readiness Probe: `/actuator/health/readiness`

Kubernetes readiness probe endpoint. Returns `UP` if the application can accept traffic.

```bash
curl http://localhost:9090/actuator/health/readiness
```

### Application Info: `/actuator/info`

Returns application metadata.

```bash
curl http://localhost:9090/actuator/info
```

Response:
```json
{
  "app": {
    "name": "Source Code Portal",
    "version": "0.10.17-SNAPSHOT",
    "description": "GitHub organization dashboard"
  },
  "build": {
    "artifact": "source-code-portal",
    "group": "no.cantara.docsite",
    "version": "0.10.17-SNAPSHOT"
  },
  "runtime": {
    "javaVersion": "21.0.1",
    "javaVendor": "Eclipse Adoptium",
    "osName": "Linux",
    "osVersion": "6.17.0"
  },
  "configuration": {
    "githubOrganization": "Cantara",
    "repositoryCount": 42,
    "groupCount": 5,
    "cacheEnabled": true
  }
}
```

### All Metrics: `/actuator/metrics`
Lists all available metrics.

```bash
curl http://localhost:9090/actuator/metrics
```

Response:

```json
{
  "names": [
    "jvm.memory.used",
    "jvm.threads.live",
    "cache.size",
    "cache.gets",
    "http.server.requests",
    "system.cpu.usage"
  ]
}
```

### Specific Metric: `/actuator/metrics/{metricName}`

```bash
curl http://localhost:9090/actuator/metrics/jvm.memory.used
```

Response:
```json
{
  "name": "jvm.memory.used",
  "measurements": [
    {
      "statistic": "VALUE",
      "value": 536870912
    }
  ],
  "availableTags": [
    {
      "tag": "area",
      "values": ["heap", "nonheap"]
    }
  ]
}
```

### Cache Details: `/actuator/caches`

Shows cache manager details.

```bash
curl http://localhost:9090/actuator/caches
```

Response:
```json
{
  "cacheManagers": {
    "cacheManager": {
      "caches": {
        "repositories": {
          "target": "com.github.benmanes.caffeine.cache.BoundedLocalCache"
        },
        "commits": {
          "target": "com.github.benmanes.caffeine.cache.BoundedLocalCache"
        }
      }
    }
  }
}
```

### Scheduled Tasks: `/actuator/scheduledtasks`

Lists all scheduled tasks.

```bash
curl http://localhost:9090/actuator/scheduledtasks
```

Response:
```json
{
  "cron": [],
  "fixedDelay": [],
  "fixedRate": [
    {
      "runnable": {
        "target": "no.cantara.docsite.fetch.ScheduledFetchData.fetchRepositories"
      },
      "initialDelay": 60000,
      "interval": 300000
    }
  ]
}
```

## Custom Health Indicators

Source Code Portal includes three custom health indicators.
### GitHubHealthIndicator

Monitors GitHub API connectivity and rate limits.

Location: `no.cantara.docsite.actuator.GitHubHealthIndicator`
Checks:
- GitHub API reachability
- Rate limit remaining
- Rate limit reset time
Status logic:

```java
if (remaining < limit * 0.05) {
    return Health.down()
            .withDetail("message", "GitHub rate limit critically low")
            .build();
} else if (remaining < limit * 0.10) {
    return Health.status("DEGRADED")
            .withDetail("message", "GitHub rate limit low")
            .build();
} else {
    return Health.up().build();
}
```

### CacheHealthIndicator

Monitors cache performance and health.

Location: `no.cantara.docsite.actuator.CacheHealthIndicator`
Checks:
- Cache size
- Hit rate
- Miss rate
- Eviction count
Status logic:

```java
if (hitRate < 0.50) {
    return Health.status("DEGRADED")
            .withDetail("message", "Cache hit rate below 50%")
            .build();
} else {
    return Health.up().build();
}
```

### ExecutorHealthIndicator

Monitors thread pool health.

Location: `no.cantara.docsite.actuator.ExecutorHealthIndicator`
Checks:
- Active thread count
- Pool size
- Queue size
- Completed task count
Status logic:

```java
if (queueSize > 100) {
    return Health.down()
            .withDetail("message", "Executor queue size too high")
            .build();
} else if (activeThreads >= poolSize) {
    return Health.status("DEGRADED")
            .withDetail("message", "All threads busy")
            .build();
} else {
    return Health.up().build();
}
```

## Prometheus Integration

Prometheus metrics are automatically exposed at `/actuator/prometheus`.

```bash
curl http://localhost:9090/actuator/prometheus
```

Output (Prometheus text format):
```text
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 5.36870912E8

# HELP cache_size The number of entries in the cache
# TYPE cache_size gauge
cache_size{cache="repositories",cacheManager="cacheManager",} 42.0

# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{method="GET",outcome="SUCCESS",status="200",uri="/dashboard",} 1523.0
```
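The exposition format is line-oriented and easy to parse. The sketch below is illustrative only — a naive single-line parser sufficient for the samples shown, not a general Prometheus parser; the `ExpositionLine` class is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for one sample line of the Prometheus text format,
// e.g. cache_size{cache="repositories",} 42.0
// Naive label splitting: assumes label values contain no commas or escaped quotes.
public class ExpositionLine {
    final String name;
    final Map<String, String> labels;
    final double value;

    ExpositionLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    static ExpositionLine parse(String line) {
        Map<String, String> labels = new LinkedHashMap<>();
        int brace = line.indexOf('{');
        String name;
        String rest;
        if (brace >= 0) {
            name = line.substring(0, brace);
            int close = line.indexOf('}', brace);
            // Split key="value" pairs; trailing commas (as Micrometer emits) are tolerated.
            for (String pair : line.substring(brace + 1, close).split(",")) {
                if (pair.isBlank()) continue;
                int eq = pair.indexOf('=');
                labels.put(pair.substring(0, eq), pair.substring(eq + 1).replace("\"", ""));
            }
            rest = line.substring(close + 1).trim();
        } else {
            int space = line.indexOf(' ');
            name = line.substring(0, space);
            rest = line.substring(space + 1).trim();
        }
        return new ExpositionLine(name, labels, Double.parseDouble(rest));
    }

    public static void main(String[] args) {
        ExpositionLine s = parse("cache_size{cache=\"repositories\",cacheManager=\"cacheManager\",} 42.0");
        System.out.println(s.name + " " + s.labels.get("cache") + " " + s.value);
    }
}
```

In practice you would let Prometheus scrape the endpoint rather than parse it yourself; the sketch only shows what the scraper sees.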
Create `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'sourcecodeportal'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          application: 'source-code-portal'
          environment: 'production'
```

Run Prometheus:

```bash
docker run -d \
  --name prometheus \
  -p 9091:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
```

Access the Prometheus UI at http://localhost:9091.
Health metrics:

- `health_status{component="github"}`: GitHub health (1 = UP, 0 = DOWN)
- `health_status{component="cache"}`: cache health
- `health_status{component="executor"}`: executor health

Request and cache metrics:

- `http_server_requests_seconds_count`: request count
- `http_server_requests_seconds_sum`: total request time
- `cache_gets_total{result="hit"}`: cache hits
- `cache_gets_total{result="miss"}`: cache misses

System metrics:

- `jvm_memory_used_bytes`: JVM memory usage
- `jvm_threads_live`: active threads
- `system_cpu_usage`: system CPU usage
- `process_cpu_usage`: process CPU usage

Application metrics:

- `github_rate_limit_remaining`: remaining API calls
- `github_rate_limit_limit`: total API calls allowed
- `github_api_calls_total`: total API calls made
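The cache hit-rate dashboards and alerts later in this guide all reduce to the same arithmetic: hits divided by total gets. A minimal illustrative sketch (the `CacheHitRate` class is hypothetical, not part of the portal's code):

```java
// Illustrative arithmetic behind the cache hit-rate queries: hits / (hits + misses).
public class CacheHitRate {

    /** Returns the hit ratio, or 0.0 when no gets have been recorded yet. */
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // 87 hits out of 100 gets matches the 0.87 hitRate in the sample health response.
        System.out.println(hitRate(87, 13)); // 0.87
    }
}
```

Guarding the zero-total case matters: a PromQL ratio over counters with no samples yields NaN, and the same division by zero must be handled in code.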
## Grafana Dashboards

Run Grafana:

```bash
docker run -d \
  --name grafana \
  -p 3000:3000 \
  grafana/grafana
```

Access Grafana at http://localhost:3000 (default credentials: admin/admin).

Add the Prometheus data source:

- Navigate to Configuration → Data Sources
- Click Add data source
- Select Prometheus
- Set URL: `http://prometheus:9091`
- Click Save & Test
Create `sourcecodeportal-dashboard.json`:

```json
{
  "dashboard": {
    "title": "Source Code Portal",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [
          { "expr": "rate(http_server_requests_seconds_count[5m])" }
        ],
        "type": "graph"
      },
      {
        "title": "Response Time (95th percentile)",
        "targets": [
          { "expr": "histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))" }
        ],
        "type": "graph"
      },
      {
        "title": "Cache Hit Rate",
        "targets": [
          { "expr": "rate(cache_gets_total{result=\"hit\"}[5m]) / rate(cache_gets_total[5m])" }
        ],
        "type": "graph"
      },
      {
        "title": "GitHub Rate Limit",
        "targets": [
          { "expr": "github_rate_limit_remaining" }
        ],
        "type": "graph"
      },
      {
        "title": "JVM Memory Usage",
        "targets": [
          { "expr": "jvm_memory_used_bytes{area=\"heap\"}" }
        ],
        "type": "graph"
      }
    ]
  }
}
```

Import the dashboard:

- Navigate to Dashboards → Import
- Upload `sourcecodeportal-dashboard.json`
- Select the Prometheus data source
- Click Import
Useful queries:

- Request rate: `rate(http_server_requests_seconds_count[5m])`
- Response time P95: `histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))`
- Cache hit rate: `rate(cache_gets_total{result="hit"}[5m]) / rate(cache_gets_total[5m])`
- GitHub rate limit: `github_rate_limit_remaining`
- JVM memory: `jvm_memory_used_bytes{area="heap"}`
- Active threads: `jvm_threads_live`
- CPU usage: `system_cpu_usage`
- Error rate: `rate(http_server_requests_seconds_count{status=~"5.."}[5m])`
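The P95 query works by linear interpolation over cumulative histogram buckets (the `_bucket` series with `le` labels). A sketch of that interpolation, with made-up bucket bounds and counts — the `HistogramQuantile` class is illustrative, not Prometheus's actual implementation:

```java
// Sketch of the linear interpolation that histogram_quantile() applies to
// cumulative buckets. Bucket bounds and counts in main() are invented examples.
public class HistogramQuantile {

    /**
     * upperBounds and cumulativeCounts mirror Prometheus _bucket series:
     * cumulativeCounts[i] = observations <= upperBounds[i].
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        double total = cumulativeCounts[cumulativeCounts.length - 1];
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulativeCounts[i] >= rank) {
                double bucketStart = (i == 0) ? 0.0 : upperBounds[i - 1];
                double countBefore = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
                double bucketCount = cumulativeCounts[i] - countBefore;
                if (bucketCount == 0) return bucketStart;
                // Assume observations are uniformly spread inside the bucket.
                return bucketStart
                        + (upperBounds[i] - bucketStart) * (rank - countBefore) / bucketCount;
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0};   // bucket upper bounds in seconds
        double[] counts = {50, 90, 100}; // cumulative request counts
        // Rank 95 of 100 falls in the (0.5, 1.0] bucket, halfway through it.
        System.out.println(quantile(0.95, le, counts)); // 0.75
    }
}
```

This is why P95 accuracy depends on bucket layout: the answer is interpolated within one bucket, so coarse buckets give coarse quantiles.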
## Log Aggregation

Configure logging in `logback-spring.xml`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>

    <!-- Console appender -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- File appender with rolling -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logs/application.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>logs/application.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
                <maxFileSize>10MB</maxFileSize>
            </timeBasedFileNamingAndTriggeringPolicy>
            <maxHistory>7</maxHistory>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- JSON appender for log aggregation -->
    <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="FILE"/>
    </root>

    <logger name="no.cantara" level="DEBUG"/>
    <logger name="org.springframework" level="WARN"/>
</configuration>
```

Create `filebeat.yml`:
```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/sourcecodeportal/application.log
    fields:
      application: sourcecodeportal
      environment: production

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "sourcecodeportal-%{+yyyy.MM.dd}"

setup.kibana:
  host: "kibana:5601"
```

Run the ELK stack alongside the application with Docker Compose:

```yaml
version: '3.8'

services:
  sourcecodeportal:
    image: cantara/sourcecodeportal
    volumes:
      - scp-logs:/home/sourcecodeportal/logs

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.11.0
    volumes:
      - scp-logs:/var/log/sourcecodeportal:ro
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    depends_on:
      - elasticsearch

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

volumes:
  scp-logs:
```

## Alerting

Create `alert.rules.yml`:
```yaml
groups:
  - name: sourcecodeportal
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} requests/sec"

      # GitHub rate limit low
      - alert: GitHubRateLimitLow
        expr: github_rate_limit_remaining < 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GitHub rate limit low"
          description: "Only {{ $value }} API calls remaining"

      # Cache hit rate low
      - alert: CacheHitRateLow
        expr: rate(cache_gets_total{result="hit"}[5m]) / rate(cache_gets_total[5m]) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50%"

      # High memory usage
      - alert: HighMemoryUsage
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }}"

      # Application down
      - alert: ApplicationDown
        expr: up{job="sourcecodeportal"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
```

Create `alertmanager.yml`:
```yaml
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}\n{{ .Annotations.description }}\n{{ end }}'
```

## Metrics Reference

JVM metrics:

- `jvm.memory.used`: memory usage (heap/nonheap)
- `jvm.memory.max`: maximum memory
- `jvm.threads.live`: live threads
- `jvm.threads.daemon`: daemon threads
- `jvm.gc.pause`: garbage collection pause time
HTTP metrics:

- `http.server.requests`: request count and duration
- `http.server.requests.active`: active requests

Cache metrics:

- `cache.size`: cache size
- `cache.gets`: cache gets (hit/miss)
- `cache.puts`: cache puts
- `cache.evictions`: cache evictions

System metrics:

- `system.cpu.usage`: system CPU usage
- `process.cpu.usage`: process CPU usage
- `system.load.average.1m`: system load average (1 minute)

Application metrics:

- `github.rate.limit.remaining`: GitHub API rate limit remaining
- `github.api.calls.total`: total GitHub API calls
- `repository.count`: number of repositories monitored
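The HighMemoryUsage alert rule is a simple ratio check: heap used divided by heap max against a 90% threshold. A sketch of the same check in code, useful for unit-testing threshold choices — the `HeapAlert` class is hypothetical, not part of the portal:

```java
// Illustrative check behind the HighMemoryUsage alert:
// jvm_memory_used_bytes / jvm_memory_max_bytes > threshold.
public class HeapAlert {

    /** True when heap usage exceeds the alert threshold (0.9 mirrors the rule above). */
    static boolean fires(long usedBytes, long maxBytes, double threshold) {
        return (double) usedBytes / maxBytes > threshold;
    }

    public static void main(String[] args) {
        long max = 1_073_741_824L; // 1 GiB heap
        System.out.println(fires(536_870_912L, max, 0.9));   // 50% used -> false
        System.out.println(fires(1_020_054_732L, max, 0.9)); // ~95% used -> true
    }
}
```

Note that the Prometheus rule additionally requires the condition to hold for 5 minutes (`for: 5m`) before firing, which this point-in-time check does not model.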
Best practices:

- Monitor health endpoints regularly (every 30 seconds)
- Set up alerts for critical metrics (error rate, rate limit)
- Use Grafana dashboards for visualization
- Aggregate logs centrally (ELK stack)
- Monitor GitHub rate limit closely
- Track cache performance to optimize TTL
- Set up on-call rotation for alerts
- Review metrics trends weekly
- Test alerts to ensure they fire correctly
- Document runbooks for common issues
Related guides:

- Troubleshooting Guide - Resolve common issues
- Deployment Guide - Production deployment
- Docker Guide - Container deployment