Troubleshooting Guide

This guide helps diagnose and resolve common issues with Source Code Portal.

Build Issues

Maven Build Fails with "Cannot find symbol"

Symptom:

[ERROR] cannot find symbol: method findElementById(String)

Cause: API changes in dependencies (e.g., Selenium 4 removed the findElementBy* convenience methods in favor of findElement(By))

Solution:

// Wrong (Selenium 3): this convenience method no longer exists
driver.findElementById("login_field");

// Correct (Selenium 4): requires import org.openqa.selenium.By;
driver.findElement(By.id("login_field"));

Reference: LEARNINGS.md - Selenium 4 API Changes


Maven Build Fails with "Could not find artifact"

Symptom:

[ERROR] Could not find artifact com.atlassian.commonmark:commonmark:jar:0.22.0

Cause: The dependency moved to a different group ID, so the old coordinates no longer resolve

Solution:

Check whether the library changed its group ID. Example for Commonmark:

<!-- Wrong -->
<groupId>com.atlassian.commonmark</groupId>

<!-- Correct -->
<groupId>org.commonmark</groupId>

Reference: LEARNINGS.md - Commonmark Group ID Change


Tests Fail After Migration to JUnit 5

Symptom: Tests compile and run, but failure messages report the expected and actual values swapped

Cause: TestNG and JUnit 5 take the expected and actual arguments in opposite order

Solution:

// TestNG (actual, expected)
Assert.assertEquals(actualValue, expectedValue);

// JUnit 5 (expected, actual) - REVERSED!
Assertions.assertEquals(expectedValue, actualValue);

Critical: This is a logic error, not a compilation error: both argument orders type-check, so the compiler cannot catch it. Review all assertions carefully.

Reference: LEARNINGS.md - Assertion Parameter Order


Maven Package Fails with NoSuchMethodError

Symptom:

[ERROR] NoSuchMethodError: 'org.codehaus.plexus.archiver.util.DefaultFileSet...'

Cause: Maven plugin version incompatibility

Workaround:

# Don't use this (fails)
mvn package

# Use Spring Boot plugin instead (works)
mvn spring-boot:run
# or
mvn clean install -DskipTests

Reference: LEARNINGS.md - Maven Plugin Compatibility


Configuration Issues

Application Fails to Start: "GitHub access token not configured"

Symptom:

ERROR: GitHub access token not configured. Set SCP_GITHUB_ACCESS_TOKEN environment variable.

Cause: Missing GitHub authentication

Solution:

Option 1: Environment variable (recommended)

export SCP_GITHUB_ACCESS_TOKEN=ghp_your_token
java -jar source-code-portal.jar

Option 2: Security properties file

# security.properties
github.client.accessToken=ghp_your_token

Option 3: Generate token

# Using Docker
docker run -it \
  -e SCP_github.oauth2.client.clientId=CLIENT_ID \
  -e SCP_github.oauth2.client.clientSecret=CLIENT_SECRET \
  cantara/sourcecodeportal /github-access-token

Reference: Configuration Guide


Configuration File Not Found

Symptom:

WARN: Configuration file not found: config.json. Using defaults.

Cause: Missing repository configuration file

Solution:

  1. Create config.json:
{
  "githubOrganizationName": "YourOrg",
  "groups": [
    {
      "groupId": "core",
      "display-name": "Core Services",
      "description": "Core application services",
      "repos": ["repo1", "repo2"]
    }
  ]
}
  2. Place in one of these locations:
    • src/main/resources/conf/config.json (built-in)
    • ./config_override/conf/config.json (runtime override)
    • /home/sourcecodeportal/config_override/conf/config.json (Docker)

Reference: Configuration Guide


Environment Variables Not Loading

Symptom: Configuration from environment variables is ignored

Cause: Incorrect prefix or format

Solution:

Use SCP_ prefix and correct format:

# Wrong
export GITHUB_ACCESS_TOKEN=token

# Correct
export SCP_GITHUB_ACCESS_TOKEN=token

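# For nested properties, use underscores or dots
export SCP_GITHUB_ORGANIZATION=Cantara
# Dotted names are not valid identifiers for `export` in POSIX shells;
# pass them through env (or docker run -e) instead
env SCP_github.organization=Cantara java -jar source-code-portal.jar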

Reference: Configuration Guide
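
When a variable seems to be ignored, it can help to confirm what the JVM actually sees before suspecting the application. The following is a minimal sketch, not part of Source Code Portal: the class name `ScpEnvCheck` and its helpers are hypothetical, and it only assumes the `SCP_` prefix described above. It lists the matching variables and masks their values so tokens do not leak into a terminal.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: lists the environment variables that carry the SCP_ prefix. */
final class ScpEnvCheck {

    static final String PREFIX = "SCP_";

    /** Returns the entries whose name starts with the prefix, sorted by name. */
    static Map<String, String> withPrefix(Map<String, String> env) {
        Map<String, String> matches = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                matches.put(entry.getKey(), entry.getValue());
            }
        }
        return matches;
    }

    /** Masks a value so that tokens do not end up in terminals or logs. */
    static String mask(String value) {
        return value.length() <= 4 ? "****" : value.substring(0, 4) + "****";
    }

    public static void main(String[] args) {
        Map<String, String> found = withPrefix(System.getenv());
        if (found.isEmpty()) {
            System.out.println("No SCP_ variables are visible to this JVM");
        }
        found.forEach((name, value) -> System.out.println(name + "=" + mask(value)));
    }
}
```

Run it from the same shell (or container) that starts the portal. If a variable you exported is missing from the output, the problem is in the environment rather than in the application's configuration loading.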


GitHub API Issues

GitHub Rate Limit Exceeded

Symptom:

ERROR: GitHub API rate limit exceeded. Reset at: 2026-01-28T15:30:00Z

Cause: Too many API calls, exceeding GitHub's rate limit

GitHub Rate Limits:

  • Unauthenticated: 60 requests/hour
  • Authenticated: 5000 requests/hour
  • GitHub Enterprise Cloud (requests from a GitHub App or OAuth app owned by the organization): 15000 requests/hour

Immediate Solutions:

  1. Check rate limit status:
curl http://localhost:9090/actuator/health/github
  2. Wait for rate limit reset (shown in error message)

  3. Use authenticated requests (much higher limit):

export SCP_GITHUB_ACCESS_TOKEN=ghp_your_token
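
To turn the reset time from the error message into a concrete wait, the timestamp can be parsed as an ISO-8601 instant. This is a small sketch using only `java.time`; the class name `RateLimitWait` is hypothetical and the timestamp is the sample value from the error message above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: computes how long to wait for a rate limit reset. */
final class RateLimitWait {

    /** Time left until the reset instant; zero if the reset has already passed. */
    static Duration untilReset(String resetAt, Instant now) {
        Instant reset = Instant.parse(resetAt); // ISO-8601, e.g. 2026-01-28T15:30:00Z
        Duration remaining = Duration.between(now, reset);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        // Timestamp copied from the sample error message.
        Duration wait = untilReset("2026-01-28T15:30:00Z", Instant.now());
        System.out.println("Wait " + wait.toMinutes() + " min before retrying");
    }
}
```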

Long-term Solutions:

  1. Increase cache TTL to reduce API calls:
cache:
  ttl: 60  # Cache for 60 minutes instead of 30
  2. Reduce scheduled task frequency:
fetch:
  schedule:
    repositories: 600000  # 10 minutes instead of 5
  3. Use webhooks for real-time updates instead of polling:
github:
  webhook:
    enabled: true
    secret: your_webhook_secret
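
A rough estimate shows why a longer fetch interval helps. The sketch below assumes that each polling cycle makes a fixed number of API calls per repository; that figure (`callsPerRepo`) is hypothetical and should be measured from your own logs, and the class name `ApiCallBudget` is illustrative only.

```java
/** Hypothetical helper: estimates GitHub API requests per hour caused by polling. */
final class ApiCallBudget {

    static final long MILLIS_PER_HOUR = 3_600_000L;

    /**
     * Estimated requests per hour.
     * callsPerRepo is an assumed figure, not a documented property of the portal.
     * Intervals that do not divide an hour evenly are rounded down to whole cycles.
     */
    static long requestsPerHour(int repos, int callsPerRepo, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        long cyclesPerHour = MILLIS_PER_HOUR / intervalMillis;
        return (long) repos * callsPerRepo * cyclesPerHour;
    }

    public static void main(String[] args) {
        // Example: 100 repositories, 3 calls each per cycle (assumed values).
        System.out.println("5 min interval:  " + requestsPerHour(100, 3, 300_000));
        System.out.println("10 min interval: " + requestsPerHour(100, 3, 600_000));
    }
}
```

With these assumed values, the 5-minute interval costs 3600 requests per hour and the 10-minute interval 1800, which leaves far more headroom under the 5000 requests/hour authenticated limit.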

Monitoring:

Set up alerts for rate limit:

# Prometheus alert
- alert: GitHubRateLimitLow
  expr: github_rate_limit_remaining < 500
  for: 5m

Reference: Monitoring Guide


GitHub API Unreachable

Symptom:

ERROR: Failed to connect to GitHub API: Connection refused

Cause: Network connectivity issue or firewall blocking

Diagnosis:

  1. Test GitHub API connectivity:
curl -H "Authorization: token ghp_your_token" https://api.github.com/rate_limit
  2. Check DNS resolution:
nslookup api.github.com
  3. Check that a firewall is not blocking outbound port 443:
telnet api.github.com 443
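
On hosts where telnet is not installed, the same port check can be done from the JVM. This is a minimal sketch using only `java.net`; the class name `PortCheck` is hypothetical. It opens a direct TCP connection, so it does not go through any HTTP proxy and does not validate TLS.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: tests whether a TCP port is reachable. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port opens within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and DNS failures.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean reachable = canConnect("api.github.com", 443, 5000);
        System.out.println("api.github.com:443 reachable: " + reachable);
    }
}
```

A `false` result with working DNS usually points to a firewall or routing problem rather than to the application.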

Solutions:

  1. Configure proxy if behind firewall:
spring:
  proxy:
    host: proxy.example.com
    port: 8080
  2. Check network connectivity
  3. Verify GitHub API status: https://www.githubstatus.com/

GitHub Authentication Fails

Symptom:

ERROR: GitHub authentication failed: 401 Unauthorized

Cause: Invalid or expired access token

Solutions:

  1. Verify token is valid:
curl -H "Authorization: token ghp_your_token" https://api.github.com/user
  2. Check token has required scopes:

    • repo - Full repository access
    • read:org - Read organization data
    • read:user - Read user profile
  3. Generate new token: https://github.com/settings/tokens

  4. Verify token is not expired (check GitHub settings)


Webhook Issues

Webhook Delivery Fails

Symptom: GitHub shows webhook delivery failed (red X)

Diagnosis:

  1. Check webhook endpoint is accessible:
curl https://your-server.com/github/webhook
  2. View webhook delivery in GitHub:
    • Go to repository → Settings → Webhooks
    • Click on webhook
    • View "Recent Deliveries"

Common Causes:

  1. Server not accessible from internet

    • Solution: Use ngrok for local development
    ngrok http 9090
    # Use ngrok URL: https://xxxxx.ngrok.io/github/webhook
  2. SSL certificate invalid

    • GitHub requires valid SSL certificate
    • Solution: Use Let's Encrypt, or disable SSL verification in the GitHub webhook settings (dev only)
  3. Webhook secret mismatch

    • Solution: Verify secret matches in both places
    export SCP_GITHUB_WEBHOOK_SECRET=your_secret
    # Same secret in GitHub webhook settings
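
To check a secret mismatch offline, you can recompute the signature yourself: GitHub sends `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The sketch below uses only the JDK; the class name `WebhookSignature` and the sample payload are hypothetical, and it is not the portal's own verification code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: computes and checks X-Hub-Signature-256 values. */
final class WebhookSignature {

    /** Computes the header value for a raw request body. */
    static String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            StringBuilder hex = new StringBuilder("sha256=");
            for (byte b : mac.doFinal(body)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }

    /** Compares the expected and received header values in constant time. */
    static boolean matches(String secret, byte[] body, String header) {
        byte[] expected = sign(secret, body).getBytes(StandardCharsets.UTF_8);
        byte[] received = header.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, received);
    }

    public static void main(String[] args) {
        // Made-up payload; use the exact bytes of a real delivery when debugging.
        byte[] body = "{\"zen\":\"hypothetical payload\"}".getBytes(StandardCharsets.UTF_8);
        String header = sign("your_secret", body);
        System.out.println("X-Hub-Signature-256: " + header);
        System.out.println("matches: " + matches("your_secret", body, header));
    }
}
```

Compare the computed value with the header shown under "Recent Deliveries" for the same payload. The signature covers the exact bytes sent, so re-serialized or pretty-printed JSON will not match.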

Webhook Authenticated but Cache Not Updated

Symptom: Webhook delivered successfully but dashboard not updating

Diagnosis:

  1. Check application logs:
tail -f logs/application.log | grep "webhook"
  2. Check webhook is calling cache eviction:
// Should see this in logs
INFO: Webhook received: push event for repo xyz
INFO: Evicting cache for repository xyz

Solutions:

  1. Verify webhook controller is processing events:
curl -X POST http://localhost:9090/github/webhook \
  -H "Content-Type: application/json" \
  -H "X-Hub-Signature-256: sha256=..." \
  -d @webhook-payload.json
  2. Manually trigger cache refresh:
# Via actuator
curl -X DELETE http://localhost:9090/actuator/caches/repositories

Performance Issues

Slow Dashboard Load Times

Symptom: Dashboard takes > 5 seconds to load

Diagnosis:

  1. Check health endpoint:
curl http://localhost:9090/actuator/health
  2. Check cache hit rate:
curl http://localhost:9090/actuator/health/cache

Common Causes:

  1. Cache not populated

    • Solution: Wait for initial data fetch (60 seconds after startup)
    • Or manually trigger: Restart application
  2. Low cache hit rate

    • Solution: Increase cache TTL
    cache:
      ttl: 60  # Increase from 30 to 60 minutes
      max-size: 20000  # Increase cache size
  3. Too many repositories

    • Solution: Use pagination or reduce repository count
  4. GitHub API slow

    • Solution: Increase the GitHub request timeout
    github:
      timeout: 120  # Increase from 75 to 120 seconds

High CPU Usage

Symptom: CPU usage constantly above 80%

Diagnosis:

  1. Check thread pool:
curl http://localhost:9090/actuator/health/executor
  2. Check metrics:
curl http://localhost:9090/actuator/metrics/system.cpu.usage

Common Causes:

  1. Too many concurrent API calls

    • Solution: Reduce bulkhead limit
    resilience4j:
      bulkhead:
        instances:
          github:
            maxConcurrentCalls: 15  # Reduce from 25
  2. Scheduled tasks running too frequently

    • Solution: Increase intervals
    fetch:
      schedule:
        repositories: 600000  # 10 minutes instead of 5
  3. Inefficient code

    • Use Java profiler (VisualVM, YourKit)
    • Identify hot spots

High Response Times

Symptom: API endpoints taking > 2 seconds

Diagnosis:

  1. Check response times:
curl -w "@curl-format.txt" http://localhost:9090/dashboard

Create curl-format.txt:

time_namelookup:  %{time_namelookup}\n
time_connect:  %{time_connect}\n
time_appconnect:  %{time_appconnect}\n
time_pretransfer:  %{time_pretransfer}\n
time_redirect:  %{time_redirect}\n
time_starttransfer:  %{time_starttransfer}\n
time_total:  %{time_total}\n
  2. Check circuit breaker status:
curl http://localhost:9090/actuator/circuitbreakers

Solutions:

  1. Circuit breaker open

    • Too many failures caused circuit breaker to open
    • Wait for circuit breaker to reset (60 seconds)
    • Fix underlying issue (GitHub API, network)
  2. External service slow

    • Check GitHub, Jenkins, Snyk status
    • Increase timeouts or disable integration
  3. Enable virtual threads (requires Java 21 and Spring Boot 3.2 or later)

    spring:
      threads:
        virtual:
          enabled: true

Memory Issues

OutOfMemoryError

Symptom:

java.lang.OutOfMemoryError: Java heap space

Immediate Solution:

  1. Restart with more memory:
java -Xmx2g -Xms1g -jar source-code-portal.jar

Diagnosis:

  1. Check memory usage:
curl http://localhost:9090/actuator/metrics/jvm.memory.used
  2. Generate heap dump:
jmap -dump:format=b,file=heap.bin <PID>
  3. Analyze with Eclipse MAT or VisualVM

Common Causes:

  1. Cache too large

    • Solution: Reduce cache size
    cache:
      max-size: 5000  # Reduce from 10000
  2. Memory leak

    • Use heap dump analysis to identify
    • Check for unclosed resources (streams, connections)
  3. Too many threads

    • Solution: Reduce thread pool size
    executor:
      core-pool-size: 5
      max-pool-size: 10
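
If you want heap usage as a percentage of the maximum rather than a raw byte count, the JVM's own management API provides the numbers. This is a sketch under the assumption that you would call it from inside the application (for example from a scheduled log statement); run standalone, it reports on its own small JVM. The class name `HeapSnapshot` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reports heap usage relative to the maximum heap. */
final class HeapSnapshot {

    /** Heap in use as a percentage of the maximum, or -1 if the maximum is undefined. */
    static double usedPercent(MemoryUsage heap) {
        long max = heap.getMax(); // -1 when the JVM reports no limit
        return max <= 0 ? -1.0 : 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mib = 1024L * 1024L;
        System.out.printf("heap used=%d MiB, max=%d MiB (%.1f%%)%n",
                heap.getUsed() / mib, heap.getMax() / mib, usedPercent(heap));
    }
}
```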

High Memory Usage

Symptom: Memory usage growing over time (not yet OOM)

Diagnosis:

  1. Monitor memory trend:
watch -n 5 'curl -s http://localhost:9090/actuator/metrics/jvm.memory.used | jq'
  2. Check garbage collection:
curl http://localhost:9090/actuator/metrics/jvm.gc.pause

Solutions:

  1. Tune garbage collector:
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:+UseStringDeduplication \
     -jar source-code-portal.jar
  2. Enable heap dump on OOM:
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heap-dump.hprof \
     -jar source-code-portal.jar
  3. Reduce cache TTL to allow more evictions

Cache Issues

Cache Not Working

Symptom: Cache hit rate is 0% or cache always misses

Diagnosis:

  1. Check cache health:
curl http://localhost:9090/actuator/health/cache
  2. Check cache configuration:
curl http://localhost:9090/actuator/caches

Common Causes:

  1. Cache disabled

    • Solution: Enable cache
    spring:
      cache:
        type: caffeine
  2. Cache keys changing

    • Verify cache keys are consistent
    • Check CacheKey, CacheRepositoryKey implementations
  3. TTL too short

    • Solution: Increase TTL
    cache:
      ttl: 60  # Increase from 30

Cache Eviction Too Frequent

Symptom: Cache evictions very high

Diagnosis:

Check eviction count:

curl http://localhost:9090/actuator/metrics/cache.evictions

Causes:

  1. Cache size too small

    • Solution: Increase max size
    cache:
      max-size: 20000
  2. Memory pressure

    • Increase JVM heap size
    • Reduce other memory consumers

Stale Cache Data

Symptom: Dashboard shows outdated information

Solutions:

  1. Manual cache clear:
curl -X DELETE http://localhost:9090/actuator/caches/repositories
  2. Reduce TTL:
cache:
  ttl: 15  # Reduce from 30 to 15 minutes
  3. Enable webhooks for real-time updates

  4. Force refresh via API:

curl -X POST http://localhost:9090/api/refresh

Deployment Issues

Docker Container Won't Start

Diagnosis:

Check logs:

docker logs scp
docker logs -f scp  # Follow logs

Common Causes:

  1. Port already in use:
# Check what's using port 9090
lsof -i :9090

# Use different port
docker run -p 8080:9090 cantara/sourcecodeportal
  2. Missing environment variables:
docker run \
  -e SCP_GITHUB_ACCESS_TOKEN=token \
  cantara/sourcecodeportal
  3. Volume mount permission issues:
# Fix permissions
chmod 644 config.json
# Quote the mount so paths containing spaces still work
docker run -v "$(pwd)/config.json:/home/sourcecodeportal/config_override/conf/config.json:ro" cantara/sourcecodeportal

Reference: Docker Guide - Troubleshooting


Kubernetes Pod CrashLoopBackOff

Diagnosis:

# Check pod status
kubectl describe pod <pod-name> -n sourcecodeportal

# Check logs
kubectl logs <pod-name> -n sourcecodeportal --previous

Common Causes:

  1. Health check failing too early:
livenessProbe:
  initialDelaySeconds: 120  # Increase from 60
  2. Missing secrets:
# Verify secrets exist
kubectl get secrets -n sourcecodeportal
kubectl describe secret scp-secrets -n sourcecodeportal
  3. Resource limits too low:
resources:
  limits:
    memory: "2Gi"  # Increase from 1Gi

Network Issues

Cannot Reach GitHub API

Diagnosis:

  1. Test from within container:
docker exec scp curl https://api.github.com/rate_limit
  2. Check DNS resolution:
docker exec scp nslookup api.github.com

Solutions:

  1. Configure DNS:
docker run --dns 8.8.8.8 cantara/sourcecodeportal
  2. Configure proxy:
spring:
  http:
    proxy:
      host: proxy.example.com
      port: 8080

Diagnostic Tools

Health Check Script

Create health-check.sh:

#!/bin/bash
# Print the main actuator health and metrics endpoints as formatted JSON.

BASE_URL="http://localhost:9090"

# Print a section header, then fetch one actuator path and pretty-print it.
check() {
  printf '\n=== %s ===\n' "$1"
  curl -s "$BASE_URL/actuator/$2" | jq .
}

check "Overall Health"  "health"
check "GitHub Health"   "health/github"
check "Cache Health"    "health/cache"
check "Executor Health" "health/executor"
check "Memory Usage"    "metrics/jvm.memory.used"
check "Thread Count"    "metrics/jvm.threads.live"

Run:

chmod +x health-check.sh
./health-check.sh

Log Analysis

# Find errors
tail -1000 logs/application.log | grep ERROR

# Find GitHub API errors
grep "GitHub API" logs/application.log | grep ERROR

# Find slow queries (>= 1 second, i.e. four or more digits of milliseconds)
grep "took [0-9]\{4,\}ms" logs/application.log

# Count error types
grep ERROR logs/application.log | cut -d: -f4 | sort | uniq -c | sort -rn
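
The grep pattern for slow queries only distinguishes by digit count. When you need an arbitrary threshold (say, slower than 2.5 seconds), a short program can compare the number itself. The sketch below assumes log lines contain the same "took <n>ms" fragment that the grep pattern relies on; the class name `SlowLogFilter` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: filters log lines by the duration in "took <n>ms". */
final class SlowLogFilter {

    // Up to 18 digits so that the value always fits in a long.
    private static final Pattern TOOK = Pattern.compile("took (\\d{1,18})ms");

    /** Returns the lines whose first "took <n>ms" duration exceeds the threshold. */
    static List<String> slowerThan(List<String> lines, long thresholdMillis) {
        List<String> slow = new ArrayList<>();
        for (String line : lines) {
            Matcher matcher = TOOK.matcher(line);
            if (matcher.find() && Long.parseLong(matcher.group(1)) > thresholdMillis) {
                slow.add(line);
            }
        }
        return slow;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("logs/application.log"));
        slowerThan(lines, 2500).forEach(System.out::println);
    }
}
```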

Performance Profiling

# Enable JMX (unauthenticated and unencrypted: local debugging only)
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar source-code-portal.jar

# Connect with JConsole
jconsole localhost:9010

# Or use VisualVM (bundled as jvisualvm up to JDK 8; a separate download since JDK 9)
visualvm

Getting Help

If you're still stuck:

  1. Check documentation:

  2. Check logs: logs/application.log

  3. Check health endpoints: /actuator/health

  4. Review learnings: LEARNINGS.md

  5. Open an issue: Provide:

    • Error message
    • Logs (relevant sections)
    • Configuration (sanitized)
    • Health endpoint output
    • Steps to reproduce

Next Steps