feat: Add connection resilience features for better reliability #135

Open
steiner385 wants to merge 4 commits into jenkinsci:main from steiner385:feature/connection-resilience

@steiner385

Summary

This PR adds several features to improve MCP connection stability and help clients detect and recover from connection issues:

  • Enable keep-alive by default (30s) - Helps detect broken connections faster
  • Health endpoint (/mcp-server/health) - Lightweight endpoint for connection monitoring (no auth required)
  • Metrics endpoint (/mcp-server/health/metrics) - Connection statistics for debugging (auth required)
  • Graceful shutdown notification - 5-second grace period allows clients to detect shutdown
  • Enhanced connection logging - Logs client IP, X-Forwarded-For, and User-Agent
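The enhanced connection logging item could be sketched roughly as below. This is an illustration only, not the code in this PR: `ClientIdentity`, `describe`, and the output format are hypothetical names chosen for the example. The one design point it tries to show is that `X-Forwarded-For` and `User-Agent` are client-controlled, so control characters are replaced before the values reach the log.

```java
final class ClientIdentity {

    // Replace control characters (including CR/LF) so that client-supplied
    // header values cannot inject extra lines into the log. Missing values
    // are rendered as "-".
    static String sanitize(String value) {
        if (value == null || value.isBlank()) {
            return "-";
        }
        return value.replaceAll("\\p{Cntrl}", "_");
    }

    // Build a single log fragment from the three values the PR says it logs.
    // The key names used here are assumptions made for this sketch.
    static String describe(String remoteAddr, String forwardedFor, String userAgent) {
        return "ip=" + sanitize(remoteAddr)
                + " forwardedFor=" + sanitize(forwardedFor)
                + " userAgent=" + sanitize(userAgent);
    }

    public static void main(String[] args) {
        System.out.println(describe("10.0.0.5", "203.0.113.7", "curl/8.5.0"));
    }
}
```

In a servlet-based handler the three arguments would typically come from `getRemoteAddr()` and `getHeader(...)` on the request object.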

Health Endpoint Response

{
  "status": "ok",
  "timestamp": "2025-01-28T10:30:00Z",
  "jenkinsVersion": "2.533",
  "shuttingDown": false
}
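For illustration, a client might consume this response along the following lines. This is a minimal sketch under stated assumptions: `HealthProbe`, its `State` enum, and the string-based check for `shuttingDown` are hypothetical and not part of this PR. The HTTP 503 case relies on the commit message, which says the endpoint returns 503 during shutdown. The health URL is passed in by the caller rather than hard-coded, since the path changed during the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HealthProbe {

    enum State { HEALTHY, SHUTTING_DOWN, UNKNOWN }

    // Pure decision logic: 503 or "shuttingDown": true means the server is going away.
    // The whitespace-stripping string match stands in for a real JSON parser.
    static State classify(int statusCode, String body) {
        boolean flagged = body != null
                && body.replaceAll("\\s", "").contains("\"shuttingDown\":true");
        if (statusCode == 503 || flagged) {
            return State.SHUTTING_DOWN;
        }
        return statusCode == 200 ? State.HEALTHY : State.UNKNOWN;
    }

    // Fetch the health document from the given URL and classify it.
    static State probe(String healthUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(healthUrl))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return classify(response.statusCode(), response.body());
    }

    public static void main(String[] args) {
        // No network access here: only the decision logic is exercised.
        System.out.println(classify(200, "{\"status\": \"ok\", \"shuttingDown\": false}"));
    }
}
```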

Metrics Endpoint Response

{
  "sseConnectionsTotal": 42,
  "sseConnectionsActive": 3,
  "streamableRequestsTotal": 150,
  "connectionErrorsTotal": 2,
  "uptimeSeconds": 3600,
  "startTime": "2025-01-28T10:00:00Z"
}
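One way such counters could be maintained is sketched below. This is not the `McpConnectionMetrics` class from the PR, whose internals are not shown on this page: `ConnectionCounters` and its method names are hypothetical, and only the JSON field names are taken from the response above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

final class ConnectionCounters {

    private final Instant startTime = Instant.now();
    private final AtomicLong sseTotal = new AtomicLong();
    private final AtomicLong sseActive = new AtomicLong();
    private final AtomicLong streamableTotal = new AtomicLong();
    private final AtomicLong errorsTotal = new AtomicLong();

    void sseOpened() {
        sseTotal.incrementAndGet();
        sseActive.incrementAndGet();
    }

    // Never let the active gauge drop below zero if close is reported twice.
    void sseClosed() {
        sseActive.updateAndGet(n -> Math.max(0, n - 1));
    }

    void streamableRequest() {
        streamableTotal.incrementAndGet();
    }

    void connectionError() {
        errorsTotal.incrementAndGet();
    }

    // Point-in-time view keyed by the field names of the metrics response.
    Map<String, Object> snapshot() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("sseConnectionsTotal", sseTotal.get());
        m.put("sseConnectionsActive", sseActive.get());
        m.put("streamableRequestsTotal", streamableTotal.get());
        m.put("connectionErrorsTotal", errorsTotal.get());
        m.put("uptimeSeconds", Duration.between(startTime, Instant.now()).getSeconds());
        m.put("startTime", startTime.toString());
        return m;
    }

    public static void main(String[] args) {
        ConnectionCounters counters = new ConnectionCounters();
        counters.sseOpened();
        counters.streamableRequest();
        System.out.println(counters.snapshot());
    }
}
```

Atomic counters keep the hot path lock-free. The snapshot is not atomic across fields, which is usually acceptable for debugging statistics.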

Related Issues

Test plan

  • Unit tests for HealthEndpoint (normal state, shutdown state, no-auth access)
  • Unit tests for McpConnectionMetrics (metric tracking, auth requirement)
  • All existing tests pass (135 tests)
  • Manual verification of health endpoint during Jenkins restart
  • Manual verification of keep-alive messages (check logs)

Notes

  • Client-side work still needed: These server-side changes help clients know WHEN to reconnect, but actual reconnection logic must be implemented in MCP clients
  • Backward compatibility: Enabling keep-alive by default may affect clients that don't expect keep-alive messages; the new default is documented in the README
  • Streamable HTTP recommended: Per #15 ("Please enable streamable-http protocol as connection with sse is breaking too much"), Streamable HTTP (/mcp-server/mcp) is more reliable than SSE
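As a rough illustration of the client-side work mentioned in the first note, a reconnect delay could be computed as below. Everything here is an assumption made for the sketch: `ReconnectBackoff`, the 1-second base, and the 30-second cap are not taken from this PR or from any MCP client. The only tie to the PR is that a server-supplied Retry-After value, when present, acts as a lower bound on the delay.

```java
import java.util.OptionalLong;

final class ReconnectBackoff {

    static final long BASE_MILLIS = 1_000;   // assumed starting delay
    static final long CAP_MILLIS = 30_000;   // assumed upper bound

    // Exponential backoff: base * 2^attempt, capped. If the server sent
    // Retry-After (in seconds), never retry sooner than that.
    static long delayMillis(int attempt, OptionalLong retryAfterSeconds) {
        int shift = Math.min(Math.max(attempt, 0), 10); // bound the shift to avoid overflow
        long backoff = Math.min(CAP_MILLIS, BASE_MILLIS << shift);
        if (retryAfterSeconds.isPresent()) {
            return Math.max(backoff, retryAfterSeconds.getAsLong() * 1_000);
        }
        return backoff;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                    + delayMillis(attempt, OptionalLong.empty()) + " ms");
        }
    }
}
```

A real client would usually add random jitter on top of this so that many clients do not reconnect at the same instant after a Jenkins restart.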

🤖 Generated with Claude Code

@steiner385 steiner385 requested a review from a team as a code owner January 28, 2026 11:41
@Restricted(NoExternalUse.class)
@Extension
@Slf4j
public class HealthEndpoint implements UnprotectedRootAction {
Member

I'm a bit sceptical about having such an endpoint in the MCP plugin. This is more of a generic Jenkins endpoint.

Author

Good point, thanks for the feedback. I've updated the health endpoint to be MCP-specific:

Changes:

  • Renamed status → mcpServerStatus
  • Added activeConnections (current MCP connection count)
  • Removed jenkinsVersion (generic Jenkins info)

The endpoint now returns MCP server status and connection metrics rather than generic Jenkins health information:

{
  "mcpServerStatus": "ok",
  "activeConnections": 5,
  "shuttingDown": false,
  "timestamp": "..."
}

This makes it clearly specific to the MCP plugin: MCP clients can use it to check server availability and capacity before establishing connections.
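For reference, the revised payload could be produced by something like the sketch below. It is hypothetical: `McpHealthJson` is not a class in this PR, and the `"shutting_down"` value for `mcpServerStatus` is an assumption, since the thread only shows the `"ok"` case. Plain string concatenation is used so that the example does not depend on a particular JSON library.

```java
import java.time.Instant;

final class McpHealthJson {

    // Render the four fields of the revised health response in the order shown above.
    static String render(boolean shuttingDown, int activeConnections, Instant timestamp) {
        // "shutting_down" is an assumed value; only "ok" appears in the PR discussion.
        String status = shuttingDown ? "shutting_down" : "ok";
        return "{\"mcpServerStatus\":\"" + status + "\","
                + "\"activeConnections\":" + activeConnections + ","
                + "\"shuttingDown\":" + shuttingDown + ","
                + "\"timestamp\":\"" + timestamp + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(render(false, 5, Instant.parse("2025-01-28T10:30:00Z")));
    }
}
```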

* @param rsp the Stapler response
* @throws IOException if writing the response fails
*/
public void doIndex(StaplerRequest2 req, StaplerResponse2 rsp) throws IOException {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing permission check Warning

Potential missing permission check in HealthEndpoint#doIndex
* @param rsp the Stapler response
* @throws IOException if writing the response fails
*/
public void doIndex(StaplerRequest2 req, StaplerResponse2 rsp) throws IOException {

Check warning

Code scanning / Jenkins Security Scan

Stapler: Missing POST/RequirePOST annotation Warning

Potential CSRF vulnerability: If HealthEndpoint#doIndex connects to user-specified URLs, modifies state, or is expensive to run, it should be annotated with @POST or @RequirePOST
steiner385 and others added 4 commits February 6, 2026 08:05
This commit adds several features to improve MCP connection stability:

- Enable keep-alive by default (30s interval) to detect broken connections faster
- Add lightweight health endpoint at /mcp-server/health (no auth required)
  - Returns HTTP 200 when healthy, HTTP 503 during shutdown
  - Includes Retry-After header during shutdown for client reconnection
- Add metrics endpoint at /mcp-server/health/metrics (auth required)
  - Tracks SSE connections (total/active), Streamable requests, errors
- Add graceful shutdown notification with 5-second grace period
  - Allows clients to detect shutdown state before connections are closed
- Add enhanced connection logging with client identification
  - Logs IP, X-Forwarded-For, and User-Agent for debugging
- Document all resilience features in README
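The shutdown behaviour described in this commit message (200 when healthy, 503 plus Retry-After during shutdown, and a grace period before connections are closed) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `ShutdownGate` and its method names are hypothetical, and the Retry-After value of 5 seconds is assumed because the commit message does not state one.

```java
import java.time.Duration;
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownGate {

    static final int RETRY_AFTER_SECONDS = 5; // assumed value, not stated in the PR

    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    // Status code a health endpoint would return for the current state.
    int healthStatusCode() {
        return shuttingDown.get() ? 503 : 200;
    }

    // Retry-After is only advertised while shutting down.
    OptionalInt retryAfterSeconds() {
        return shuttingDown.get() ? OptionalInt.of(RETRY_AFTER_SECONDS) : OptionalInt.empty();
    }

    // Flip the flag first so health probes see the shutdown, wait for the
    // grace period, then close the remaining connections.
    void shutdown(Duration gracePeriod, Runnable closeConnections) {
        shuttingDown.set(true);
        try {
            Thread.sleep(gracePeriod.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and close immediately
        }
        closeConnections.run();
    }

    public static void main(String[] args) {
        ShutdownGate gate = new ShutdownGate();
        System.out.println("before: " + gate.healthStatusCode());
        gate.shutdown(Duration.ofMillis(50), () -> System.out.println("closing connections"));
        System.out.println("after: " + gate.healthStatusCode());
    }
}
```

The ordering is the point of the sketch: clients polling the health endpoint during the grace period get a chance to observe the 503 before their connections drop.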

Related to jenkinsci#15 (SSE connection breaking)
Related to jenkinsci#22 (Gateway timeout issues)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Change health endpoint to use UnprotectedRootAction at /mcp-health
  for proper unauthenticated access
- Move metrics endpoint to /mcp-server/metrics (simpler path)
- Handle metrics in process() method since handle() isn't reached
- Update README with correct endpoint paths

The original /mcp-server/health path didn't work because Jenkins's
HttpServletFilter.process() runs AFTER security filtering, not before.
UnprotectedRootAction is the correct way to expose an unauthenticated
endpoint in Jenkins.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add blank lines after constant declarations in Endpoint.java
- Fix import order in McpConnectionMetrics.java (alphabetical)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Address reviewer feedback from olamy about health endpoint being too generic.

Changes:
- Rename response field 'status' to 'mcpServerStatus' for clarity
- Add 'activeConnections' field showing current MCP connection count
- Remove 'jenkinsVersion' field (generic Jenkins info not MCP-specific)
- Update Javadoc to emphasize MCP-specific purpose
- Update README documentation with new response format

The health endpoint now returns MCP server status and connection metrics,
making it clearly specific to the MCP plugin rather than a generic
Jenkins health check.

Response format:
{
  "mcpServerStatus": "ok",
  "activeConnections": 5,
  "shuttingDown": false,
  "timestamp": "..."
}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@steiner385 steiner385 force-pushed the feature/connection-resilience branch from 043d4fe to c58d9fc Compare February 6, 2026 13:09