Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jun 20, 2025

This PR implements a comprehensive monitoring and observability system for NLWebNet, transforming it from an experimental library into a production-ready solution with enterprise-grade monitoring capabilities.

🚀 Features Implemented

Health Checks

  • Basic Health Endpoint (/health) - Simple health status for load balancers
  • Detailed Health Endpoint (/health/detailed) - Comprehensive service-by-service health status
  • Service-Specific Checks:
    • NLWebHealthCheck - Core service availability
    • DataBackendHealthCheck - Data backend connectivity with graceful error handling
    • AIServiceHealthCheck - AI/MCP service operational status
  • Automatic Registration - Health checks auto-configured with AddNLWebNet()

Comprehensive Metrics Collection

Built-in metrics using .NET 9's System.Diagnostics.Metrics:

  • Request Metrics: nlweb.requests.total, nlweb.request.duration, nlweb.requests.errors
  • AI Service Metrics: nlweb.ai.calls.total, nlweb.ai.duration, nlweb.ai.errors
  • Data Backend Metrics: nlweb.data.queries.total, nlweb.data.duration, nlweb.data.errors
  • Health Check Metrics: nlweb.health.checks.total, nlweb.health.failures
  • Business Metrics: nlweb.queries.by_type, nlweb.queries.complexity

All metrics include structured labels (endpoint, method, status_code, query_mode, etc.) for rich querying and dashboards.

Rate Limiting

  • Configurable Limits - Requests per window with customizable time windows
  • Multiple Strategies - IP-based, client ID-based, or custom identifier patterns
  • Standard Headers - X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
  • Proper HTTP Responses - 429 Too Many Requests with Retry-After guidance
  • High Performance - In-memory token bucket algorithm with concurrent safety

Enhanced Structured Logging

  • Correlation ID Tracking - Automatic generation and propagation across requests
  • Scoped Logging - Rich context automatically included in all log entries
  • Request Context - Path, method, IP address, user agent tracking
  • Error Correlation - Exception tracking with full request context
  • Utility Classes - CorrelationIdUtility for consistent logging patterns

🧪 Testing

Added 23 new tests (bringing total to 62 tests, all passing):

  • 16 Health Check Tests - Covering all health check scenarios including error conditions
  • 7 Rate Limiting Tests - Testing limits, different identifiers, and edge cases

🔧 Integration Ready

Middleware Pipeline

app.UseNLWebNet(); // Automatically adds:
// 1. RateLimitingMiddleware
// 2. MetricsMiddleware  
// 3. NLWebMiddleware (with correlation IDs)

Simple Configuration

builder.Services.AddNLWebNet(options =>
{
    options.RateLimiting.RequestsPerWindow = 100;
    options.RateLimiting.WindowSizeInMinutes = 1;
    options.RateLimiting.EnableIPBasedLimiting = true;
});

External Monitoring Systems

  • Prometheus/Grafana - Built-in metrics are export-ready
  • Azure Application Insights - Structured logging integrates seamlessly
  • OpenTelemetry - Correlation IDs support distributed tracing

📊 Example Responses

Health Check:

{
  "status": "Healthy",
  "totalDuration": "00:00:00.0234567",
  "entries": {
    "nlweb": { "status": "Healthy", "description": "NLWeb service is operational" },
    "data-backend": { "status": "Healthy", "description": "Data backend is operational" },
    "ai-service": { "status": "Healthy", "description": "AI/MCP service is operational" }
  }
}

Rate Limited Response:

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Maximum 100 requests per 1 minute(s).",
  "retry_after_seconds": 45
}

🏗️ Architecture

  • Backward Compatible - All existing functionality preserved
  • Minimal API Integration - Health endpoints follow existing patterns
  • Dependency Injection - All services properly registered and configurable
  • Middleware Ordering - Rate limiting → Metrics → Core processing for optimal performance

📖 Documentation

Added comprehensive monitoring documentation (doc/monitoring-demo.md) with:

  • Configuration examples for all scenarios
  • Integration guides for external systems
  • Production deployment recommendations

This implementation provides enterprise-grade monitoring capabilities that make NLWebNet immediately production-ready with minimal configuration required.

Fixes #3.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: curl -I -sSL --retry 5 --retry-delay 2 --connect-timeout 15 REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:


💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

@jongalloway
Copy link
Collaborator

@copilot Make sure that everything works well with .NET Aspire and Open Telemetry. Those are the top priorities.

Copilot AI changed the title [WIP] 📊 Production-Ready Monitoring and Observability 📊 Add Production-Ready Monitoring and Observability System Jun 20, 2025
Copilot AI requested a review from jongalloway June 20, 2025 22:10
Copilot finished work on behalf of jongalloway June 20, 2025 22:10
@jongalloway
Copy link
Collaborator

@copilot Please fix the build error

Copilot finished work on behalf of jongalloway June 20, 2025 23:25
@jongalloway
Copy link
Collaborator

@copilot I have added copilot-setup-steps.yml to the repo which will install .NET 9. It is not acceptable to roll back to .NET 8. Any changes must work with .NET 9.

Copy link
Contributor Author

Copilot AI commented Jun 20, 2025

@copilot I have added copilot-setup-steps.yml to the repo which will install .NET 9. It is not acceptable to roll back to .NET 8. Any changes must work with .NET 9.

I've added the copilot-setup-steps.yml file to install .NET 9 SDK and the Aspire workload. All projects now build and test successfully with .NET 9, and .NET Aspire and OpenTelemetry integrations are fully compatible. Fixed Aspire sample project compatibility issues for .NET 9. (e2d4e54)

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: curl -I -sSL --retry 5 --retry-delay 2 --connect-timeout 15 REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot finished work on behalf of jongalloway June 20, 2025 23:36
@jongalloway
Copy link
Collaborator

@copilot There is a merge conflict for copilot-setup-steps.yml. Please figure this out in a way that allows this PR to be validated without removing the existing functionality.

Copy link
Contributor Author

Copilot AI commented Jun 20, 2025

@copilot There is a merge conflict for copilot-setup-steps.yml. Please figure this out in a way that allows this PR to be validated without removing the existing functionality.

Fixed the merge conflict by preserving all existing functionality from the main branch copilot-setup-steps.yml while adding the required Aspire workload installation. The file now includes comprehensive setup steps (install checks, NuGet restore, build verification) plus Aspire support for .NET 9 compatibility. All 73 tests pass with the merged configuration. (28b2b2e)

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: curl -I -sSL --retry 5 --retry-delay 2 --connect-timeout 15 REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot finished work on behalf of jongalloway June 20, 2025 23:44
@jongalloway
Copy link
Collaborator

@copilot I still see a build error. Please check.

Copy link
Contributor Author

Copilot AI commented Jun 21, 2025

@copilot I still see a build error. Please check.

Build error has been resolved. The issue was that the environment was using .NET 8 SDK instead of .NET 9. I've installed .NET 9 SDK and the Aspire workload, and now the solution builds successfully with all 73 tests passing. The copilot-setup-steps.yml file was correctly configured for .NET 9 installation.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: curl -I -sSL --retry 5 --retry-delay 2 --connect-timeout 15 REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot finished work on behalf of jongalloway June 21, 2025 00:07
@jongalloway jongalloway marked this pull request as ready for review June 21, 2025 00:09
@jongalloway jongalloway requested a review from Copilot June 21, 2025 01:27

This comment was marked as outdated.

- Deleted AIServiceHealthCheckTests.cs, DataBackendHealthCheckTests.cs, NLWebHealthCheckTests.cs, and InMemoryRateLimitingServiceTests.cs to streamline test suite.
- Removed associated project file NLWebNet.Tests.MSTest.csproj.
- Cleaned up unused service tests: MockDataBackendTests.cs and QueryProcessorTests.cs.
- Removed TestLogger.cs as it is no longer needed.
@jongalloway jongalloway requested a review from Copilot June 21, 2025 03:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds enterprise-grade monitoring and observability to NLWebNet by incorporating health checks, metrics, rate limiting, structured logging, and OpenTelemetry support.

  • Introduces /health and /health/detailed minimal API endpoints with health check implementations
  • Registers middleware for rate limiting, metrics collection, and correlation ID propagation with structured logging
  • Provides OpenTelemetry integration, including exporters, resource configuration, and Aspire compatibility

Reviewed Changes

Copilot reviewed 30 out of 109 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/NLWebNet/Extensions/ServiceCollectionExtensions.cs Registers health checks, metrics & rate limiting services
src/NLWebNet/Middleware/RateLimitingMiddleware.cs Implements token bucket rate limiting & headers
src/NLWebNet/Middleware/MetricsMiddleware.cs Records request metrics & distributed tracing
src/NLWebNet/Middleware/NLWebMiddleware.cs Adds correlation IDs & structured logging
src/NLWebNet/Endpoints/HealthEndpoints.cs Defines /health and /health/detailed endpoints
src/NLWebNet/Utilities/CorrelationIdUtility.cs Utility for extracting and logging correlation IDs
src/NLWebNet/Extensions/OpenTelemetryExtensions.cs Configures OpenTelemetry integration and exporters
doc/monitoring-demo.md Added documentation for monitoring features
Comments suppressed due to low confidence (5)

src/NLWebNet/Endpoints/HealthEndpoints.cs:25

  • Add unit or integration tests for the basic and detailed health endpoints to ensure correct status codes and response shapes.
        app.MapGet("/health", GetBasicHealthAsync)

copilot-setup-steps.yml:15

  • [nitpick] The Aspire workload installation is duplicated later in the script; consolidate these steps to avoid redundancy and reduce CI setup time.
        # Still need to install Aspire workload if not present

src/NLWebNet/Utilities/CorrelationIdUtility.cs:1

  • Missing 'using System.Linq;' directive required for LINQ extension methods like FirstOrDefault in CreateLoggingProperties.
using Microsoft.AspNetCore.Http;

src/NLWebNet/Middleware/RateLimitingMiddleware.cs:62

  • Missing 'using System.Linq;' directive to enable FirstOrDefault extension on context.Request.Headers.
            var clientId = context.Request.Headers[_options.ClientIdHeader].FirstOrDefault();

src/NLWebNet/Endpoints/HealthEndpoints.cs:105

  • Missing 'using System.Linq;' directive required for ToDictionary and Any usage on healthReport.Entries.
                Entries = healthReport.Entries.ToDictionary(

Comment on lines 50 to 52
context.Response.Headers.Append("X-RateLimit-Limit", _options.RequestsPerWindow.ToString());
context.Response.Headers.Append("X-RateLimit-Remaining", status.RequestsRemaining.ToString());
context.Response.Headers.Append("X-RateLimit-Reset", ((int)status.WindowResetTime.TotalSeconds).ToString());
Copy link

Copilot AI Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Use header assignment (e.g., context.Response.Headers["X-RateLimit-Limit"] = value) instead of Append to avoid duplicate headers on retries or multiple middleware runs.

Suggested change
context.Response.Headers.Append("X-RateLimit-Limit", _options.RequestsPerWindow.ToString());
context.Response.Headers.Append("X-RateLimit-Remaining", status.RequestsRemaining.ToString());
context.Response.Headers.Append("X-RateLimit-Reset", ((int)status.WindowResetTime.TotalSeconds).ToString());
context.Response.Headers["X-RateLimit-Limit"] = _options.RequestsPerWindow.ToString();
context.Response.Headers["X-RateLimit-Remaining"] = status.RequestsRemaining.ToString();
context.Response.Headers["X-RateLimit-Reset"] = ((int)status.WindowResetTime.TotalSeconds).ToString();

Copilot uses AI. Check for mistakes.
@jongalloway jongalloway merged commit 3e1bb4e into main Jun 21, 2025
6 checks passed
@jongalloway jongalloway deleted the copilot/fix-3 branch July 1, 2025 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

📊 Production-Ready Monitoring and Observability

2 participants