feat(api): add observability baseline with structured logging and telemetry by raychrisgdp · Pull Request #6 · raychrisgdp/taskgenie

raychrisgdp · 2026-01-03T04:12:33Z

Summary

Issues & Goals:

Enable structured JSON logging for easier debugging and log correlation
Provide request correlation IDs for tracing requests across the system
Add telemetry endpoint for monitoring system health and database status
Implement log redaction to protect sensitive data (tokens, passwords, emails)

Implementation Highlights:

Structured Logging (backend/logging.py): New JSON formatter with context-aware request_id field, redaction filter for sensitive keys and email addresses, and dual output (stdout + rotating file handler)
Request Middleware (backend/middleware.py): Request logging middleware that generates/reuses correlation IDs, logs HTTP requests with method/path/status/duration, handles exceptions with error logging, and echoes X-Request-Id in response headers
Telemetry Endpoint (backend/api/v1/telemetry.py): New /api/v1/telemetry endpoint providing system health metrics (status, version, uptime), database connectivity and migration version, graceful degradation on errors, and optional metrics placeholders for future PRs
Configuration (backend/config.py): Added LOG_LEVEL, TELEMETRY_ENABLED, and LOG_FILE_PATH settings with helper methods for log level resolution and file path defaults
Integration (backend/main.py, backend/cli/main.py): Integrated logging setup into the FastAPI lifespan and CLI startup, registered the request middleware, and registered the telemetry router only when TELEMETRY_ENABLED is true
Documentation (docs/USER_GUIDE.md, README.md): Added comprehensive observability section covering environment variables, telemetry endpoint usage, and log file configuration
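
The JSON formatter and request-ID context described above could look roughly like the following stdlib-only sketch. It assumes a module-level `contextvars.ContextVar` (called `request_id_var` here, a hypothetical name) and an `extra={"fields": {...}}` convention for structured fields such as `event` or `duration_ms`; the actual backend/logging.py may name or structure these differently.

```python
import contextvars
import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Hypothetical name: holds the current request's correlation ID, None outside a request.
request_id_var: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "request_id", default=None
)


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as one JSON object per line (JSONL)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # None outside a request context, serialized as JSON null (e.g. CLI runs).
            "request_id": request_id_var.get(),
        }
        # Assumed convention: structured fields arrive via extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)
```

Because the context variable defaults to `None`, records emitted from the CLI carry `"request_id": null`, which matches the CLI expectation listed under How to Test.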

How to Test

Prerequisites:

Install dependencies (including dev dependencies for FastAPI/uvicorn):

```
# Install dev dependencies (recommended; uses the Makefile)
# This uses 'uv sync' if uv.lock exists, otherwise 'uv pip install'
make dev

# Or manually (if uv.lock exists):
uv sync --extra dev

# Or manually (if no uv.lock):
uv venv && uv pip install --python .venv/bin/python -e ".[dev]"

# Verify uvicorn is installed in the project venv:
.venv/bin/python -c "import uvicorn; print(f'uvicorn version: {uvicorn.__version__}')"
# Should show: uvicorn version: 0.40.0 (or similar)

# Note: if dependencies change, update the lock file:
# uv lock  # Updates uv.lock with current pyproject.toml dependencies
```

Backend API Testing:

Start the API server:

```
# Start in background (recommended for testing):
uv run python -m backend.main > /tmp/taskgenie.log 2>&1 &

# Or start in foreground:
uv run python -m backend.main
# Or with auto-reload (uvicorn defaults to port 8000, so pass the port explicitly):
uv run uvicorn backend.main:app --reload --port 8080

# Verify the server started:
sleep 2 && tail -5 /tmp/taskgenie.log
# Should see: "Uvicorn running on http://127.0.0.1:8080"
```
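
Instead of a fixed `sleep 2`, a small stdlib-only poller can wait until `/health` actually answers. This is a sketch; the function name `wait_for_server` is hypothetical, and the URL and timeout defaults are assumptions chosen to match the commands above.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:8080/health", timeout: float = 10.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers URLError, HTTPError, refused connections and socket timeouts.
            pass
        time.sleep(0.2)  # Back off briefly before the next attempt.
    return False
```

Calling `wait_for_server()` right after starting the server in the background returns `True` once `/health` responds, or `False` if nothing answers within the timeout.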

Test structured logging:
- Make any API request (e.g., curl http://127.0.0.1:8080/health)
- Verify logs are JSON format with fields: timestamp, level, logger, message, request_id
- Check that request_id is present (UUID4 format) and echoed in response headers as X-Request-Id
- Verify logs are written to both stdout and ~/.taskgenie/logs/taskgenie.jsonl (or configured LOG_FILE_PATH)
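
The field checks above can be scripted. A minimal sketch, assuming the default path stated above; the helper names are hypothetical. The UUID4 check is skipped for `null` request IDs (CLI logs), and client-supplied IDs such as `test-id-123` are expected to be flagged because they are not generated UUIDs.

```python
import json
import uuid
from pathlib import Path
from typing import Iterable, List, Tuple

REQUIRED_FIELDS = {"timestamp", "level", "logger", "message", "request_id"}


def is_uuid4(value: object) -> bool:
    try:
        return uuid.UUID(str(value)).version == 4
    except ValueError:
        return False


def check_log_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs; an empty list means every line passed."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Ignore blank lines.
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        if not isinstance(entry, dict):
            problems.append((n, "not a JSON object"))
            continue
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            problems.append((n, f"missing fields: {sorted(missing)}"))
        rid = entry.get("request_id")
        if rid is not None and not is_uuid4(rid):
            problems.append((n, f"request_id is not a generated UUID4: {rid!r}"))
    return problems


def check_log_file(path: str = "~/.taskgenie/logs/taskgenie.jsonl") -> List[Tuple[int, str]]:
    text = Path(path).expanduser().read_text(encoding="utf-8")
    return check_log_lines(text.splitlines())
```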
Test request logging middleware:
- Make API requests to different endpoints: GET /health, GET /api/v1/tasks
- Verify each request logs an http_request event with fields: method, path, status, duration_ms
- Check that X-Request-Id header is present in all responses
- Test request ID reuse: curl -H "X-Request-Id: test-id-123" http://127.0.0.1:8080/health and verify the same ID is echoed back
- Verify that unsafe request IDs (too long or non-ASCII) are rejected and replaced with newly generated UUIDs
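
The reuse-or-generate rule can be sketched as a single function. The PR only says that overly long or non-ASCII IDs are rejected, so the 128-character cap and the printable-ASCII allow-list below are assumptions, and `resolve_request_id` is a hypothetical name.

```python
import re
import uuid
from typing import Optional

MAX_REQUEST_ID_LENGTH = 128  # Assumed cap; the real limit may differ.
# Assumed allow-list: printable ASCII without spaces (also blocks CR/LF header injection).
_SAFE_REQUEST_ID = re.compile(r"[\x21-\x7e]+")


def resolve_request_id(header_value: Optional[str]) -> str:
    """Reuse a client-supplied X-Request-Id if it looks safe, else generate a UUID4."""
    if (
        header_value
        and len(header_value) <= MAX_REQUEST_ID_LENGTH
        and _SAFE_REQUEST_ID.fullmatch(header_value)
    ):
        return header_value
    return str(uuid.uuid4())
```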
Test log redaction:
- Trigger logs containing sensitive data (e.g., authorization headers, email addresses)
- Verify sensitive keys (authorization, token, password, secret, cookie, email) are redacted as [redacted]
- Verify email addresses in string values are replaced with [redacted-email]
- Check that redaction works for nested dictionaries and lists
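
A minimal recursive redaction helper and logging filter illustrating the behavior above. The sensitive key names and the `[redacted]` / `[redacted-email]` placeholders come from the test steps; matching keys case-insensitively by substring, the email regex, and the `fields` attribute are assumptions, not necessarily what backend/logging.py does.

```python
import logging
import re
from typing import Any

SENSITIVE_KEYS = ("authorization", "token", "password", "secret", "cookie", "email")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive keys and email addresses masked."""
    if isinstance(value, dict):
        return {
            k: "[redacted]" if any(s in str(k).lower() for s in SENSITIVE_KEYS) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, tuple):
        return tuple(redact(v) for v in value)
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted-email]", value)
    return value


class RedactionFilter(logging.Filter):
    """Mask sensitive data on each record before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.msg)
        if record.args:
            record.args = redact(record.args)
        fields = getattr(record, "fields", None)
        if fields is not None:
            record.fields = redact(fields)
        return True  # Never drop the record, only rewrite it.
```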
Test telemetry endpoint:
```
curl http://127.0.0.1:8080/api/v1/telemetry
```
- Verify response includes: status ("ok" or "degraded"), version, uptime_s, db.connected, db.migration_version
- Verify optional.event_queue_size and optional.agent_runs_active are present with null values
- Test degraded status: temporarily break the database connection and verify status="degraded" with an error message in db.error
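
A stdlib-only sketch of the graceful-degradation logic behind the response shape above. It assumes SQLite and an Alembic-style `alembic_version` table (the PR mentions a migration version but does not name the migration tool), and it treats a missing migration table as `migration_version: null` rather than an outage; both choices, the function name, and the placeholder version string are assumptions.

```python
import sqlite3
import time
from pathlib import Path
from typing import Any, Dict

START_TIME = time.monotonic()  # Reference point for uptime_s.


def collect_telemetry(db_path: str, version: str = "0.0.0-dev") -> Dict[str, Any]:
    """Build the telemetry payload; database failures degrade the status instead of raising."""
    db: Dict[str, Any] = {"connected": False, "migration_version": None}
    try:
        # Read-only URI: a missing file fails to open instead of being silently created.
        uri = Path(db_path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            conn.execute("SELECT 1")
            db["connected"] = True
            try:
                row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
                db["migration_version"] = row[0] if row else None
            except sqlite3.OperationalError:
                pass  # Assumed: no migration table means "version unknown", not an outage.
        finally:
            conn.close()
    except sqlite3.Error as exc:
        db["connected"] = False
        db["error"] = str(exc)
    return {
        "status": "ok" if db["connected"] else "degraded",
        "version": version,
        "uptime_s": round(time.monotonic() - START_TIME, 3),
        "db": db,
        # Placeholders reserved for future PRs, always null for now.
        "optional": {"event_queue_size": None, "agent_runs_active": None},
    }
```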
Test configuration:
- Set LOG_LEVEL=DEBUG and verify debug logs appear
- Set TELEMETRY_ENABLED=false and verify /api/v1/telemetry returns 404
- Set LOG_FILE_PATH=/tmp/test.jsonl and verify logs are written to custom path
- Verify DEBUG=true automatically sets log level to DEBUG
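
A sketch of how these settings might resolve from the environment. The default log file path comes from this PR; the precedence (an explicit LOG_LEVEL wins over DEBUG=true), the INFO fallback, the telemetry default of enabled, and the accepted truthy strings are assumptions, since the PR only states that DEBUG=true implies DEBUG. The real helpers in backend/config.py are likely built on a settings class instead.

```python
import logging
import os
from pathlib import Path
from typing import Mapping

DEFAULT_LOG_FILE = "~/.taskgenie/logs/taskgenie.jsonl"
_TRUTHY = {"1", "true", "yes", "on"}  # Assumed set of accepted "true" values.


def _env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUTHY


def resolve_log_level(env: Mapping[str, str] = os.environ) -> int:
    explicit = env.get("LOG_LEVEL")
    if explicit:
        # getLevelName maps known names to ints and unknown names to a "Level X" string.
        level = logging.getLevelName(explicit.strip().upper())
        if isinstance(level, int):
            return level
    return logging.DEBUG if _env_flag(env, "DEBUG", False) else logging.INFO


def resolve_log_file(env: Mapping[str, str] = os.environ) -> Path:
    return Path(env.get("LOG_FILE_PATH") or DEFAULT_LOG_FILE).expanduser()


def telemetry_enabled(env: Mapping[str, str] = os.environ) -> bool:
    return _env_flag(env, "TELEMETRY_ENABLED", True)
```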

CLI Testing:

Test CLI logging:
```
uv run tgenie --help
```
- Verify CLI output includes structured JSON logs
- Check that request_id is null for CLI operations (there is no HTTP request context)
- Verify logs are written to configured log file path

Expected Behavior:

All API requests generate structured JSON logs with correlation IDs
Request IDs are propagated via X-Request-Id header for tracing
Sensitive data is automatically redacted from logs
Telemetry endpoint provides system health metrics
Logs are written to both stdout and a rotating log file
Configuration respects environment variables and defaults
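
The dual-output setup can be sketched with the standard library alone, using the rotation limits given in the testing notes (10 MB, 5 backups). The function name `setup_logging`, its parameters, and the plain fallback formatter are placeholders; in the project a JSON formatter and redaction filter would presumably be attached as well.

```python
import logging
import logging.handlers
import sys
from pathlib import Path
from typing import Optional

MAX_BYTES = 10 * 1024 * 1024  # Rotate once the file reaches about 10 MB.
BACKUP_COUNT = 5              # Keep taskgenie.jsonl.1 through taskgenie.jsonl.5.


def setup_logging(
    log_file: Path,
    level: int = logging.INFO,
    formatter: Optional[logging.Formatter] = None,
    logger_name: str = "",  # "" configures the root logger.
) -> logging.Logger:
    """Attach a stdout handler and a size-rotated file handler to one logger."""
    log_file = Path(log_file).expanduser()
    log_file.parent.mkdir(parents=True, exist_ok=True)
    formatter = formatter or logging.Formatter("%(message)s")
    target = logging.getLogger(logger_name)
    target.setLevel(level)
    # Remove handlers from an earlier call so repeated setup does not duplicate output.
    for old in list(target.handlers):
        target.removeHandler(old)
        old.close()
    handlers = [
        logging.StreamHandler(sys.stdout),
        logging.handlers.RotatingFileHandler(
            log_file, maxBytes=MAX_BYTES, backupCount=BACKUP_COUNT, encoding="utf-8"
        ),
    ]
    for handler in handlers:
        handler.setFormatter(formatter)
        target.addHandler(handler)
    return target
```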

Related Issues

Implements PR-016: Observability Baseline

Author Checklist

Synced with latest main branch
Self-reviewed
All tests pass locally (27 tests passing)
Documentation updated (USER_GUIDE.md, README.md)
No breaking changes
Manual testing completed (server starts, endpoints work, structured logging verified)

Additional Notes

Key Implementation Areas for Review

Backend API:

backend/logging.py: JSON formatter implementation, redaction filter logic, context variable usage for request ID propagation
backend/middleware.py: Request ID generation/reuse logic, HTTP request logging, exception handling, response header injection
backend/api/v1/telemetry.py: Database health checks, migration version retrieval, graceful error handling
backend/config.py: New observability settings and helper methods for log level/file path resolution
backend/main.py: Logging setup in lifespan, middleware registration, conditional telemetry router registration
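
For reviewers, the middleware behavior listed above can be expressed as plain ASGI middleware with no framework dependency. This sketch assumes the behavior the PR describes (reuse or generate an ID, time the request, log an `http_request` event, echo `X-Request-Id`); the class name, the 128-character cap, and the `fields` logging convention are illustrative, and backend/middleware.py may instead build on Starlette's middleware base class.

```python
import contextvars
import logging
import time
import uuid

logger = logging.getLogger("backend.middleware")
request_id_var: contextvars.ContextVar = contextvars.ContextVar("request_id", default=None)


class RequestLoggingMiddleware:
    """Pure-ASGI middleware: correlation ID, timing, access log, response header."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        # ASGI header names are lowercase bytes.
        headers = dict(scope.get("headers") or [])
        incoming = headers.get(b"x-request-id", b"").decode("latin-1")
        safe = 0 < len(incoming) <= 128 and incoming.isascii() and incoming.isprintable()
        request_id = incoming if safe else str(uuid.uuid4())
        token = request_id_var.set(request_id)
        start = time.perf_counter()
        status = 500  # Assume failure until the app starts a response.

        async def send_with_header(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
                message["headers"] = list(message.get("headers", [])) + [
                    (b"x-request-id", request_id.encode("ascii"))
                ]
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        except Exception:
            logger.exception("unhandled error", extra={"fields": {"event": "http_error"}})
            raise
        finally:
            duration_ms = round((time.perf_counter() - start) * 1000, 2)
            logger.info("http_request", extra={"fields": {
                "event": "http_request", "method": scope["method"],
                "path": scope["path"], "status": status, "duration_ms": duration_ms,
            }})
            request_id_var.reset(token)
```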

Testing:

tests/test_logging.py: Unit tests for JSON formatter, redaction filter, and logging setup
tests/test_middleware.py: Middleware tests for request ID handling, request logging, error logging
tests/api/test_telemetry.py: Integration tests for telemetry endpoint response shape and degraded status

Documentation:

docs/USER_GUIDE.md: Comprehensive observability section with environment variables and telemetry usage
README.md: Quick reference for observability features

Testing Notes

✅ Manual Testing Completed: Server starts successfully, all endpoints respond correctly
✅ Structured Logging: JSON logs verified with proper fields (timestamp, level, logger, message, request_id, event, method, path, status, duration_ms)
✅ Request ID Propagation: Auto-generated UUIDs and custom ID reuse both working correctly
✅ Telemetry Endpoint: Returns correct JSON with status, version, uptime_s, db.connected, db.migration_version
✅ Log File: Created at ~/.taskgenie/logs/taskgenie.jsonl with proper JSON formatting
Log file rotation: Verify logs rotate when file exceeds 10MB (5 backup files retained) - Not tested manually
Request ID context: Verify request_id is properly scoped per request and doesn't leak between requests - Verified via manual testing
Redaction edge cases: Test redaction with various sensitive data patterns (nested structures, list values) - Unit tests cover this
Telemetry degraded mode: Test telemetry endpoint behavior when database is unavailable or migration table missing - Unit tests cover this
Configuration precedence: Verify environment variables override defaults correctly - Unit tests cover this
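
The request-ID scoping note above relies on `contextvars` semantics: each asyncio task runs in its own copy of the context, so a value set while handling one request is invisible to concurrently running requests. A stdlib-only demonstration of that property; `fake_handler` is a stand-in for a request handler, not project code.

```python
import asyncio
import contextvars

request_id_var: contextvars.ContextVar = contextvars.ContextVar("request_id", default=None)


async def fake_handler(request_id, delay):
    """Simulate a request: set the ID, yield to other tasks, then read it back."""
    request_id_var.set(request_id)
    await asyncio.sleep(delay)  # Other "requests" run here and set their own IDs.
    return request_id, request_id_var.get()


async def main():
    # gather() wraps each coroutine in its own Task, and each Task copies the context.
    return await asyncio.gather(
        fake_handler("req-a", 0.02),
        fake_handler("req-b", 0.01),
        fake_handler("req-c", 0.0),
    )


if __name__ == "__main__":
    for expected, seen in asyncio.run(main()):
        print(expected, seen)
```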

- Set DEBUG to false in .env.example for production readiness.
- Removed unnecessary database and LLM configuration options from .env.example.
- Updated the Typer dependency in pyproject.toml and uv.lock to remove the 'all' extra, simplifying the installation process.
- Improved developer quickstart instructions for installing dependencies and running the application.
- Enhanced PR-002 task CRUD API documentation with additional details on response shapes and pagination.

These changes aim to streamline configuration, clarify setup instructions, and improve API documentation.

- Added a new API v1 for task management, including endpoints for creating, retrieving, updating, and deleting tasks.
- Introduced task schemas for request validation and response formatting.
- Implemented error handling for task-not-found scenarios with a standardized error response.
- Updated the Makefile to include precommit checks in the test coverage command.
- Removed the linting step from the CI workflow to streamline the testing process.

These changes enhance the API functionality for task management and improve error handling, contributing to a more robust application.

- Add structured JSON logging with redaction filter
- Implement request correlation IDs via middleware
- Add telemetry endpoint with DB health and migration version
- Add comprehensive test coverage (27 tests)
- Update PR-016 spec with implementation details

- Replace magic values with constants (HTTP_OK, UUID_LENGTH, etc.)
- Move imports to top level
- Remove unused imports
- Fix PLR2004 and PLC0415 violations

- Add model_validator to TaskUpdate to reject title: null (prevents DB integrity errors)
- Fix async generator return type annotation in test fixture
- Add noqa comment for magic number in pagination test
- Add test for null title rejection

Fixes CI/CD issues: mypy errors and ruff warnings

feat(api): implement task CRUD API endpoints

- Set logger level explicitly for backend.middleware logger
- Use caplog.at_level() with specific logger name to capture logs
- Fixes test failures where logs weren't being captured

- Set logger.propagate = True to ensure logs reach root logger
- Set root logger level explicitly for caplog capture
- Fixes test failures after rebase onto main

- Configure logger at module level to ensure logs are captured
- Set logger levels to DEBUG in test functions for better isolation
- Ensures tests pass when run individually or with PR-016 test suite
- Note: test isolation issue persists when running full test suite in parallel

- Merge tasks router and telemetry router in main.py
- Keep logger configuration in test_middleware.py
- Keep type annotation in api/v1/__init__.py

…cution

…dler
- Use custom LogCaptureHandler instead of caplog for better isolation
- Set propagate=False to avoid interference from setup_logging()
- Ensure logger is enabled and configured right before request
- Add verification checks to ensure logger is properly configured

- Added details on structured logging and telemetry configuration.
- Updated logging section with environment variables and examples.
- Included information about the telemetry endpoint and its usage.
- Clarified request_id handling in logging format.

- Added 'make lock' target to update the uv.lock file after modifying pyproject.toml.
- Enhanced 'make dev' and 'make install-all' to use 'uv sync' if uv.lock exists, ensuring consistent dependency installation.
- Updated CI workflow to reflect changes in dependency installation logic.

raychrisgdp added 3 commits January 3, 2026 01:13

raychrisgdp self-assigned this Jan 3, 2026

raychrisgdp marked this pull request as draft January 3, 2026 04:12

raychrisgdp added 15 commits January 3, 2026 11:15

fix: resolve lint errors in test files

75d65b0

- Replace magic values with constants (HTTP_OK, UUID_LENGTH, etc.)
- Move imports to top level
- Remove unused imports
- Fix PLR2004 and PLC0415 violations

style: apply ruff formatting fixes

9e32515

Merge pull request #5 from raychrisgdp/pr-002-task-crud-api

081f3a0

feat(api): implement task CRUD API endpoints

fix: resolve lint errors in test files

63520eb

- Replace magic values with constants (HTTP_OK, UUID_LENGTH, etc.)
- Move imports to top level
- Remove unused imports
- Fix PLR2004 and PLC0415 violations

style: apply ruff formatting fixes

af7d7fb

fix: ensure logger level is set in middleware tests

f520b30

- Set logger level explicitly for backend.middleware logger
- Use caplog.at_level() with specific logger name to capture logs
- Fixes test failures where logs weren't being captured

fix: ensure logger propagation in middleware tests

464f2ce

- Set logger.propagate = True to ensure logs reach root logger
- Set root logger level explicitly for caplog capture
- Fixes test failures after rebase onto main

merge: resolve conflicts with main branch

ffedd64

- Merge tasks router and telemetry router in main.py
- Keep logger configuration in test_middleware.py
- Keep type annotation in api/v1/__init__.py

style: apply ruff formatting fixes

7ccb0b2

test: use custom handler for middleware log tests to fix parallel execution

8af84be

raychrisgdp changed the title ~~feat: implement PR-016 observability baseline~~ feat(api): add observability baseline with structured logging and telemetry Jan 3, 2026

raychrisgdp marked this pull request as ready for review January 3, 2026 08:20

raychrisgdp merged commit b399897 into main Jan 3, 2026
2 checks passed

raychrisgdp deleted the feature/observability-baseline branch January 3, 2026 08:44

raychrisgdp merged 19 commits into main from feature/observability-baseline
