Production metrics infrastructure and distributed caching #579

@ptbrowne

Context

Recent PRs (#558, #573, #574, #576) established GraphQL performance monitoring infrastructure:

  • Metrics collection via Apollo Server plugin
  • Sentry integration for persistence
  • Admin UI and API for accessing metrics
  • Automated PR performance tracking

Current metrics are visible in Sentry:
https://interactive-things.sentry.io/explore/traces/?field=id&field=span.name&field=span.description&field=span.duration&field=transaction&field=timestamp&field=cache_status&project=4509389251674112&query=has%3Acache_status&statsPeriod=24h

Next Steps

1. Enable metrics in production

  • Metrics are currently collected in preview/development environments only
  • Production metrics would provide real-world performance data
  • Action: Configure METRICS_ENABLED=true and ADMIN_API_TOKEN in production RHOS environment
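
A minimal sketch of how the two variables could be validated at startup. Only the names METRICS_ENABLED and ADMIN_API_TOKEN come from this issue; the `MetricsConfig` shape, the `readMetricsConfig` helper, and the rule that the token is required whenever metrics are enabled are assumptions for illustration, not the current implementation.

```typescript
// Hypothetical config shape; only the two variable names come from the issue.
interface MetricsConfig {
  enabled: boolean;
  adminApiToken: string | undefined;
}

// Reads the metrics settings from an environment-like object.
// Assumption: metrics are on only when METRICS_ENABLED is exactly "true",
// and the admin API must not be exposed without a token.
function readMetricsConfig(
  env: Record<string, string | undefined>
): MetricsConfig {
  const enabled = env["METRICS_ENABLED"] === "true";
  const adminApiToken = env["ADMIN_API_TOKEN"];
  if (enabled && !adminApiToken) {
    throw new Error("ADMIN_API_TOKEN must be set when METRICS_ENABLED=true");
  }
  return { enabled, adminApiToken };
}
```

In a Node process this would be called as `readMetricsConfig(process.env)`, so a missing token fails the deployment early instead of leaving the admin API unprotected.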

2. Distributed caching with Redis

Related to #373

The current Apollo Server cache is in-memory and is not shared across RHOS instances. This means:

  • Cache is cold on every new instance/deployment
  • No cache sharing between regions/instances
  • Inefficient resource usage: each instance recomputes and stores the same entries
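
The effect can be illustrated with a small simulation. This is a sketch under stated assumptions: `MapCache` is an in-memory stand-in (no real Redis involved), `Instance` is a hypothetical model of one RHOS pod, and the interface is synchronous for brevity, whereas a real Redis client would be asynchronous.

```typescript
// Minimal cache contract (hypothetical, synchronous for brevity).
interface SimpleCache {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In-memory store. Used per instance it models today's setup;
// shared between instances it stands in for a Redis-backed cache.
class MapCache implements SimpleCache {
  private store = new Map<string, string>();
  get(key: string): string | undefined {
    return this.store.get(key);
  }
  set(key: string, value: string): void {
    this.store.set(key, value);
  }
}

// Hypothetical model of one server instance answering queries.
class Instance {
  originCalls = 0; // how often this instance had to hit the origin
  private cache: SimpleCache;
  constructor(cache: SimpleCache) {
    this.cache = cache;
  }
  resolve(key: string): string {
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    this.originCalls += 1;
    const value = `result-for-${key}`; // pretend this is an expensive query
    this.cache.set(key, value);
    return value;
  }
}

// Two instances resolve the same key; returns the total origin calls.
function countOriginCalls(shared: boolean): number {
  const sharedCache = new MapCache();
  const a = new Instance(shared ? sharedCache : new MapCache());
  const b = new Instance(shared ? sharedCache : new MapCache());
  a.resolve("cubes");
  b.resolve("cubes");
  return a.originCalls + b.originCalls;
}
```

With per-instance caches both instances go to the origin for the same key; with a shared store only the first one does.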

Option A: BCP Infrastructure Redis (Memorystore)

  • Cost: ~12,000 CHF/year (35 CHF/day)
  • Pros: Officially supported infrastructure, integrated with BCP platform
  • Cons: High cost

Option B: Application-level Redis

  • Configure Redis ourselves in Kubernetes
  • No availability guarantee, but this matters less for a cache

Benefits of distributed cache:

  • Warm cache across all pods
  • Reduced database/API load
  • Faster response times
  • Better resource utilization

Investigation needed:

  • Analyze production metrics to quantify cache hit rates and performance impact
  • Estimate cost savings from reduced compute/database usage
  • Proof-of-concept implementation with application-level Redis
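
For the first investigation item, a rough sketch of how a hit rate and the latency gap could be computed from exported span data. The Sentry query above exposes `cache_status` and `span.duration`; the `SpanSample` shape, the `"hit"`/`"miss"` values, and the `summarizeCache` helper are assumptions, since the actual export format is not specified here.

```typescript
// Assumed values for the cache_status field.
type CacheStatus = "hit" | "miss";

// Hypothetical shape of one exported span.
interface SpanSample {
  cacheStatus: CacheStatus;
  durationMs: number;
}

interface CacheSummary {
  hitRate: number; // fraction of spans served from cache, 0 when no spans
  avgHitMs: number; // mean duration of cache hits, 0 when there are none
  avgMissMs: number; // mean duration of cache misses, 0 when there are none
}

// Arithmetic mean that returns 0 for an empty list instead of NaN.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function summarizeCache(spans: SpanSample[]): CacheSummary {
  const hits = spans.filter((s) => s.cacheStatus === "hit");
  const misses = spans.filter((s) => s.cacheStatus === "miss");
  return {
    hitRate: spans.length === 0 ? 0 : hits.length / spans.length,
    avgHitMs: mean(hits.map((s) => s.durationMs)),
    avgMissMs: mean(misses.map((s) => s.durationMs)),
  };
}
```

The difference between `avgMissMs` and `avgHitMs`, weighted by the share of misses a shared cache would turn into hits, gives a first estimate of the latency a distributed cache could save.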

Labels

enhancement (New feature or request)
