Cloud Run Observability Sandbox

A small Flask-based demo app to explore Cloud Run observability features (logging, tracing, metrics) and simple Firestore integration.

What this project is

  • A minimal Flask application demonstrating:
    • JSON structured logging (using python-json-logger)
    • Trace linkage via X-Cloud-Trace-Context header
    • Simulated latency and CPU load endpoints for tracing/metrics
    • Basic Firestore usage for sample user writes
  • Intended for running on Google Cloud Run (Dockerfile uses a builder + distroless runtime), but also runnable locally.

Key files

  • main.py — Flask app and all endpoints
  • requirements.txt — Python dependencies
  • Dockerfile — multi-stage build, runtime uses distroless and runs gunicorn as a module
  • .gitignore — ignores common Python artifacts

Environment

  • PROJECT_ID (optional) — Google Cloud project id used to initialize Firestore client
  • DATABASE_ID (optional) — Firestore database id (defaults to (default))

If PROJECT_ID is not provided, Firestore initialization fails and the app continues with db = None (you can still use the non-Firestore endpoints).
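
A minimal sketch of what this environment-driven initialization might look like (the actual code lives in main.py; the error handling here is illustrative, not copied from the repo):

    import os
    from google.cloud import firestore

    PROJECT_ID = os.environ.get("PROJECT_ID")
    DATABASE_ID = os.environ.get("DATABASE_ID", "(default)")

    try:
        # Needs a project id and valid credentials; raises otherwise.
        db = firestore.Client(project=PROJECT_ID, database=DATABASE_ID)
    except Exception:
        # Non-Firestore endpoints keep working with db set to None.
        db = None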

Quick start — local (dev)

  1. Create and activate a virtual environment:
    • python -m venv .venv
    • source .venv/bin/activate (or .venv\Scripts\activate on Windows)
  2. Install dependencies:
    • pip install -r requirements.txt
  3. Run:
    • For a simple dev run: python main.py (starts Flask's built-in server on port 8080)
    • To run with Gunicorn (recommended, matches production):
      • python -m gunicorn --bind :8080 --workers 1 --threads 8 main:app

Quick start — Docker (matches deployment)

  1. Build:
    • docker build -t cloud-run-observability-sandbox .
  2. Run (example):
    • docker run -p 8080:8080 -e PROJECT_ID=your-gcp-project cloud-run-observability-sandbox

The Dockerfile uses a builder stage to install packages into /app/site-packages and a distroless runtime image for a smaller surface area.

Endpoints

  • GET / — Health / welcome message
  • POST /users — Add a user to Firestore
    • Expects JSON with id and name (see the example calls after this list)
  • GET /slow-trace — Simulates a slow dependency (sleeps, touches Firestore) — useful to create longer traces
  • GET /cpu-heavy — Runs an arbitrary CPU-bound computation — useful to observe CPU metrics
  • GET /flaky — Randomly returns 500 to simulate intermittent errors
  • GET /crash — Demonstrates a runtime error and how Error Reporting records and tracks errors
  • GET /cached-config — Demonstrates cold vs warm start behaviour using a global cache
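
The exact response bodies are not documented here, but a hedged example of exercising the endpoints from Python might look like this (the base URL and field values are placeholders):

    import requests

    BASE = "http://localhost:8080"

    # Create a sample user (the endpoint expects JSON with "id" and "name").
    resp = requests.post(f"{BASE}/users", json={"id": "u1", "name": "Ada"})
    print(resp.status_code, resp.text)

    # Trigger a longer trace; the header links the request to a trace id.
    headers = {"X-Cloud-Trace-Context": "0123456789abcdef0123456789abcdef/1;o=1"}
    print(requests.get(f"{BASE}/slow-trace", headers=headers).status_code)

    # Generate CPU load and intermittent errors for metrics/alerting demos.
    requests.get(f"{BASE}/cpu-heavy")
    requests.get(f"{BASE}/flaky")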

Observability notes

  • Logging:
    • Uses python-json-logger to emit JSON logs to stdout.
    • Logs include a trace_id field extracted from the X-Cloud-Trace-Context header so logs can be correlated with Cloud Trace (a minimal setup sketch follows this list).
    • Logs also include a component field to filter by logical area (e.g., trace-demo, cpu-demo, cache, user-module).
  • Traces:
    • The app reads the X-Cloud-Trace-Context header and includes the trace id in log payloads to link logs and traces.
  • Metrics & Alerts:
    • Use cpu-heavy and slow-trace to generate observable signals for metrics dashboards and alerting rules.
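
A minimal sketch of the structured-logging setup described above, assuming python-json-logger and a small helper for pulling the trace id out of the request header (the helper name is illustrative, not the repo's):

    import logging
    from flask import request
    from pythonjsonlogger import jsonlogger

    handler = logging.StreamHandler()  # Cloud Run collects stdout/stderr
    handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(message)s"))
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logger = logging.getLogger(__name__)

    def current_trace_id():
        # Header format: TRACE_ID/SPAN_ID;o=OPTIONS
        header = request.headers.get("X-Cloud-Trace-Context", "")
        return header.split("/")[0] if header else None

    # Inside a request handler:
    # logger.info("slow dependency finished",
    #             extra={"component": "trace-demo", "trace_id": current_trace_id()})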

Optional: Service-to-service tracing (service_b integration)

  • Purpose
    • If you want richer, linked traces for requests that span services (service A → service B) and more visibility into individual spans within each service, enable service-to-service trace context propagation and per-service instrumentation.
  • What this repo already provides (service_b example)
    • service_b.py demonstrates per-service instrumentation:
      • FlaskInstrumentor().instrument_app(app) to capture incoming HTTP spans.
      • RequestsInstrumentor().instrument() to automatically inject/extract trace propagation headers on outgoing requests.
    • setup_opentelemetry.py configures OTLP exporters for traces, logs, and metrics so spans and logs are exported to your collector/Cloud Trace.
  • How to enable better cross-service traces
    • Ensure every service:
      • Initializes OpenTelemetry SDK (tracer provider and OTLP exporter) like setup_opentelemetry.py.
      • Instruments the web framework (e.g., Flask) to capture incoming spans.
      • Instruments the HTTP client (requests, httpx, etc.) to propagate context on outgoing calls so the downstream service receives the same trace id.
    • In practice:
      • On the caller side, instrument the outgoing HTTP client (RequestsInstrumentor is used in service_b.py) — this will add the necessary headers for propagation.
      • On the callee side, instrument the web framework (FlaskInstrumentor) so incoming requests create child spans under the propagated trace.
    • Manual propagation (optional):
      • If you need explicit control, read the incoming header (e.g., X-Cloud-Trace-Context) and forward it on outgoing requests:
        • Read: trace_header = request.headers.get("X-Cloud-Trace-Context")
        • Forward: requests.post(url, json=payload, headers={"X-Cloud-Trace-Context": trace_header})
      • Automatic instrumentation (RequestsInstrumentor) usually handles this for you and is recommended.
  • Single-service detailed spans and visibility
    • To get more granular spans inside a service, use an explicit tracer and create spans around important operations:
      • Use the SDK tracer (via trace.get_tracer(__name__)) and wrap database calls, business logic, or outgoing requests in with tracer.start_as_current_span("operation-name"): blocks (see the sketch after this list).
      • This gives you fine-grained spans within a service so Cloud Trace shows not only the request-level span but also sub-operations.
  • When to use this
    • Enable service-to-service propagation when you want end-to-end traces across microservices.
    • Add extra internal spans when you need deeper visibility into bottlenecks or want to surface internal operations in Cloud Trace.
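
A condensed sketch of the pieces above in one place, assuming the OTLP endpoint is configured via the standard OTEL_EXPORTER_OTLP_* environment variables (this is not a copy of setup_opentelemetry.py or service_b.py):

    from flask import Flask
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.instrumentation.flask import FlaskInstrumentor
    from opentelemetry.instrumentation.requests import RequestsInstrumentor

    # Tracer provider + OTLP exporter (roughly what setup_opentelemetry.py configures).
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)

    app = Flask(__name__)
    FlaskInstrumentor().instrument_app(app)   # incoming spans on the callee side
    RequestsInstrumentor().instrument()       # context propagation on outgoing calls

    tracer = trace.get_tracer(__name__)

    @app.route("/slow-trace")
    def slow_trace():
        # Extra internal span so Cloud Trace shows the sub-operation, not just the request.
        with tracer.start_as_current_span("firestore-write"):
            pass  # database call / business logic here
        return "ok"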

Firestore

  • The app initializes a Firestore client at startup using google-cloud-firestore.
  • If initialization fails, db falls back to None and DB-related endpoints will skip database operations (but still run).
  • POST /users writes to the sampleUsers collection (sketched below).
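
A hedged sketch of the write performed by POST /users, assuming db is the client created at startup (using the id as the document id is an assumption, not confirmed by the repo):

    def add_user(db, payload):
        # payload is the parsed JSON body, e.g. {"id": "u1", "name": "Ada"}.
        if db is None:
            return "firestore disabled", 503  # illustrative fallback, not the repo's actual response
        db.collection("sampleUsers").document(payload["id"]).set(payload)
        return "created", 201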

Notes for contributors

  • Keep changes small and focused: this repo is a demo to exercise observability features.
  • When adding endpoints, include structured logging with component and trace_id in extra for consistency.
  • Avoid adding secrets or service account keys to the repo. Use environment variables or Cloud Run/IAM service accounts for production credentials.

Troubleshooting

  • If Firestore calls fail locally, ensure the application has GCP credentials (e.g., gcloud auth application-default login) or unset PROJECT_ID to run without the DB.
  • For local tracing/log correlation, include a header like:
    • X-Cloud-Trace-Context: TRACE_ID/123;o=1 where TRACE_ID is a hex trace id.

License

  • MIT
