Skip to content

Feature/metrics phase1: per-asset logging + dashboards#581

Merged
PGijsbers merged 22 commits intodevelopfrom
feature/metrics-phase1
Aug 21, 2025
Merged

Feature/metrics phase1: per-asset logging + dashboards#581
PGijsbers merged 22 commits intodevelopfrom
feature/metrics-phase1

Conversation

@mardalla
Copy link
Collaborator

@mardalla mardalla commented Aug 6, 2025

Change(s)

Change Type: Added

Change Category: Internal / Monitoring

Changelog Entry:

Added Prometheus instrumentation, per-asset access logging, '/stats/top/{resource_type}' endpoint, and Grafana dashboards (metrics phase 1).

Details

  • Instrumentation – 'Instrumentator()' in 'src/main.py' exposes '/metrics' with 1 s scrape interval.
  • Monitoring stack – Prometheus & Grafana services added to 'docker-compose.yml'; minimal 'prometheus.yml' included.
  • Access logging – new SQLModel table 'AssetAccessLog'; logs every 2xx/4xx hit on any asset endpoint ('datasets', 'models', etc.).
  • Auto-addition of assets – middleware pulls valid resource types from 'resource_routers.router_list'; new routers will be logged without code changes.

How to Test

  1. docker compose up -d
  2. Open 'http://localhost:9090/targets' → API target is UP.
  3. Hit an asset
  4. GET /stats/top/datasets returns asset.
  5. docker exec -i apiserver python - <<'PY' from sqlmodel import Session, select from database.session import EngineSingleton from database.model.access.access_log import AssetAccessLog with Session(EngineSingleton().engine) as s: print(s.exec(select(AssetAccessLog).order_by(AssetAccessLog.id.desc()).limit(1)).first()) PY => row appears with asset_id.
  6. Grafana panels update in real time.

@mardalla mardalla requested a review from PGijsbers August 6, 2025 12:39
Copy link
Contributor

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there reasons for the permission changes in e.g., logstash files?

The docs/metrics.md file is empty.

Provided a number of pointers based on a quick look. I will try it out later, likely tomorrow.

Copy link
Contributor

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR by making sure it's merged with the current state of develop. I may have made a mistake there, because it only seems to partially work for me now:

  • All services start fine.
  • The middleware correctly logs to the database.
  • The /metrics endpoint is available.
  • The /stats/top/{} is not, e.g., /stats/top/datasets does not show me usage results but instead a 404, indicating the endpoint doesn't exist.

Could you please:

  • Update if necessary if I broke something with the merge commit (alternatively feel free to undo and bring it up to date yourself, but I assume it's something minor)
  • remove the change of permissions on the config files
  • add some unit tests to test the middleware layer

Additionally, I noticed that this was built off a pretty old version of the development branch. Nowadays we have version prefixes at the start of the URL, which means the middleware won't work when people access assets through a versioned api, e.g. /v2/datasets/data_123123123.... Would you be able to update the middleware to make sure the access is logged regardless of which version was used. I think we should probably introduce a separate middleware to log all requests to the server (though it's already in the logs), but in any case we can do this in a future PR.

In my case, grafana didn't come with any Dashboard. Perhaps it is possible to define a default dashboard and also add it to the repository, so we have something that works out of the box?

Finally, could you elaborate a bit more in the documentation? The documentation should be a reference that contributors can use to understand the project by themselves after the European project ends, and as it is, it is a bit terse. For example, which service should expose the /stats/top endpoint? I now tried all of them just to make sure cuz none of them worked for me. Why are we doing this? How are we doing is (small description of workflow). The note " Queries – PromQL for per-endpoint, MySQL for per-asset popularity." doesn't really say that much to me. PromQL? Where? Feel free to also link to existing documentation where relevant, e.g., the hosting/authentication.md file also links to the keycloak documentation pages where relevant.

Copy link
Contributor

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes were also checked by @mardalla

@PGijsbers PGijsbers merged commit 4f169ab into develop Aug 21, 2025
4 checks passed
@PGijsbers PGijsbers deleted the feature/metrics-phase1 branch August 21, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants