Traces & metrics #124

@JulianFP

We already have a proper logging system in place with file logging, configurable log levels, and configurable output formats. However, it could be extended by some of the following proposals:

  1. Add tracing using OpenTelemetry: This would allow us to track individual HTTP requests through all layers of the backend, including FastAPI, the database, Redis, SMTP, authentication (LDAP/OIDC), and httpx. Currently many of these layers don't emit logs at all, and even where they do we don't have trace IDs, meaning we cannot say which HTTP request caused which log entry. OpenTelemetry also has great Python packages that configure most of this automatically. Some challenges:
    • How to deal with performance regressions and large volumes of logs? We cannot trace every single HTTP request in production!
    • How to integrate with LDAP, SMTP, and OIDC? OpenTelemetry has an httpx package (which partially covers OIDC), but if we want to include the others in the trace we would have to write something custom
    • Is this overkill? I really love the idea of traces, but for them to become useful we would need a logging stack with Grafana and Tempo in production. I would only start working on this if we decide that what we already have is not enough (to find specific bugs, bottlenecks, etc.)
  2. Expose usage metrics through a prometheus endpoint: This could be really useful if we want to track stuff like:
    • How many monthly active users (MAU) do we have?
    • How many transcription jobs are there each month?
    • What login providers do these users use?
    • How long does the average job take?
    • How high is the average load on the runners? (i.e. how often does it occur that all runners are processing a job at once)
    • Average/max amount of jobs in the job queue
    • ...
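
To make the tracing idea in proposal 1 more concrete, here is a minimal sketch of the two core concepts: attaching a trace ID to every log line of a request, and sampling only a fraction of requests. It deliberately uses only the Python standard library and is not OpenTelemetry's API. All names (`trace_id_var`, `should_sample`, `handle_request`, the 10% default ratio) are hypothetical, and OpenTelemetry would normally handle the context and the sampling decision itself.

```python
import contextvars
import io
import logging
import secrets

# Hypothetical context variable holding the trace ID of the current request.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic ratio sampling: the same trace ID always gets the same decision."""
    # Compare the low 64 bits of the ID against a threshold derived from the ratio.
    return int(trace_id, 16) % 2**64 < ratio * 2**64


class TraceIdFilter(logging.Filter):
    """Copies the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Logger whose output format starts with the trace ID."""
    logger = logging.getLogger("tracing_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.handlers = [handler]
    return logger


def handle_request(logger: logging.Logger, ratio: float = 0.1) -> str:
    """Stand-in for a request handler; returns the trace ID it used."""
    trace_id = secrets.token_hex(16)  # 128-bit ID as 32 hex characters
    token = trace_id_var.set(trace_id)
    try:
        logger.info("request handled")
        if should_sample(trace_id, ratio):
            # Only sampled requests pay for the detailed record.
            logger.info("detailed trace recorded")
    finally:
        trace_id_var.reset(token)
    return trace_id
```

Because the sampling decision depends only on the trace ID, every component that sees the same ID would reach the same decision, which keeps a trace either complete or absent. The ratio is the knob that trades log volume against coverage.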

Usage statistics like these could help to justify a Project-W instance (find out whether it is even needed and whether there is current demand), and to make informed decisions about its future (should we keep the service, do we need more runners, ...)
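
As a rough illustration of proposal 2, the sketch below renders a few of the statistics listed above in the Prometheus text exposition format. It is hand-rolled with the standard library only to show the shape of the output; in practice a client library such as `prometheus_client` would likely maintain and render the metrics. The metric names (`projectw_logins_total` and so on) and the input arguments are assumptions, not existing Project-W code.

```python
from collections import Counter


def render_metric(name, kind, help_text, samples):
    """Render one metric family; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {kind}"]
    for labels, value in samples:
        # Note: label values are not escaped here; a real exporter must do that.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


def render_usage_metrics(logins, finished_job_seconds, queue_length):
    """Build the body a hypothetical /metrics endpoint would return.

    logins: list of provider names, one entry per login (e.g. "ldap", "oidc")
    finished_job_seconds: list of durations of finished transcription jobs
    queue_length: number of jobs currently waiting for a runner
    """
    by_provider = Counter(logins)
    parts = [
        render_metric(
            "projectw_logins_total", "counter", "Logins by provider.",
            [({"provider": p}, n) for p, n in sorted(by_provider.items())],
        ),
        render_metric(
            "projectw_jobs_finished_total", "counter",
            "Finished transcription jobs.",
            [({}, len(finished_job_seconds))],
        ),
        render_metric(
            "projectw_job_seconds_total", "counter",
            "Total processing time of finished jobs in seconds.",
            [({}, sum(finished_job_seconds))],
        ),
        render_metric(
            "projectw_queue_length", "gauge", "Jobs currently in the queue.",
            [({}, queue_length)],
        ),
    ]
    return "".join(parts)
```

The endpoint only needs to expose running totals and current values. Monthly figures and averages (for example, total job seconds divided by the job count) can then be derived on the Prometheus or Grafana side, so the backend does not have to track time windows itself.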

Labels: backend (touches backend functionality), medium priority (will be done at some point), runner (touches runner functionality)