We already have a proper logging system in place with file logging, configurable log levels, and configurable output formats. However, it could be extended with some of the following proposals:
- Add tracing using OpenTelemetry: This would allow us to track individual HTTP requests through all layers of the backend, including FastAPI, the database, Redis, SMTP, authentication (LDAP/OIDC), and httpx. Currently many of these layers don't emit logs at all, and even when they do we don't have trace IDs, meaning we cannot say which HTTP request caused which log. OpenTelemetry also has great Python packages that configure most of this automatically. Some challenges:
  - How do we deal with performance regressions and large volumes of logs? We cannot trace every single HTTP request in production!
  - How do we integrate with LDAP, SMTP, and OIDC? OpenTelemetry has an httpx package (which partially covers OIDC), but if we want to include the others in the trace we would have to write something custom.
  - Is this overkill? I really love the idea of traces, but for them to become useful we would need a logging stack with Grafana and Tempo in production. I would only start working on this if we decide that what we already have is not enough (to find specific bugs, bottlenecks, etc.).
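
To make the trace ID and sampling questions above more concrete, here is a minimal standard-library sketch. It is not OpenTelemetry and not how Project-W works today; it only illustrates the two ideas: attaching a per-request ID to every log line, and keeping a fixed fraction of requests (head-based sampling). All names (`trace_id_var`, `TraceIdFilter`, `should_sample`, `handle_request`) and the 10% ratio are assumptions made up for this example.

```python
import contextvars
import io
import logging
import secrets

# Holds the ID of the request currently being handled; "-" outside a request.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def should_sample(trace_id, ratio):
    """Head-based sampling: keep a deterministic fraction of trace IDs."""
    # The low 64 bits of a random ID are uniformly distributed, so comparing
    # them against ratio * 2**64 keeps roughly `ratio` of all requests.
    return (int(trace_id, 16) & (2**64 - 1)) < int(ratio * 2**64)


def handle_request(logger, ratio=0.1):
    """Hypothetical request handler: assign a trace ID, then log under it."""
    trace_id = secrets.token_hex(16)  # 128 random bits as 32 hex characters
    token = trace_id_var.set(trace_id)
    try:
        if should_sample(trace_id, ratio):
            logger.info("request sampled for tracing")
        logger.info("handling request")
    finally:
        # Restore the previous value so IDs never leak between requests.
        trace_id_var.reset(token)
    return trace_id


# Log into an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("tracing_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

request_trace_id = handle_request(logger)
print(stream.getvalue(), end="")
```

Because the decision depends only on the trace ID, every component that sees the same ID would reach the same sampling decision. A real setup would presumably leave both the ID propagation and the sampling to the OpenTelemetry SDK rather than hand-rolling them like this.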
- Expose usage metrics through a Prometheus endpoint: This could be really useful if we want to track things like:
  - How many monthly active users (MAU) do we have?
  - How many transcription jobs are there each month?
  - What login providers do these users use?
  - How long does the average job take?
  - How high is the average load on the runners (i.e. how often are all runners processing a job at once)?
  - What is the average/maximum number of jobs in the job queue?
  - ...
Usage statistics like these could help to justify a Project-W instance (find out whether it is even needed and whether there is current demand) and to make informed decisions about its future (should we keep the service, do we need more runners, ...).
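
As a rough illustration of what a Prometheus endpoint for such metrics could return, the following standard-library sketch renders two gauges in the Prometheus text exposition format. The metric names (`projectw_monthly_active_users`, `projectw_queued_jobs`), the `Gauge` helper class, and all values are hypothetical and only serve the example; in practice the `prometheus_client` package would most likely do this job.

```python
from dataclasses import dataclass, field


@dataclass
class Gauge:
    """Hypothetical helper: one gauge metric with optional labels."""

    name: str
    help_text: str
    samples: dict = field(default_factory=dict)  # sorted label tuple -> value

    def set(self, value, **labels):
        self.samples[tuple(sorted(labels.items()))] = value

    def render(self):
        # Each metric starts with HELP and TYPE comment lines,
        # followed by one line per labelled sample.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} gauge",
        ]
        for labels, value in sorted(self.samples.items()):
            # Note: label values are not escaped in this sketch.
            label_str = ",".join(f'{key}="{val}"' for key, val in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines)


def render_metrics(metrics):
    """Join all metrics into one response body for a /metrics endpoint."""
    return "\n".join(metric.render() for metric in metrics) + "\n"


# Made-up numbers, purely for illustration.
active_users = Gauge(
    "projectw_monthly_active_users", "Users active in the last 30 days"
)
active_users.set(42, provider="ldap")
active_users.set(17, provider="oidc")

queued_jobs = Gauge("projectw_queued_jobs", "Jobs currently waiting in the queue")
queued_jobs.set(3)

exposition = render_metrics([active_users, queued_jobs])
print(exposition, end="")
```

Splitting the active-user gauge by a `provider` label would answer the login-provider question from the same metric, while averages and maxima over time (for example queue length) are typically computed on the Prometheus side from periodic scrapes rather than in the backend.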