Conversation

@jpinsonneau
Contributor

@jpinsonneau jpinsonneau commented Sep 22, 2025

What this PR does / why we need it:

This PR implements a new OpenShift mode supporting both logging and network logs at the same time. Despite the size of the PR, the change itself is trivial.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

You will need observatorium/opa-openshift#37 to test this PR.
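
For reviewers less familiar with the operator side, here is a minimal sketch of how the combined tenancy could be selected on a LokiStack, assuming the new mode is chosen through spec.tenants.mode like the existing openshift-logging and openshift-network modes. The mode value "openshift" below is only a placeholder inferred from the PR title, and the storage secret/class names are illustrative, not defined by this PR:

# Hypothetical sketch: the mode name and the storage secret/class are placeholders,
# not values taken from this PR.
oc apply -f - <<EOF
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-netobserv-loki
  namespace: netobserv
spec:
  size: 1x.small
  storageClassName: gp3-csi          # illustrative storage class
  storage:
    secret:
      name: loki-s3                  # illustrative object storage secret
      type: s3
  tenants:
    mode: openshift                  # placeholder for the new combined logging + network mode
EOF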

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@jpinsonneau jpinsonneau changed the title DRAFT openshift mode Openshift mode Sep 24, 2025
@jpinsonneau jpinsonneau changed the title Openshift mode feat: openshift mode to allow both logging and network Sep 24, 2025
@memodi

memodi commented Dec 12, 2025

@jpinsonneau - I did a performance test on this and Loki is able to ingest both logging and network logs well; I am not seeing prominent log loss or any errors from Loki.

I set up 5 worker nodes and, for logging, had 8 pods within each namespace generating 90,000 log lines every 30 minutes. I had 5 such namespaces, bringing the total to 40 pods. I also added the netobserv workload on top of it.

You can see below the data for one such namespace, which was generating logs at the above-mentioned rate. For the most part it collected 720,000 log lines (90,000 * 8 pod replicas) starting at the 30-minute mark. There was some loss in the trailing data, but I am unsure whether it was something the collector dropped or Loki. I did not see Loki errors during the test.

$ curl --globoff -k -H "Authorization: Bearer $TOKEN" https://lokistack-netobserv-loki.apps.memodi-shared-loki.qe-lrc.devcluster.openshift.com/api/logs/v1/application/loki/api/v1/query_range --data-urlencode 'query=sum(count_over_time({log_type="application",kubernetes_namespace_name="log-gen-3"}[60m]))' --data-urlencode 'step=5m' | jq '.data.result[0].values'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3272    0  3139  100   133   6024    255 --:--:-- --:--:-- --:--:--  6268
[
  [
    1765508400,
    "286650"
  ],
  [
    1765508700,
    "406275"
  ],
  [
    1765509000,
    "525950"
  ],
  [
    1765509300,
    "645575"
  ],
  [
    1765509600,
    "720000"
  ],
  [
    1765509900,
    "720000"
  ],
  [
    1765510200,
    "720000"
  ],
  [
    1765510500,
    "720000"
  ],
  [
    1765510800,
    "720000"
  ],
  [
    1765511100,
    "720000"
  ],
  [
    1765511400,
    "672625"
  ],
  [
    1765511700,
    "553000"
  ],
  [
    1765512000,
    "433350"
  ]
]
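
For completeness, the flows ingested on the network side could be checked the same way through the gateway's network tenant. This is a sketch under the assumption that the combined mode keeps the network tenant path used by the openshift-network mode and that NetObserv flows carry the app="netobserv-flowcollector" label; adjust the selector to whatever labels your FlowCollector actually writes:

$ curl --globoff -k -H "Authorization: Bearer $TOKEN" https://lokistack-netobserv-loki.apps.memodi-shared-loki.qe-lrc.devcluster.openshift.com/api/logs/v1/network/loki/api/v1/query_range --data-urlencode 'query=sum(count_over_time({app="netobserv-flowcollector"}[60m]))' --data-urlencode 'step=5m' | jq '.data.result[0].values'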

@jpinsonneau
Contributor Author

Awesome, thanks for testing this @memodi!
Let's try to gather more feedback. I'll also rebase and fix the tests here.
