@OlivierCazade
Contributor

Description

Add Lokistack status to the plugin configmap.

The Lokistack is also now watched to update any status change.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Nov 13, 2025

@OlivierCazade: This pull request references NETOBSERV-2402 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@codecov

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 73.68421% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.19%. Comparing base (e03f03d) to head (ac940f9).
⚠️ Report is 5 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...ntroller/consoleplugin/consoleplugin_reconciler.go    63.15%    4 Missing and 3 partials ⚠️
internal/controller/flowcollector_controller.go          0.00%     2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2142      +/-   ##
==========================================
- Coverage   72.40%   72.19%   -0.22%     
==========================================
  Files          93       93              
  Lines       10346    10379      +33     
==========================================
+ Hits         7491     7493       +2     
- Misses       2389     2414      +25     
- Partials      466      472       +6     
Flag        Coverage Δ
unittests   72.19% <73.68%> (-0.22%) ⬇️

Files with missing lines                                 Coverage Δ
internal/controller/consoleplugin/config/config.go       75.00% <ø> (ø)
.../controller/consoleplugin/consoleplugin_objects.go    86.66% <100.00%> (+0.36%) ⬆️
internal/pkg/cluster/cluster.go                          81.21% <100.00%> (+0.80%) ⬆️
internal/controller/flowcollector_controller.go          76.53% <0.00%> (-2.42%) ⬇️
...ntroller/consoleplugin/consoleplugin_reconciler.go    71.52% <63.15%> (-2.01%) ⬇️

... and 3 files with indirect coverage changes

Member

@jotak jotak left a comment

/lgtm

}

if mgr.ClusterInfo.HasLokiStack() {
    builder.Watches(&lokiv1.LokiStack{}, &handler.EnqueueRequestForObject{})
Member

ideally, we would enqueue only when our configured lokistack is affected, not all lokistacks
but I guess that's fine, as we don't expect hundreds of lokistacks out there :-)

Member

@memodi memodi left a comment

Hi @OlivierCazade - adding review comments from Claude. I focused on must-fix and high-impact items, including better error reporting.

Besides the comments, it also pointed out missing test coverage on the operator side:

  Missing in Operator:
  - Test for LokiStack status embedding in configmap
  - Test for namespace defaulting logic
  - Test for behavior when LokiStack is not found
  - Test for behavior when LokiStack CRD is not available
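As an illustration of the "namespace defaulting logic" item above, here is a minimal table-driven sketch. The helper name `resolveLokiStackNamespace` and the fallback value are hypothetical (they are not taken from the operator code); the sketch only assumes that an empty configured namespace falls back to some default, as the snippets in this thread suggest.

```go
package main

import "fmt"

// resolveLokiStackNamespace is a hypothetical helper, not the operator's
// actual function: it returns the namespace configured for the LokiStack,
// or the fallback when none is set.
func resolveLokiStackNamespace(configured, fallback string) string {
	if configured == "" {
		return fallback
	}
	return configured
}

func main() {
	// Table-driven cases; in the operator these would live in a _test.go file.
	cases := []struct {
		name, configured, fallback, want string
	}{
		{"explicit namespace wins", "loki-ns", "netobserv", "loki-ns"},
		{"empty namespace falls back", "", "netobserv", "netobserv"},
	}
	for _, c := range cases {
		got := resolveLokiStackNamespace(c.configured, c.fallback)
		fmt.Printf("%s: got %q, want %q, ok=%t\n", c.name, got, c.want, got == c.want)
	}
}
```

The "not found" and "CRD not available" cases would need a fake client, which is left out of this sketch.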

Comment on lines +73 to +70
if mgr.ClusterInfo.HasLokiStack() {
    builder.Watches(&lokiv1.LokiStack{}, &handler.EnqueueRequestForObject{})
Member

from Claude review:

  Problem: This will create reconcile requests for LokiStack objects, not FlowCollectors. When a LokiStack named "logging-loki" changes, it will try to reconcile a FlowCollector named "logging-loki", which likely doesn't exist.

  Fix Required:
  builder.Watches(&lokiv1.LokiStack{}, handler.EnqueueRequestsFromMapFunc(
      func(ctx context.Context, obj client.Object) []reconcile.Request {
          lokiStack := obj.(*lokiv1.LokiStack)
          var flowCollectors flowslatest.FlowCollectorList
          if err := mgr.GetClient().List(ctx, &flowCollectors); err != nil {
              log.FromContext(ctx).Error(err, "Failed to list FlowCollectors")
              return []reconcile.Request{}
          }

          var requests []reconcile.Request
          for _, fc := range flowCollectors.Items {
              if fc.Spec.Loki.Mode == flowslatest.LokiModeLokiStack &&
                 fc.Spec.Loki.LokiStack.Name == lokiStack.Name {
                  ns := fc.Spec.Loki.LokiStack.Namespace
                  if ns == "" {
                      ns = fc.Namespace
                  }
                  if ns == lokiStack.Namespace {
                      requests = append(requests, reconcile.Request{
                          NamespacedName: types.NamespacedName{
                              Name:      fc.Name,
                              Namespace: fc.Namespace,
                          },
                      })
                  }
              }
          }
          return requests
      },
  ))

Contributor Author

I had not thought of making a Kubernetes query inside the handler to filter LokiStacks.

The approach I had in mind was to wait for the FlowCollector to be created, then start a dedicated controller with a static FlowCollector name. That added a lot of complexity, and I was not sure it was worth it.

This looks like a simpler solution, at the cost of a Kubernetes query in the handler function.

@jotak what do you think?

Member

@jotak jotak Dec 17, 2025

Not sure we need to overcomplicate things here. It's basically my comment here: #2142 (comment); having a couple of false-positive reconcile events is not so important. We're talking about LokiStack objects: we don't expect many of them, and they don't change often.

Btw, I think Claude's answer is wrong: the enqueue request is not for a FlowCollector named after the LokiStack, it's for any FlowCollector? (EnqueueRequestForObject{} with empty params)

Member

@jotak jotak Dec 17, 2025

If we really want to narrow down to our configured LokiStack: in other situations we just keep in the controller state the last-seen value of the element that we want to check (we do that in a couple of places for flowcollector.spec.namespace, iirc); we could do the same with the configured LokiStack.
Like this: https://github.com/netobserv/network-observability-operator/blob/main/internal/controller/flp/flp_controller.go#L60
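A rough sketch of that "keep the last-seen value in controller state" idea, using only the standard library. The `lokiStackFilter` type, its methods, and `namespacedName` (a stand-in for `types.NamespacedName`) are hypothetical names for illustration, not the operator's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// namespacedName is a local stand-in for types.NamespacedName.
type namespacedName struct {
	Namespace, Name string
}

// lokiStackFilter (hypothetical) remembers the LokiStack referenced by the
// last reconciled FlowCollector, so watch events for others can be dropped.
type lokiStackFilter struct {
	mu      sync.RWMutex
	current *namespacedName // nil when not in LokiStack mode
}

// set records the configured LokiStack; pass nil when LokiStack mode is off.
func (f *lokiStackFilter) set(nn *namespacedName) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if nn == nil {
		f.current = nil
		return
	}
	c := *nn // store a copy so later caller-side changes have no effect
	f.current = &c
}

// matches reports whether an event concerns the configured LokiStack.
func (f *lokiStackFilter) matches(namespace, name string) bool {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return f.current != nil && f.current.Namespace == namespace && f.current.Name == name
}

func main() {
	var f lokiStackFilter
	fmt.Println(f.matches("netobserv", "loki")) // nothing configured yet

	f.set(&namespacedName{Namespace: "netobserv", Name: "loki"})
	fmt.Println(f.matches("netobserv", "loki"))
	fmt.Println(f.matches("logging", "logging-loki"))
}
```

The reconciler would call `set` on each reconcile, and the watch could call `matches` from a predicate (for instance one built with controller-runtime's `predicate.NewPredicateFuncs`), which avoids a List call inside the handler.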

Comment on lines 192 to 193
lokiStack = nil
log.FromContext(ctx).Info("Could not get the LokiStack resource.")
Member

from Claude review:

  Problems:
  - Logs at Info level instead of Warning
  - Actual error is discarded
  - Can't distinguish between "not found" and "permission denied"
  - Should surface in FlowCollector status

  Fix:
  if err := r.Client.Get(ctx, types.NamespacedName{Name: desired.Loki.LokiStack.Name, Namespace: ns}, lokiStack); err != nil {
      lokiStack = nil
      if apierrors.IsNotFound(err) {
          log.FromContext(ctx).Info("LokiStack resource not found, status will not be available",
              "name", desired.Loki.LokiStack.Name,
              "namespace", ns)
      } else {
          log.FromContext(ctx).Error(err, "Failed to get LokiStack resource",
              "name", desired.Loki.LokiStack.Name,
              "namespace", ns)
      }
      // TODO: Consider surfacing this in FlowCollector status
  }

@openshift-ci

openshift-ci bot commented Dec 23, 2025

New changes are detected. LGTM label has been removed.

@openshift-ci

openshift-ci bot commented Dec 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@OlivierCazade OlivierCazade force-pushed the loki-status branch 2 times, most recently from 51f5209 to 0439130 Compare December 24, 2025 15:56
@OlivierCazade
Contributor Author

/retest

return nil
}

func getLokiStatus(lokiStack *lokiv1.LokiStack) string {
Member

@jotak jotak Jan 6, 2026

not tested, but looking at the code I think there's a problem here:
In reconcileConfigMap, the lokiStack object is created and can have the following values:

  • a reference to the lokistack that was found
  • nil if an error occurred
  • a reference to an empty (0-value) struct &lokiv1.LokiStack{} when not in LokiStack mode

So in this last situation, the returned value would be pending, which seems incorrect?

IMO, what we could do:

  • if there was an error when fetching LokiStack, set this error as status (e.g. the console plugin could display something like "LokiStack not found")
  • if the lokistack shows a non-ready status (error/pending condition), set a message with that condition status
  • if the loki stack is ready, set as ready
  • if not in lokistack mode, set as an empty string

Also, I'm not sure it's useful to check for the presence of the LokiStack API: if it's configured in LokiStack mode BUT the API is not present, the config is wrong, so it's ok to just display the error message that would come up when trying to fetch LokiStack?
wdyt?

Contributor Author

In theory, we should never hit the case of not being in LokiStack mode, but since we pass the LokiStack as a pointer, I added a nil check. I modified the function to make this clearer.

When not in LokiStack mode, the operator falls back to the previous mode, using the StatusURL.

If the LokiStack shows a non-ready status, finding the right error might be tricky: each LokiStack component has its own status, so we may display the wrong error. IMO it is simpler to display a pending status and let the user investigate, since the LokiStack was provided by the user.

I removed the LokiStack API check.

@OlivierCazade
Contributor Author

/retest

@OlivierCazade OlivierCazade requested a review from jotak January 22, 2026 16:00
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 23, 2026
@github-actions

New images:

  • quay.io/netobserv/network-observability-operator:efa5a04
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-efa5a04
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-efa5a04

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:efa5a04 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-efa5a04

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-efa5a04
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Comment on lines +184 to +186
lokiStack := &lokiv1.LokiStack{}
if desired.Loki.Mode == flowslatest.LokiModeLokiStack {
    ns := desired.Loki.LokiStack.Namespace
Member

I just double-checked by trying your PR, and I confirm it reports a wrong status when not in LokiStack mode (it says pending); that's because you initialize it as non-nil.

This should work I think:

Suggested change
- lokiStack := &lokiv1.LokiStack{}
- if desired.Loki.Mode == flowslatest.LokiModeLokiStack {
-     ns := desired.Loki.LokiStack.Namespace
+ var lokiStack *lokiv1.LokiStack
+ if desired.Loki.Mode == flowslatest.LokiModeLokiStack {
+     lokiStack = &lokiv1.LokiStack{}
+     ns := desired.Loki.LokiStack.Namespace
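The difference between the two initializations can be reproduced in isolation. In this sketch, `lokiStack` and `getStatus` are hypothetical stand-ins for `lokiv1.LokiStack` and `getLokiStatus` (whose real body is not shown in this thread), and a successful fetch is simulated by setting `Ready`.

```go
package main

import "fmt"

// lokiStack is a simplified, hypothetical stand-in for lokiv1.LokiStack.
type lokiStack struct{ Ready bool }

// getStatus is an assumed shape for getLokiStatus: nil means "no LokiStack".
func getStatus(ls *lokiStack) string {
	if ls == nil {
		return ""
	}
	if ls.Ready {
		return "ready"
	}
	return "pending"
}

// statusBefore mirrors the original code: the pointer is always non-nil.
func statusBefore(lokiStackMode bool) string {
	ls := &lokiStack{}
	if lokiStackMode {
		ls.Ready = true // simulates fetching a ready LokiStack
	}
	return getStatus(ls)
}

// statusAfter mirrors the suggestion: the pointer stays nil outside LokiStack mode.
func statusAfter(lokiStackMode bool) string {
	var ls *lokiStack
	if lokiStackMode {
		ls = &lokiStack{}
		ls.Ready = true // simulates fetching a ready LokiStack
	}
	return getStatus(ls)
}

func main() {
	fmt.Printf("before, not LokiStack mode: %q\n", statusBefore(false))
	fmt.Printf("after,  not LokiStack mode: %q\n", statusAfter(false))
	fmt.Printf("after,  LokiStack mode:     %q\n", statusAfter(true))
}
```

With the zero-value struct, the nil check never triggers, so the function falls through to the pending branch even though no LokiStack is configured.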

@jotak jotak self-requested a review January 23, 2026 11:21
Labels

jira/valid-reference ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

5 participants