Skip to content

Conversation

jpinsonneau
Copy link
Contributor

@jpinsonneau jpinsonneau commented Apr 2, 2025

Description

Create the console plugin when FlowCollector doesn't exists to expose the new forms.

Suggested alternatives: #1346 & #1374

See netobserv/network-observability-console-plugin#763 for the forms implementations

Dependencies

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Comment on lines 81 to 100
// force reconcile at startup
go r.InitReconcile(ctx)

return nil
}

func (r *FlowCollectorReconciler) InitReconcile(ctx context.Context) error {
log := log.FromContext(ctx)
log.Info("Initializing resources...")

var err error
for attempt := range initReconcileAttempts {
// delay the reconcile calls to let some time to the cache to load
time.Sleep(5 * time.Second)
_, err = r.Reconcile(ctx, reconcile.Request{})
if err != nil {
log.Error(err, "Error while doing initial reconcile", "attempt", attempt)
} else {
break
}
}
return err
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ I wonder if there is an out of box mechanism to trigger the loop after the cache loaded. That's why I'm using a sleep here and this will may work in all situations.

https://redhat-internal.slack.com/archives/C02939DP5L5/p1743518264445429

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, when a reconcile loop is failing, you can also return a time value to reschedule the reconciliation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I gave a try with that without success.

I'm refactoring the code again to move the static content to another controller wich will be cleaner I guess. I will give another try with the reschedule time on the new controller 👍

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@jpinsonneau jpinsonneau force-pushed the 1942 branch 2 times, most recently from c4c57e1 to dbc4cbf Compare April 15, 2025 10:40
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 17, 2025
Comment on lines 38 to 61
// force reconcile at startup
go r.InitReconcile(ctx)

return ctrl.NewControllerManagedBy(mgr).
For(&flowslatest.FlowCollector{}, reconcilers.IgnoreStatusChange).
Named("staticPlugin").
Complete(&r)
}

func (r *Reconciler) InitReconcile(ctx context.Context) {
log := log.FromContext(ctx)
log.Info("Initializing resources...")

for attempt := range initReconcileAttempts {
// delay the reconcile calls to let some time to the cache to load
time.Sleep(5 * time.Second)
_, err := r.Reconcile(ctx, ctrl.Request{})
if err != nil {
log.Error(err, "Error while doing initial reconcile", "attempt", attempt)
} else {
return
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierCazade I modified the PR to be in a dedicated controller.

As you can see, I force the Reconcile call during the Start function so OLM can't interpret the result containing Requeue / RequeueAfter here.

Since we don't create a dedicated CR for static plugin, I don't think we can rely on these here.

WDYT ?

@netobserv netobserv deleted a comment from github-actions bot Apr 17, 2025
@jpinsonneau
Copy link
Contributor Author

Rebased without changes

@memodi
Copy link
Member

memodi commented Jul 15, 2025

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 15, 2025
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:cdd8f5a
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdd8f5a
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdd8f5a

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cdd8f5a make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdd8f5a

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdd8f5a
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@memodi
Copy link
Member

memodi commented Jul 15, 2025

@jpinsonneau Here's quick feedback from my initial review of the form view:

- ebpf flow filtering
	- what does option 1 and 2 mean for port fields? [1]
- Deployment model
- Mode:
	- add a note which Loki mode is recommended for production use cases. 
- Prometheus Mode:
	- add a node what "Auto" mode mean? Or hide it if its not relevant 
- pipeline stage:
	- flow filtering.
	- some advanced configuration like "secondaryNetworks" could be exposed.

[1]

Screenshot 2025-07-15 at 4 50 19 PM

at the end it generated empty yaml [2] , I was trying with 4.20 OCP version

Screenshot 2025-07-15 at 4 58 34 PM

I'll add more feedback as I test more.

Comment on lines +1262 to +1265
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
Copy link
Member

@memodi memodi Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpinsonneau - can we append this env var instead of adding it in middle so that index # for the existing vars doesn't change? We have several scripts and CI steps where we patch csv which happens based on index of env var.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I cant put it in the end of the list if it help 😸

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@memodi
Copy link
Member

memodi commented Jul 17, 2025

Adding more feedback:

  1. A way to skip through all the config options would be nice for users who are looking to just configure with default config.
  2. Is there a way to have a hyperlink of a diagram presented on [k8s/cluster/flows.netobserv.iov1beta2FlowCollector/status] of each component for the "status" field shown on [flows.netobserv.iov1beta2FlowCollector] page
  3. the view port of the diagram is very small on my 16" laptop screen
Screenshot 2025-07-16 at 10 12 46 PM
  1. When I used kafka mode, the FLP component never went green even though it was working fine.
  2. nit: the lines between FLP --> Loki/Prom. keeps flickering.
Screen.Recording.2025-07-16.at.10.28.05.PM.mov

@jpinsonneau
Copy link
Contributor Author

@jpinsonneau Here's quick feedback from my initial review of the form view:

- ebpf flow filtering
	- what does option 1 and 2 mean for port fields? [1]
- Deployment model
- Mode:
	- add a note which Loki mode is recommended for production use cases. 
- Prometheus Mode:
	- add a node what "Auto" mode mean? Or hide it if its not relevant 
- pipeline stage:
	- flow filtering.
	- some advanced configuration like "secondaryNetworks" could be exposed.

Sure I can add some notes and recommendations on top of the current descriptions. These are coming from the API so maybe we can also improve it.

About the options 1 / 2 and the advanced config, we need to decide what to expose and what to skip as these needs extra tweaking to make it work.
For example, options 1 & 2 are about using a single port or an array or range:
https://github.com/netobserv/network-observability-operator/blob/main/internal/controller/ebpf/agent_controller.go#L560-L572

If we decide to not expose these, I will simply hide these fields.

@jpinsonneau
Copy link
Contributor Author

  1. A way to skip through all the config options would be nice for users who are looking to just configure with default config.

You can directly click on the left menu to navigate to the last step. Isn't it enough ?

  1. Is there a way to have a hyperlink of a diagram presented on [k8s/cluster/flows.netobserv.iov1beta2FlowCollector/status] of each component for the "status" field shown on [flows.netobserv.iov1beta2FlowCollector] page

Sure, good idea !

  1. the view port of the diagram is very small on my 16" laptop screen

Indeed. Would you prefer to have a scrollbar on the status list or a scrollbar on the entire page ?

  1. When I used kafka mode, the FLP component never went green even though it was working fine.

Indeed I should rely on the WaitingFLPTransformer for kafka:
https://github.com/netobserv/network-observability-console-plugin/pull/763/files
I'll do the change

  1. nit: the lines between FLP --> Loki/Prom. keeps flickering.

I can see if I can skip some refreshs here to avoid that. Thanks for your feedback !

@memodi
Copy link
Member

memodi commented Jul 17, 2025

You can directly click on the left menu to navigate to the last step. Isn't it enough ?

ah, it wasn't clear me, that I could jump that way.

Would you prefer to have a scrollbar on the status list or a scrollbar on the entire page ?

I think scrollbar on status list make sense to me.

About the options 1 / 2 and the advanced config, we need to decide what to expose and what to skip as these needs extra tweaking to make it work.
For example, options 1 & 2 are about using a single port or an array or range:
https://github.com/netobserv/network-observability-operator/blob/main/internal/controller/ebpf/agent_controller.go#L560-L572
If we decide to not expose these, I will simply hide these fields.

I think these are much advanced config which can be handled directly in yaml.

of all the advancedConfig, IMHO secondaryNetworks may probably be more utilized than others, may be it make a case to move it out of advanced config in API for it.

@memodi
Copy link
Member

memodi commented Jul 17, 2025

/label needs-changes

Copy link

openshift-ci bot commented Jul 17, 2025

@memodi: The label(s) /label needs-changes cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, ux-approved, no-qe, downstream-change-needed, rebase/manual, cluster-config-api-changed, run-integration-tests, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, stability-fix-approved, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label needs-changes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@memodi
Copy link
Member

memodi commented Jul 17, 2025

/needs-changes

@memodi memodi added the needs-changes To be added to denote PR needs changes or some questions/comments to be addressed label Jul 17, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 18, 2025
@jpinsonneau
Copy link
Contributor Author

@memodi coming back on the notes, that's what we have today:
image
image

I would suggest to update the CRD first and then in the forms if we are not satisfied with those texts

@memodi
Copy link
Member

memodi commented Jul 18, 2025

I would suggest to update the CRD first and then in the forms if we are not satisfied with those texts

sure, np, though it would be nice to have a way to override API docs text

@Amoghrd
Copy link
Member

Amoghrd commented Jul 22, 2025

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 22, 2025
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:02a5af9
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-02a5af9
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-02a5af9

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:02a5af9 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-02a5af9

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-02a5af9
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Copy link

openshift-ci bot commented Jul 22, 2025

@jpinsonneau: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-operator 9bb32d1 link false /test e2e-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jpinsonneau jpinsonneau removed the needs-changes To be added to denote PR needs changes or some questions/comments to be addressed label Jul 23, 2025
Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 23, 2025
@jotak
Copy link
Member

jotak commented Jul 24, 2025

another round of testing will be done post-merge; merging now!

@jotak jotak merged commit 5c7af20 into main Jul 24, 2025
13 of 15 checks passed
@jpinsonneau jpinsonneau deleted the 1942 branch October 1, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants