Commit 1c124c0

Mario Macias authored
NETOBSERV-307 & NETOBSERV-308: changed example deployments + documentation (#23)
* Move memlock removal to initialization + extra documentation to work with eBPF
* Documented individual capabilities instead of privileged
1 parent 178ef04 commit 1c124c0

9 files changed: +241 -48 lines changed
README.md

Lines changed: 83 additions & 12 deletions
@@ -3,6 +3,14 @@
 The Network Observability eBPF Agent allows collecting and aggregating all the ingress and
 egress flows on a Linux host (requires kernel 4.18+ with eBPF enabled).

+* [How to compile](#how-to-compile)
+* [How to configure](#how-to-configure)
+* [How to run](#how-to-run)
+* [Development receipts](#development-receipts)
+* [Known issues](#known-issues)
+* [Frequently-asked questions](#frequently-asked-questions)
+* [Troubleshooting](#troubleshooting)
+
 ## How to compile

 ```
@@ -19,24 +27,56 @@ The eBPF Agent is configured by means of environment variables. Check the
 The NetObserv eBPF Agent is designed to run as a DaemonSet in OpenShift/K8s. It is triggered and
 configured by our [Network Observability Operator](https://github.com/netobserv/network-observability-operator).

-Anyway you can run it directly as an executable with administrative privileges:
+Anyway you can run it directly as an executable from your command line:

 ```
 export FLOWS_TARGET_HOST=...
 export FLOWS_TARGET_PORT=...
 sudo -E bin/netobserv-ebpf-agent
 ```
+
 To deploy locally, use instructions from [flowlogs-dump (like tcpdump)](./examples/flowlogs-dump/README.md).
-To deploy it as a Pod, you can check the [deployment example](./examples/performance/deployment.yml).
+To deploy it as a Pod, you can check the [deployment examples](./deployments).
+
+The Agent needs to be executed either with:
+
+1. The following [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html)
+   (recommended way): `BPF`, `PERFMON`, `NET_ADMIN`, `SYS_RESOURCE`. If you
+   [deploy it in Kubernetes or OpenShift](./deployments/flp-daemonset-cap.yml),
+   the container running the Agent needs to define the following `securityContext`:
+   ```yaml
+   securityContext:
+     runAsUser: 0
+     capabilities:
+       add:
+         - BPF
+         - PERFMON
+         - NET_ADMIN
+         - SYS_RESOURCE
+   ```
+   (Please notice that the `runAsUser: 0` is still needed).
+2. Administrative privileges. If you
+   [deploy it in Kubernetes or OpenShift](./deployments/flp-daemonset.yml),
+   the container running the Agent needs to define the following `securityContext`:
+   ```yaml
+   securityContext:
+     privileged: true
+     runAsUser: 0
+   ```
+   This option is only recommended if your Kernel does not recognize some of the above capabilities.
+   We found some Kubernetes distributions (e.g. K3s) that do not recognize the `BPF` and
+   `PERFMON` capabilities.
+
+Here is a list of distributions where we tested both full privileges and capability approaches,
+and whether they worked (✅) or did not (❌):
+
+| Distribution                  | K8s Server version | Capabilities | Privileged |
+|-------------------------------|--------------------|--------------|------------|
+| Amazon EKS (Bottlerocket AMI) | 1.22.6             | ✅            | ✅          |
+| K3s (Rancher Desktop)         | 1.23.5             | ❌            | ✅          |
+| Kind                          | 1.23.5             | ❌            | ✅          |
+| OpenShift                     | 1.23.3             | ✅            | ✅          |

-## Where is the collector?
-
-As part of our Network Observability solution, the eBPF Agent is designed to send the traced
-flows to our [Flowlogs Pipeline](https://github.com/netobserv/flowlogs-pipeline) component.
-
-In addition, we provide a simple GRPC+Protobuf library to allow implementing your own collector.
-Check the [packet counter code](./examples/performance/server/packet-counter-collector.go)
-for an example of a simple collector using our library.

 ## Development receipts

@@ -62,7 +102,38 @@ Tested in Fedora 35 and Red Hat Enterprise Linux 8.

 ## Known issues

-## External Traffic in OpenShift (OVN-Kubernetes CNI)
+### External Traffic in OpenShift (OVN-Kubernetes CNI)

 For egress traffic, you can see the source Pod metadata. For ingress traffic (e.g. an HTTP response),
-you see the destination **Host** metadata.
+you see the destination **Host** metadata.
+
+## Frequently-asked questions
+
+### Where is the collector?
+
+As part of our Network Observability solution, the eBPF Agent is designed to send the traced
+flows to our [Flowlogs Pipeline](https://github.com/netobserv/flowlogs-pipeline) component.
+
+In addition, we provide a simple GRPC+Protobuf library to allow implementing your own collector.
+Check the [packet counter code](./examples/performance/server/packet-counter-collector.go)
+for an example of a simple collector using our library.
+
+## Troubleshooting
+
+### Deployed as a Kubernetes Pod, the agent shows permission errors in the logs and can't start
+
+In your [deployment file](./deployments/flp-daemonset-cap.yml), make sure that the container runs as
+the root user (`runAsUser: 0`) and with the granted capabilities or privileges (see [how to run](#how-to-run) section).
+
+### The Agent doesn't work in my Amazon EKS puzzle
+
+Although Amazon Linux 2 enables eBPF by default in EC2, the
+[EKS images are shipped with eBPF disabled](https://github.com/awslabs/amazon-eks-ami/issues/728).
+
+You need to either:
+
+1. Provide your own AMI configured to work with eBPF.
+2. Use other Linux distributions that are shipped with eBPF enabled by default. We have successfully
+   tested the eBPF Agent in EKS with the [Bottlerocket](https://aws.amazon.com/es/bottlerocket/)
+   Linux distribution, without requiring any extra configuration.
+
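
Review note on the README change above: when debugging the "permission errors" case, it can help to check which of the four documented capabilities the process actually holds. The following is a minimal sketch, not part of this commit, that parses the `CapEff` mask from `/proc/self/status`. The helper names `parseCapEff` and `missingCaps` are hypothetical; the bit numbers are the ones defined in `linux/capability.h`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// required maps the capabilities named in the README to their bit numbers
// in linux/capability.h.
var required = map[string]uint{
	"NET_ADMIN":    12,
	"SYS_RESOURCE": 24,
	"PERFMON":      38,
	"BPF":          39,
}

// parseCapEff (hypothetical helper) extracts the effective capability mask
// from the text of /proc/<pid>/status.
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	return 0, errors.New("CapEff line not found")
}

// missingCaps (hypothetical helper) returns, sorted by name, the required
// capabilities whose bit is not set in mask.
func missingCaps(mask uint64) []string {
	var missing []string
	for name, bit := range required {
		if mask&(uint64(1)<<bit) == 0 {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	mask, err := parseCapEff(string(data))
	if err != nil {
		fmt.Println("cannot parse capabilities:", err)
		return
	}
	fmt.Println("missing capabilities:", missingCaps(mask))
}
```

Run inside the agent container, an empty list suggests the `securityContext` was applied. On kernels that predate `CAP_BPF` and `CAP_PERFMON`, those two would likely show as missing unless the container is privileged.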

deployments/README.md

Lines changed: 2 additions & 0 deletions
@@ -6,5 +6,7 @@ but the files contained here are useful for documentation and manual testing.

 * `flp-daemonset.yml`, shows how to deploy/configure the Agent when Flowlogs Pipeline is deployed
   as daemonset, taking the target host configuration from the Host IP.
+* `flp-daemonset-cap.yml`, same as `flp-daemonset.yml`, but assigning individual capabilities instead
+  of deploying a fully-privileged container.
 * `flp-service.yml`, shows how to deploy/configure the Agent when Flowlogs Pipeline is deployed
   as a service, explicitly setting the host configuration as the service name.

deployments/flp-daemonset-cap.yml

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+# Example deployment for manual testing with flp
+# It requires loki to be installed
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: netobserv-ebpf-agent
+  labels:
+    k8s-app: netobserv-ebpf-agent
+spec:
+  selector:
+    matchLabels:
+      k8s-app: netobserv-ebpf-agent
+  template:
+    metadata:
+      labels:
+        k8s-app: netobserv-ebpf-agent
+    spec:
+      serviceAccountName: netobserv-account
+      hostNetwork: true
+      dnsPolicy: ClusterFirstWithHostNet
+      containers:
+      - name: netobserv-ebpf-agent
+        image: quay.io/mmaciasl/netobserv-ebpf-agent:main
+        # imagePullPolicy: Always
+        securityContext:
+          capabilities:
+            add:
+              - BPF
+              - PERFMON
+              - NET_ADMIN
+              - SYS_RESOURCE
+          runAsUser: 0
+        env:
+          - name: FLOWS_TARGET_HOST
+            valueFrom:
+              fieldRef:
+                fieldPath: status.hostIP
+          - name: FLOWS_TARGET_PORT
+            value: "9999"
+---
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: flp
+  labels:
+    k8s-app: flp
+spec:
+  selector:
+    matchLabels:
+      k8s-app: flp
+  template:
+    metadata:
+      labels:
+        k8s-app: flp
+    spec:
+      containers:
+        - name: flowlogs-pipeline
+          image: quay.io/netobserv/flowlogs-pipeline:latest
+          ports:
+            - containerPort: 9999
+          args:
+            - --config=/etc/flp/config.yaml
+          volumeMounts:
+            - mountPath: /etc/flp
+              name: config-volume
+      volumes:
+        - name: config-volume
+          configMap:
+            name: flp-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: flp-config
+data:
+  config.yaml: |
+    log-level: debug
+    pipeline:
+      - name: ingest
+      - name: decode
+        follows: ingest
+      - name: enrich
+        follows: decode
+      - name: encode
+        follows: enrich
+      - name: loki
+        follows: encode
+    parameters:
+      - name: ingest
+        ingest:
+          type: grpc
+          grpc:
+            port: 9999
+      - name: decode
+        decode:
+          type: protobuf
+      - name: enrich
+        transform:
+          type: network
+          network:
+            rules:
+              - input: SrcAddr
+                output: SrcK8S
+                type: "add_kubernetes"
+              - input: DstAddr
+                output: DstK8S
+                type: "add_kubernetes"
+      - name: encode
+        encode:
+          type: none
+      - name: loki
+        write:
+          type: loki
+          loki:
+            type: loki
+            staticLabels:
+              app: netobserv-flowcollector
+            labels:
+              - "SrcK8S_Namespace"
+              - "SrcK8S_OwnerName"
+              - "DstK8S_Namespace"
+              - "DstK8S_OwnerName"
+              - "FlowDirection"
+            url: http://loki:3100
+            timestampLabel: TimeFlowEnd
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: netobserv-account
+
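
Review note on the DaemonSet above: the agent container receives `FLOWS_TARGET_HOST` from `status.hostIP` and `FLOWS_TARGET_PORT` as the literal `"9999"`. As an illustration only, here is a sketch of how a program could turn those two variables into a dial address. This is not the agent's actual configuration code, and `targetAddress` is a hypothetical name. The lookup function is passed in so that the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// targetAddress (hypothetical helper) builds a "host:port" string from the
// FLOWS_TARGET_HOST and FLOWS_TARGET_PORT variables. lookup has the same
// signature as os.LookupEnv.
func targetAddress(lookup func(string) (string, bool)) (string, error) {
	host, ok := lookup("FLOWS_TARGET_HOST")
	if !ok || host == "" {
		return "", fmt.Errorf("FLOWS_TARGET_HOST is not set")
	}
	portStr, ok := lookup("FLOWS_TARGET_PORT")
	if !ok {
		return "", fmt.Errorf("FLOWS_TARGET_PORT is not set")
	}
	// Accept only a valid, non-zero 16-bit port number.
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil || port == 0 {
		return "", fmt.Errorf("invalid FLOWS_TARGET_PORT %q", portStr)
	}
	// JoinHostPort also brackets IPv6 literals correctly.
	return net.JoinHostPort(host, portStr), nil
}

func main() {
	addr, err := targetAddress(os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("flows will be sent to", addr)
}
```

Using `net.JoinHostPort` rather than string concatenation matters when the host IP is an IPv6 address, which some clusters may report in `status.hostIP`.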

deployments/flp-daemonset.yml

Lines changed: 2 additions & 15 deletions
@@ -24,6 +24,7 @@ spec:
         # imagePullPolicy: Always
         securityContext:
           privileged: true
+          runAsUser: 0
         env:
           - name: FLOWS_TARGET_HOST
             valueFrom:
@@ -124,18 +125,4 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: netobserv-account
----
-apiVersion: security.openshift.io/v1
-kind: SecurityContextConstraints
-metadata:
-  name: example
-allowPrivilegedContainer: true
-allowHostDirVolumePlugin: true
-allowHostNetwork: true
-allowHostPorts: true
-runAsUser:
-  type: RunAsAny
-seLinuxContext:
-  type: RunAsAny
-users:
-  - system:serviceaccount:network-observability:netobserv-account
+

deployments/flp-service.yml

Lines changed: 1 addition & 15 deletions
@@ -24,6 +24,7 @@ spec:
         # imagePullPolicy: Always
         securityContext:
           privileged: true
+          runAsUser: 0
         env:
           - name: FLOWS_TARGET_HOST
             value: "flp"
@@ -138,18 +139,3 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: netobserv-account
----
-apiVersion: security.openshift.io/v1
-kind: SecurityContextConstraints
-metadata:
-  name: example
-allowPrivilegedContainer: true
-allowHostDirVolumePlugin: true
-allowHostNetwork: true
-allowHostPorts: true
-runAsUser:
-  type: RunAsAny
-seLinuxContext:
-  type: RunAsAny
-users:
-  - system:serviceaccount:network-observability:netobserv-account

pkg/agent/agent.go

Lines changed: 2 additions & 0 deletions
@@ -98,6 +98,8 @@ func FlowsAgent(cfg *Config) (*Flows, error) {
 func (f *Flows) Run(ctx context.Context) error {
 	alog.Info("starting Flows agent")

+	systemSetup()
+
 	tracedRecords, err := f.interfacesManager(ctx)
 	if err != nil {
 		return err

pkg/agent/agent_darwin.go

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+package agent
+
+func systemSetup() {
+}

pkg/agent/agent_linux.go

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+package agent
+
+import (
+	"github.com/cilium/ebpf/rlimit"
+	"github.com/sirupsen/logrus"
+)
+
+var slog = logrus.WithField("component", "systemSetup")
+
+// systemSetup holds some system-dependent initialization processes
+func systemSetup() {
+	if err := rlimit.RemoveMemlock(); err != nil {
+		slog.WithError(err).
+			Warn("can't remove mem lock. The agent might not be able to start eBPF programs")
+	}
+}
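
Review note: this commit changes the failure mode of the memlock step. `Register` used to return an error when `rlimit.RemoveMemlock` failed, whereas `systemSetup` only logs a warning and lets startup continue. The sketch below shows that best-effort pattern in isolation, assuming injected stand-ins for the real dependencies: `removeMemlock` takes the place of `rlimit.RemoveMemlock` and `warn` takes the place of the logrus logger. Neither the two-argument signature nor `errDenied` exists in the agent.

```go
package main

import (
	"errors"
	"fmt"
)

// errDenied is a hypothetical error used to simulate a process that lacks
// the privileges needed to lift the memlock limit.
var errDenied = errors.New("operation not permitted")

// systemSetup runs a best-effort initialization step. A failure is reported
// through warn but never returned, so the caller keeps starting up.
// Both parameters are illustrative stand-ins, not the agent's real API.
func systemSetup(removeMemlock func() error, warn func(msg string)) {
	if err := removeMemlock(); err != nil {
		warn(fmt.Sprintf("can't remove mem lock: %v", err))
	}
}

func main() {
	failing := func() error { return errDenied }
	systemSetup(failing, func(msg string) { fmt.Println("WARN:", msg) })
	fmt.Println("agent continues starting")
}
```

The trade-off is that a genuinely fatal condition is discovered later, when the eBPF programs are loaded, rather than at this step. That seems consistent with the wording of the warning message in the commit.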

pkg/ebpf/tracer.go

Lines changed: 0 additions & 6 deletions
@@ -11,7 +11,6 @@ import (
 	"time"

 	"github.com/cilium/ebpf/ringbuf"
-	"github.com/cilium/ebpf/rlimit"
 	"github.com/netobserv/netobserv-ebpf-agent/pkg/flow"
 	"github.com/sirupsen/logrus"
 	"github.com/vishvananda/netlink"
@@ -53,11 +52,6 @@ func NewFlowTracer(iface string, sampling uint32) *FlowTracer {
 // before exiting.
 func (m *FlowTracer) Register() error {
 	ilog := log.WithField("iface", m.interfaceName)
-	// Allow the current process to lock memory for eBPF resources.
-	// TODO: manually invoke unix.Prlimit with lower/reasonable rlimit
-	if err := rlimit.RemoveMemlock(); err != nil {
-		return fmt.Errorf("removing mem lock: %w", err)
-	}
 	// Load pre-compiled programs and maps into the kernel, and rewrites the configuration
 	spec, err := loadBpf()
 	if err != nil {
