Commit e585349

olegbet and obetsun authored
feat(KONFLUX-8225): add log forwarder for KubeArchive (#6439)
* feat(KONFLUX-8225): add log forwarder for KubeArchive

Signed-off-by: obetsun <[email protected]>

* feat(KONFLUX-8225): multiple fixes of synchronization issues

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

* fix kube-linter errors

* fix: correct PodDisruptionBudget configuration for Loki development

- Move podDisruptionBudget from global to singleBinary component
- Fixes ArgoCD sync issue with vector-kubearchive-log-collector-loki PDB

* fix: add missing inject-infra-deployments-repo-details to kubearchive

- Adds inject-infra-deployments-repo-details k-component to kubearchive ApplicationSet
- Makes kubearchive point to user's fork and branch consistently
- Fixes ArgoCD sync issue with kubearchive applications

* fix: comprehensive kube-linter fixes for vector-kubearchive-log-collector

- Fix loki-sc-rules sidecar container resource requirements
- Fix exporter container resources for chunks/results cache StatefulSets
- Add unhealthyPodEvictionPolicy to all PodDisruptionBudgets
- Properly disable chunksCache and configure resultsCache
- Add resource requirements for all memcached exporter containers
- Apply fixes to both development and staging environments

Resolves all 16 remaining kube-linter errors:
- unset-cpu-requirements for loki-sc-rules and exporter containers
- unset-memory-requirements for loki-sc-rules and exporter containers
- pdb-unhealthy-pod-eviction-policy for memcached PDBs

* fix: resolve kustomize deprecation warnings and helm chart conflicts for vector-kubearchive-log-collector

- Add .gitignore files to exclude charts/ directories that cause build conflicts
- Fix deprecated kustomize syntax:
  - patchesStrategicMerge → patches
  - patchesJson6902 → patches
  - commonLabels → labels
- Resolve helm chart extraction conflicts that prevented policy checks
- Update all kustomization.yaml files in vector-kubearchive-log-collector directory

This fixes the 'forbid cluster policies' errors caused by:
1. Conflicting chart files during helm pull operations
2. Deprecated kustomize field usage triggering warnings

* fix: simplify Loki configuration to resolve remaining kube-linter errors

- Disable sidecar rules container completely to fix resource requirement errors
- Disable all memcached components (chunks, results, frontend, etc.)
- Remove problematic exporter containers and PDB configurations
- Simplify to essential singleBinary configuration with proper resources
- Keep only gateway and core Loki components with required resource limits

This eliminates the final 10 kube-linter errors:
- unset-cpu-requirements for loki-sc-rules and exporter containers
- unset-memory-requirements for loki-sc-rules and exporter containers
- pdb-unhealthy-pod-eviction-policy for memcached PodDisruptionBudgets

Loki will still function for log collection and storage, just without advanced caching and rule management features.

* fix: add required storage configuration to resolve Loki Helm template errors

- Add loki.storage.bucketNames configuration for development (filesystem) and staging (s3)
- Add loki.storage_config to properly configure storage backends
- Add path_prefix and working_directory configurations
- Fix template error: 'nil pointer evaluating interface {}.chunks'

This resolves the kubectl kustomize --enable-helm failure:
Error: template: loki/templates/_helpers.tpl:231:19: executing "loki.commonStorageConfig" at <$.Values.loki.storage.bucketNames.chunks>: nil pointer evaluating interface {}.chunks

The Loki Helm chart requires proper storage configuration even in simplified mode.

* fix: remove init container causing busybox image pull failures

- Remove init-vector-data-dir init container that was using busybox:1.35
- The init container was causing 'Back-off pulling image busybox:1.35' errors
- EmptyDir volumes are created automatically, no init container needed
- Fixes ArgoCD out-of-sync issues caused by failed image pulls

The directory /vector-data-dir will be created automatically when the emptyDir volume is mounted, eliminating the need for an init container.

* fix: resolve Loki gateway nginx container failures in development environment

- Add explicit nginx image specification (docker.io/nginx:1.25-alpine)
- Configure DNS resolver for OpenShift environment (dns-default.openshift-dns.svc.cluster.local)
- Add nginx timeouts and proxy configuration for stability
- Set proper security context for OpenShift compatibility

This resolves the 'Back-off restarting failed container nginx' error in the vector-kubearchive-log-collector-loki-gateway pod by ensuring nginx has:
1. Proper DNS resolution for upstream services
2. Appropriate timeouts for proxy connections
3. Security context that works with OpenShift constraints

The gateway will now successfully proxy requests to Loki components.

* fix: correct DNS configuration for OpenShift environment and improve SCC

- Replace RKE2-specific DNS with proper OpenShift DNS services:
  - dnsNamespace: openshift-dns (was kube-system)
  - dnsService: dns-default (was rke2-coredns-rke2-coredns)
  - resolver: dns-default.openshift-dns.svc.cluster.local
- Fix ClusterRoleBinding reference from 'vector-scc-user' to 'kubearchive-vector-scc-user'
- Improve SCC security by dropping ALL capabilities instead of individual ones
- Add explicit Loki service account configuration to match SCC setup
- Apply fixes to both development and staging environments

This ensures proper DNS resolution for nginx gateway in OpenShift and resolves SCC permission issues for all vector-kubearchive-log-collector components.

* fix the external secret path

* fix: add missing service accounts for Loki gateway and canary

- Add vector-kubearchive-log-collector-loki ServiceAccount
- Add vector-kubearchive-log-collector-loki-canary ServiceAccount
- These service accounts were referenced in SCC and ClusterRoleBinding but not actually created
- Fixes 'serviceaccount not found' error when creating Loki gateway pods

Error was: pods 'vector-kubearchive-log-collector-loki-gateway-*' is forbidden: error looking up service account product-kubearchive-logging/vector-kubearchive-log-collector-loki: serviceaccount 'vector-kubearchive-log-collector-loki' not found

* fix: remove duplicate docker.io prefix from nginx image repository in development

- Change nginx image repository from 'docker.io/nginx' to 'nginx'
- Fixes image pull error: 'docker.io/docker.io/nginx:1.25-alpine'
- The Loki Helm chart was adding docker.io/ prefix automatically, causing duplication
- Staging configuration was already correct

Error was: Failed to pull image 'docker.io/docker.io/nginx:1.25-alpine': reading manifest 1.25-alpine in docker.io/docker.io/nginx: unauthorized

* fix nginx image repo for loki gateway

* fix external secret for staging configuration

* loki configuration simplified

* fix for loki disabled components linter errors

* fix readonly loki storage filesystem and scc naming

* fix tmp storage

---------

Signed-off-by: obetsun <[email protected]>
Co-authored-by: obetsun <[email protected]>
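
The doubled `docker.io/docker.io/nginx` reference described in the message is what results when a registry value is joined to a repository value that already carries the registry. A minimal Python sketch of that composition, assuming a plain `registry/repository:tag` join; the function `image_ref` and its parameters are hypothetical names used for illustration, not the Loki chart's actual helper:

```python
def image_ref(registry: str, repository: str, tag: str) -> str:
    """Join registry, repository, and tag the way a simple chart helper might."""
    return f"{registry}/{repository}:{tag}"


# Repository already includes the registry, so the prefix is duplicated.
broken = image_ref("docker.io", "docker.io/nginx", "1.25-alpine")

# Repository without the registry yields a well-formed reference.
fixed = image_ref("docker.io", "nginx", "1.25-alpine")

print(broken)
print(fixed)
```

Under that assumption, setting the repository to `nginx` is enough to produce a pullable reference, which matches the fix the message describes.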
1 parent 1ebbb27 commit e585349

28 files changed, +1075 -5 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -5,3 +5,6 @@ cosign.pub
 tmp
 .idea/*
 components/pipeline-service/base/log-collector/charts/*
+
+# Ignore cached Helm charts
+components/**/charts/

argo-cd-apps/base/member/infra-deployments/kubearchive/kustomization.yaml

Lines changed: 1 addition & 0 deletions
@@ -4,4 +4,5 @@ kind: Kustomization
 resources:
   - kubearchive.yaml
 components:
+  - ../../../../k-components/inject-infra-deployments-repo-details
   - ../../../../k-components/deploy-to-member-cluster-merge-generator

argo-cd-apps/base/member/infra-deployments/kustomization.yaml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ resources:
   - konflux-ui
   - konflux-rbac
   - konflux-info
+  - vector-kubearchive-log-collector
   - vector-tekton-logs-collector
   - kyverno
   - namespace-lister
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - vector-kubearchive-log-collector.yaml
+components:
+  - ../../../../k-components/inject-infra-deployments-repo-details
+  - ../../../../k-components/deploy-to-member-cluster-merge-generator
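
The ApplicationSet listed under `resources` here, `vector-kubearchive-log-collector.yaml`, builds its parameters with a merge generator: a `clusters` generator supplies defaults (`clusterDir: base`) and a `list` generator overrides them for entries whose `nameNormalized` matches. The Python sketch below illustrates that overlay under simplifying assumptions; `merge_generators` is a hypothetical helper, `example-cluster` is an invented cluster name, and the real Argo CD implementation handles more cases than this.

```python
def merge_generators(base_params, overrides, merge_key):
    """Overlay override entries onto base entries that share the merge key.

    Override entries with no matching base entry are dropped, since only
    base entries are iterated.
    """
    by_key = {entry[merge_key]: entry for entry in overrides}
    merged = []
    for params in base_params:
        combined = dict(params)
        combined.update(by_key.get(params[merge_key], {}))
        merged.append(combined)
    return merged


# Hypothetical output of the clusters generator: one entry per cluster,
# each carrying the default clusterDir from the ApplicationSet.
clusters = [
    {"nameNormalized": "stone-stg-rh01", "values.clusterDir": "base"},
    {"nameNormalized": "example-cluster", "values.clusterDir": "base"},
]
# The single element of the list generator in the ApplicationSet.
overrides = [
    {"nameNormalized": "stone-stg-rh01", "values.clusterDir": "stone-stg-rh01"},
]

merged = merge_generators(clusters, overrides, "nameNormalized")
for params in merged:
    root = "components/vector-kubearchive-log-collector/staging/"
    print(params["nameNormalized"], "->", root + params["values.clusterDir"])
```

In this sketch only `stone-stg-rh01` resolves to its own directory; every other cluster falls back to `staging/base`.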
@@ -0,0 +1,63 @@
+apiVersion: argoproj.io/v1alpha1
+kind: ApplicationSet
+metadata:
+  name: vector-kubearchive-log-collector
+spec:
+  generators:
+    - merge:
+        mergeKeys:
+          - nameNormalized
+        generators:
+          - clusters:
+              values:
+                sourceRoot: components/vector-kubearchive-log-collector
+                environment: staging
+                clusterDir: base
+          - list:
+              elements:
+                - nameNormalized: stone-stg-rh01
+                  values.clusterDir: stone-stg-rh01
+  template:
+    metadata:
+      name: vector-kubearchive-log-collector-{{nameNormalized}}
+      annotations:
+        argocd.argoproj.io/sync-wave: "100"
+        argocd.argoproj.io/refresh: "hard"
+    spec:
+      ignoreDifferences:
+        # Ignore Helm-generated dynamic content that causes drift
+        - group: apps
+          kind: Deployment
+          name: vector-kubearchive-log-collector-grafana
+          jsonPointers:
+            - /metadata/annotations/deployment.kubernetes.io~1revision
+            - /metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration
+            - /metadata/generation
+            - /spec/template/metadata/annotations/checksum~1config
+            - /spec/template/metadata/annotations/checksum~1secret
+            - /spec/template/metadata/annotations/checksum~1sc-dashboard-provider-config
+        - group: ""
+          kind: Secret
+          name: vector-kubearchive-log-collector-grafana
+          jsonPointers:
+            - /metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration
+      project: default
+      source:
+        path: '{{values.sourceRoot}}/{{values.environment}}/{{values.clusterDir}}'
+        repoURL: https://github.com/olegbet/infra-deployments.git
+        targetRevision: KONFLUX-8225_add_log_forwarder_for_loki
+      destination:
+        namespace: product-kubearchive-logging
+        server: '{{server}}'
+      syncPolicy:
+        automated:
+          prune: true
+          selfHeal: true
+        syncOptions:
+          - CreateNamespace=true
+        retry:
+          limit: -1
+          backoff:
+            duration: 10s
+            factor: 2
+            maxDuration: 3m
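
The `retry.backoff` block above sets a 10s base delay, a factor of 2, and a 3m cap. A minimal Python sketch of the resulting wait times, assuming the usual exponential form (`duration * factor**n`, capped at `maxDuration`); `backoff_schedule` is a hypothetical helper for illustration and is not taken from Argo CD's code:

```python
def backoff_schedule(duration_s, factor, max_duration_s, attempts):
    """Return the wait before each retry: duration * factor**n, capped at the maximum."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(attempts)]


# Values from the ApplicationSet: duration 10s, factor 2, maxDuration 3m (180s).
waits = backoff_schedule(10, 2, 180, attempts=7)
print(waits)  # [10, 20, 40, 80, 160, 180, 180]
```

With `limit: -1` the retries are not bounded by a count, so under this model the delay settles at the 3m cap from the sixth retry onward.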

argo-cd-apps/overlays/development/kustomization.yaml

Lines changed: 6 additions & 0 deletions
@@ -179,6 +179,11 @@ patches:
       kind: ApplicationSet
       version: v1alpha1
       name: crossplane-control-plane
+  - path: development-overlay-patch.yaml
+    target:
+      kind: ApplicationSet
+      version: v1alpha1
+      name: vector-kubearchive-log-collector
   - path: development-overlay-patch.yaml
     target:
       kind: ApplicationSet
@@ -219,3 +224,4 @@ patches:
       kind: ApplicationSet
       version: v1alpha1
       name: konflux-kite
+
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Ignore Helm charts directory created during kustomize build
+charts/
+*.tgz
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: product-kubearchive-logging
+commonAnnotations:
+  argocd.argoproj.io/sync-wave: "-1"
+  ignore-check.kube-linter.io/drop-net-raw-capability: |
+    "Vector Runs requires access to socket."
+  ignore-check.kube-linter.io/run-as-non-root: |
+    "Vector Runs as Root and attach host Path."
+  ignore-check.kube-linter.io/sensitive-host-mounts: |
+    "Vector Runs requires certain host mounts to watch files being created by pods."
+
+generators:
+  - vector-helm-generator.yaml
+
+resources:
+  - vector-pre.yaml
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: builtin
+kind: HelmChartInflationGenerator
+metadata:
+  name: vector
+name: vector
+repo: https://helm.vector.dev
+version: 0.43.0
+releaseName: vector-kubearchive-log-collector
+namespace: product-kubearchive-logging
+valuesFile: vector-helm-values.yaml
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
+---
+role: Agent
+resources:
+  requests:
+    cpu: 200m
+    memory: 1024Mi
+  limits:
+    cpu: 1000m
+    memory: 2048Mi
+customConfig:
+  data_dir: /vector-data-dir
+  api:
+    enabled: true
+    address: 127.0.0.1:8686
+    playground: false
+  sources:
+    k8s_logs:
+      type: kubernetes_logs
+      rotate_wait_secs: 5
+      glob_minimum_cooldown_ms: 500
+      max_line_bytes: 3145728
+      auto_partial_merge: true
+  transforms:
+    reduce_events:
+      type: reduce
+      inputs:
+        - k8s_logs
+      group_by:
+        - file
+      flush_period_ms: 2000
+      end_every_period_ms: 2000
+      merge_strategies:
+        message: concat_newline
+    remap_app_logs:
+      type: remap
+      inputs:
+        - reduce_events
+      source: |-
+        .tmp = del(.)
+        # Preserve original kubernetes fields for Loki labels
+        if exists(.tmp.kubernetes.pod_uid) {
+          .pod_id = del(.tmp.kubernetes.pod_uid)
+        } else {
+          .pod_id = "unknown_pod_id"
+        }
+        if exists(.tmp.kubernetes.pod_name) {
+          .pod_name = del(.tmp.kubernetes.pod_name)
+        } else {
+          .pod_name = "unknown_pod"
+        }
+        if exists(.tmp.kubernetes.container_name) {
+          .container = del(.tmp.kubernetes.container_name)
+        } else {
+          .container = "unknown_container"
+        }
+        if exists(.tmp.kubernetes.pod_namespace) {
+          .namespace = del(.tmp.kubernetes.pod_namespace)
+        } else {
+          .namespace = "unlabeled"
+        }
+        # Handling Tekton-specific labels
+        if exists(.tmp.kubernetes.pod_labels."tekton.dev/taskRunUID") {
+          .taskRunUID = del(.tmp.kubernetes.pod_labels."tekton.dev/taskRunUID")
+        } else {
+          .taskRunUID = "none"
+        }
+        if exists(.tmp.kubernetes.pod_labels."tekton.dev/pipelineRunUID") {
+          .pipelineRunUID = del(.tmp.kubernetes.pod_labels."tekton.dev/pipelineRunUID")
+          .result = .pipelineRunUID
+        } else {
+          .result = .taskRunUID
+        }
+        # --- Start: Cronjob Specific Handling ---
+        if exists(.tmp.kubernetes.pod_labels."job-name") {
+          .job_name = del(.tmp.kubernetes.pod_labels."job-name")
+          .log_type = "cronjob"
+          if exists(.tmp.kubernetes.pod_labels."cronjob-name") {
+            .cronjob_name = del(.tmp.kubernetes.pod_labels."cronjob-name")
+          } else {
+            # Using corrected regex pattern without \d
+            .job_name = to_string(.job_name) ?? "default"
+            if match(.job_name, r'^(.*)-[0-9]{8,10}$') {
+              .cronjob_name = replace(.job_name, r'-[0-9]{8,10}$', "")
+            } else {
+              .cronjob_name = "unknown_cronjob"
+            }
+          }
+          if exists(.tmp.kubernetes.pod_labels."controller-uid") {
+            .job_uid = del(.tmp.kubernetes.pod_labels."controller-uid")
+          }
+        } else {
+          .log_type = "application"
+        }
+        # --- End: Cronjob Specific Handling ---
+        # Handling general Kubernetes labels
+        if exists(.tmp.kubernetes.pod_labels) {
+          .pod_labels = .tmp.kubernetes.pod_labels
+        } else {
+          .pod_labels = "no_labels"
+        }
+        # General message field handling
+        if exists(.tmp.message) {
+          .message = to_string(del(.tmp.message)) ?? "no_message"
+        } else {
+          .message = "no_message"
+        }
+        # Basic data sanitization to prevent 400 errors
+        # Truncate very long messages
+        if length(.message) > 32768 {
+          .message = slice!(.message, 0, 32768) + "...[TRUNCATED]"
+        }
+        # Clean up temporary fields
+        del(.tmp)
+  sinks:
+    loki:
+      type: loki
+      inputs: ["remap_app_logs"]
+      # Direct connection to Loki service (no gateway)
+      endpoint: "http://vector-kubearchive-log-collector-loki.product-kubearchive-logging.svc.cluster.local:3100"
+      encoding:
+        codec: "json"
+      auth:
+        strategy: "basic"
+        user: "${LOKI_USERNAME}"
+        password: "${LOKI_PASSWORD}"
+      tenant_id: "kubearchive"
+      request:
+        headers:
+          X-Scope-OrgID: kubearchive
+      batch:
+        max_bytes: 10485760
+        timeout_secs: 300
+      compression: "none"
+      labels:
+        job: "vector"
+        pod_id: "{{`{{ pod_id }}`}}"
+        container: "{{`{{ container }}`}}"
+        namespace: "{{`{{ namespace }}`}}"
+        pod: "{{`{{ pod_name }}`}}"
+      buffer:
+        type: "memory"
+        max_events: 10000
+        when_full: "block"
+env:
+  - name: LOKI_USERNAME
+    valueFrom:
+      secretKeyRef:
+        name: kubearchive-loki
+        key: USERNAME
+  - name: LOKI_PASSWORD
+    valueFrom:
+      secretKeyRef:
+        name: kubearchive-loki
+        key: PASSWORD
+nodeSelector:
+  konflux-ci.dev/workload: konflux-tenants
+tolerations:
+  - effect: NoSchedule
+    key: konflux-ci.dev/workload
+    operator: Equal
+    value: konflux-tenants
+image:
+  repository: quay.io/kubearchive/vector
+  tag: 0.46.1-distroless-libc
+serviceAccount:
+  create: true
+  name: vector-kubearchive-log-collector
+securityContext:
+  allowPrivilegeEscalation: false
+  runAsUser: 0
+  capabilities:
+    drop:
+      - CHOWN
+      - DAC_OVERRIDE
+      - FOWNER
+      - FSETID
+      - KILL
+      - NET_BIND_SERVICE
+      - SETGID
+      - SETPCAP
+      - SETUID
+  readOnlyRootFilesystem: true
+  seLinuxOptions:
+    type: spc_t
+  seccompProfile:
+    type: RuntimeDefault
+
+# Override default volumes to be more specific and secure
+extraVolumes:
+  - name: varlog
+    hostPath:
+      path: /var/log/pods
+      type: Directory
+  - name: varlibdockercontainers
+    hostPath:
+      path: /var/lib/containers
+      type: DirectoryOrCreate
+
+extraVolumeMounts:
+  - name: varlog
+    mountPath: /var/log/pods
+    readOnly: true
+  - name: varlibdockercontainers
+    mountPath: /var/lib/containers
+    readOnly: true
+
+# Configure Vector to use emptyDir for its default data volume instead of hostPath
+persistence:
+  enabled: false
+
+
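
The `remap_app_logs` transform in the values file above derives a CronJob name from the Job name when the `cronjob-name` label is absent, and truncates oversized messages. The Python sketch below mirrors those two steps so the regex and the cut-off are easy to try out. It is an approximation, not the VRL program: the function names and the sample job names are hypothetical, and Python's `len` counts characters, which may not match how VRL's `length` and `slice` measure strings.

```python
import re

# Same patterns as the VRL source: a trailing "-" plus 8 to 10 digits.
JOB_NAME_PATTERN = re.compile(r"^(.*)-[0-9]{8,10}$")
JOB_SUFFIX_PATTERN = re.compile(r"-[0-9]{8,10}$")
MAX_MESSAGE_LEN = 32768
TRUNCATION_MARKER = "...[TRUNCATED]"


def derive_cronjob_name(job_name: str) -> str:
    """Strip the numeric suffix from a Job name, or report an unknown CronJob."""
    if JOB_NAME_PATTERN.search(job_name):
        return JOB_SUFFIX_PATTERN.sub("", job_name)
    return "unknown_cronjob"


def truncate_message(message: str) -> str:
    """Cut messages longer than the limit and append the truncation marker."""
    if len(message) > MAX_MESSAGE_LEN:
        return message[:MAX_MESSAGE_LEN] + TRUNCATION_MARKER
    return message


print(derive_cronjob_name("nightly-backup-28391040"))
print(derive_cronjob_name("one-off-job"))
print(len(truncate_message("a" * 40000)))
```

A Job name without the numeric suffix falls through to `unknown_cronjob`, the same default the transform assigns.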
