Commit 492a8d6
[CONTP-730] fix(kubelet_listener): Retrieve updated pod entity before updating services. (#45118)
### What does this PR do?
In a small subset of cases (~3% of the time), the Agent would log a `WARN` saying it could not create a file tailer for a container because its parent pod is missing.
```
2024-10-25 10:25:35 UTC | CORE | WARN | (pkg/logs/launchers/container/tailerfactory/factory.go:95 in makeTailer) | Could not make file tailer for source container_collect_all (falling back to socket): cannot find pod for container "6f585560fac9d45127f20509c7c84d017126776573772f42e1bd45af59090e54": "6f585560fac9d45127f20509c7c84d017126776573772f42e1bd45af59090e54" not found
```
The warning stems from the Kubelet AD listener subscribing to `SourceAll` entity events in WLM. Short-lived pods that have already been terminated are still sent as 'Set' events, so the Kubelet AD listener adds the pod and its since-deleted containers as services, which triggers a file tailer attempt.
The Kubelet listener now fetches the latest `workloadmeta.KubernetesPod` entity instead of using the provided entity to avoid adding container services for pod containers that have been deleted.
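As a rough illustration of the idea (not the actual change), the sketch below re-reads the pod from the store when a 'Set' event arrives and skips the event if the pod no longer exists. The `Pod`, `Store`, `Listener`, and `HandleSet` names are hypothetical stand-ins, not the Agent's real `workloadmeta` or autodiscovery APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, simplified stand-in for a pod entity.
type Pod struct {
	ID         string
	Containers []string
}

// Store is a hypothetical stand-in for the workload metadata store.
type Store struct {
	pods map[string]Pod
}

var errNotFound = errors.New("pod not found")

// GetPod returns the current state of a pod, or errNotFound if it was deleted.
func (s *Store) GetPod(id string) (Pod, error) {
	p, ok := s.pods[id]
	if !ok {
		return Pod{}, errNotFound
	}
	return p, nil
}

// Listener records which container services have been created.
type Listener struct {
	store    *Store
	services map[string]struct{}
}

// HandleSet processes a 'Set' event. Instead of trusting the pod snapshot
// carried by the event, it looks up the latest pod in the store and only
// creates services for containers that still exist there.
func (l *Listener) HandleSet(eventPod Pod) {
	latest, err := l.store.GetPod(eventPod.ID)
	if err != nil {
		// The pod was deleted after the event was emitted: nothing to add.
		return
	}
	for _, c := range latest.Containers {
		l.services[c] = struct{}{}
	}
}

func main() {
	store := &Store{pods: map[string]Pod{
		"pod-live": {ID: "pod-live", Containers: []string{"app"}},
	}}
	l := &Listener{store: store, services: map[string]struct{}{}}

	// Stale event for a pod that has already been terminated: ignored.
	l.HandleSet(Pod{ID: "pod-gone", Containers: []string{"my-container", "istio-proxy"}})
	// Event whose snapshot is out of date: only the store's containers are used.
	l.HandleSet(Pod{ID: "pod-live", Containers: []string{"app", "old-sidecar"}})

	fmt.Println("services created:", len(l.services))
}
```

In this sketch the stale `pod-gone` event creates no services, so nothing downstream would try to tail logs for containers whose pod can no longer be found.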
### Describe how you validated your changes
#### Reproduce the warning logs
1. Deploy Agent version <= 7.75 with `containerCollectAll` enabled:
```
datadog:
  logLevel: INFO
  autoscaling:
    workload:
      enabled: false
  operator:
    enabled: false
  clusterName: mathewe-log-tail
  secretBackend:
    command: "/readsecret_multiple_providers.sh"
  kubelet:
    tlsVerify: false
  logs:
    enabled: true
    containerCollectAll: true
  dogstatsd:
    nonLocalTraffic: true
    originDetection: true
    useSocketVolume: true
    tagCardinality: "high"
  envDict:
    DD_CHECKS_TAG_CARDINALITY: "high"
```
3. Install Istio using https://github.com/DataDog/sandbox/blob/c21fca7035e60951372fbd10cef921af810509a7/apm/kubernetes/Istio/python-flask/install.sh
4. Deploy the test CronJob workload:
<details><summary>istio-sidecar-cronjob-test-repro.yaml</summary>
<p>
```
# Scheduled once per minute
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
  namespace: test-istio
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"
        spec:
          containers:
            - name: my-container
              image: curlimages/curl
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "--" ]
              args: [ "for i in `seq 1 10` ; do sleep 1.0; echo `date` example stdout log $i; done; curl http://localhost:15000/quitquitquit -X POST" ]
          restartPolicy: OnFailure
---
# Scheduled once per minute (offset 15s)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob-2
  namespace: test-istio
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"
        spec:
          containers:
            - name: my-container
              image: curlimages/curl
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "--" ]
              args: [ "sleep 15; for i in `seq 1 10` ; do sleep 1.0; echo `date` example stdout log $i; done; curl http://localhost:15000/quitquitquit -X POST" ]
          restartPolicy: OnFailure
---
# Scheduled once per minute (offset 30s)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob-3
  namespace: test-istio
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"
        spec:
          containers:
            - name: my-container
              image: curlimages/curl
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "--" ]
              args: [ "sleep 30; for i in `seq 1 10` ; do sleep 1.0; echo `date` example stdout log $i; done; curl http://localhost:15000/quitquitquit -X POST" ]
          restartPolicy: OnFailure
```
</p>
</details>
5. Watch for the warning logs. They can appear after several minutes, but it may take an hour or so.
<img width="1856" height="870" alt="image" src="https://github.com/user-attachments/assets/269148eb-e9e2-4e3a-bfe8-7e8f40e1dd6f" />
#### Deploy the fixed Agent
1. Build and deploy the fixed Agent:
```
agents:
  image:
    repository: "agent"
    tag: "fix-3"
    doNotCheckTag: true
```
2. Confirm that the warning logs stop.
<img width="1685" height="397" alt="image" src="https://github.com/user-attachments/assets/05756d29-f46a-4625-86c7-f5c1d09b6988" />
### Additional Notes
Co-authored-by: mathew.estafanous <mathew.estafanous@datadoghq.com>
Parent commit: a436c68
2 files changed, 16 additions and 1 deletion (diff contents not preserved):
- comp/core/autodiscovery/listeners: 10 additions, 1 deletion
- releasenotes/notes: 6 additions, 0 deletions