Collection rules "Unexpected timeout from process 1. Process will no longer be monitored" using 8.1.1 egress to Kubernetes #8404

@Atif-Syed-1

Description

  • We are using dotnet monitor to collect dumps from our .NET application (version 8.0.403) hosted on Azure Kubernetes Service. Manual dump collection using scripts works. Collection rules also collect dumps, but a recurring process timeout resets their limits. When we run a collection rule with limits defined for each collection, the timeout error appears and resets the counter for the collection rules, which creates too many dumps.
  • To reproduce: use the collection rules shown in the Configuration section; the timeout occurs regularly. Each timeout resets the limits counter for the collection, so too many dumps are collected.
  • Expected: the timeout should not occur, and it should not reset the counter.
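
To make the reported effect concrete, here is a minimal Python model. It is not dotnet monitor's implementation. It assumes that a rule's execution history is held only in memory and is discarded whenever the collection rules are restarted. The names RuleLimits and count_dumps, and the trigger and restart times, are hypothetical.

```python
from datetime import datetime, timedelta


class RuleLimits:
    """Toy model of ActionCount / ActionCountSlidingWindowDuration.

    Assumption: the execution history lives only in this object, so
    creating a new object forgets every earlier execution.
    """

    def __init__(self, action_count, window):
        self.action_count = action_count
        self.window = window
        self.executions = []  # timestamps of actions that were allowed to run

    def try_execute(self, now):
        # Drop executions that have aged out of the sliding window.
        self.executions = [t for t in self.executions if now - t < self.window]
        if len(self.executions) >= self.action_count:
            return False  # throttled by the limit
        self.executions.append(now)
        return True


def count_dumps(trigger_times, restart_times, action_count=1,
                window=timedelta(hours=6)):
    """Count dumps when the rule state is recreated at each restart time."""
    limits = RuleLimits(action_count, window)
    restarts = sorted(restart_times)
    dumps = 0
    for now in sorted(trigger_times):
        # A restart at or before this trigger discards the in-memory history.
        while restarts and restarts[0] <= now:
            restarts.pop(0)
            limits = RuleLimits(action_count, window)
        if limits.try_execute(now):
            dumps += 1
    return dumps


start = datetime(2025, 7, 11, 0, 0, 0)
# Hypothetical: the trigger condition is satisfied every 30 minutes.
triggers = [start + timedelta(minutes=30 * i) for i in range(12)]
# Hypothetical: the rules are restarted twice during that period.
restarts = [start + timedelta(hours=2), start + timedelta(hours=4)]

print(count_dumps(triggers, []))        # 1 dump without restarts
print(count_dumps(triggers, restarts))  # 3 dumps with two restarts
```

Under these assumptions, a limit of one dump per six hours yields one extra dump per restart, which matches the behavior described above.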

Configuration

  • Is this related to a specific tool?
    Collection rules
  • What OS and version, and what distro if applicable?
    Kubernetes with Linux-based containers; we are using Microsoft images for the .NET application
  • What is the architecture (x64, x86, ARM, ARM64)?
    x86
  • Do you know whether it is specific to that configuration?
    The timeout only happens when collection rules are configured
  • Are you running in any particular type of environment? (e.g. Containers, a cloud scenario, app you are trying to target is a different user)
    Azure Kubernetes Service 1.31.9
    .NET application 8.0.403
    Sidecar using dotnet monitor image mcr.microsoft.com/dotnet/monitor:8.1.1
    Our application is using this base image : mcr.microsoft.com/dotnet/aspnet:8.0-alpine

kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "45"
    meta.helm.sh/release-name: slot-8-application-umbrella
    meta.helm.sh/release-namespace: slot-application
  creationTimestamp: "2025-01-16T04:44:21Z"
  generation: 58
  labels:
    admission.datadoghq.com/enabled: "true"
    app.kubernetes.io/managed-by: Helm
    name: slot8-application-web
    tags.datadoghq.com/env: slot-8
    tags.datadoghq.com/service: application-web
    tags.datadoghq.com/version: 2.81.0
  name: slot8-application-web
  namespace: slot-application
  resourceVersion: "591555309"
  uid: redacted
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: slot8-application-web
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2025-07-10T15:23:17-04:00"
      creationTimestamp: null
      labels:
        admission.datadoghq.com/enabled: "true"
        azure.workload.identity/use: "true"
        kubernetes.azure.com/cluster: redacted-cluster-name
        kubernetes.azure.com/podnetwork-name: redacted-network-name
        name: slot8-application-web
        tags.datadoghq.com/env: slot-8
        tags.datadoghq.com/service: application-web
        tags.datadoghq.com/version: 2.81.0
        topology.kubernetes.io/region: eastus
      name: slot8-application-web
    spec:
      containers:
      - env:
        - name: ASPNETCORE_ENVIRONMENT
          value: Slot8
        - name: APPLICATION_ENVIRONMENT
          value: SLOT
        - name: APPLICATION_INSTANCEID
          value: "8"
        - name: APPLICATION_APPNAME
          value: APPLICATION
        - name: APPLICATION_APPTYPE
          value: WEB
        - name: ASPNETCORE_URLS
        - name: DOTNET_DiagnosticPorts
          value: /diag/port.sock
        - name: DotnetMonitor_Storage__DefaultSharedPath
          value: /diag
        - name: DD_ENV
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['tags.datadoghq.com/env']
        - name: DD_SERVICE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['tags.datadoghq.com/service']
        - name: DD_VERSION
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['tags.datadoghq.com/version']
        - name: POD_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: POD_HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: POD_SERVICE_ACCOUNT_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.serviceAccountName
        - name: POD_UID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_CLUSTER
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['kubernetes.azure.com/cluster']
        - name: POD_NETWORK
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['kubernetes.azure.com/podnetwork-name']
        - name: POD_REGION
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['topology.kubernetes.io/region']
        - name: SIDECAR_MONITOR_TEMPLATE
          value: slot-8-application-true
        image: payquicker.azurecr.io/slot/applicationweb:custom-image
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: ""
            path: /healthz
            port: 443
            scheme: HTTPS
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: slot8-application-web
        ports:
        - containerPort: 443
          protocol: TCP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            httpHeaders:
            - name: Host
              value: ""
            path: /ready
            port: 443
            scheme: HTTPS
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 2560Mi
          requests:
            cpu: 250m
            memory: 2Gi
        startupProbe:
          failureThreshold: 40
          httpGet:
            httpHeaders:
            - name: Host
              value: ""
            path: /health/startup
            port: 443
            scheme: HTTPS
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /diag
          name: diagvol
      - args:
        - collect
        env:
        - name: DotnetMonitor_DiagnosticPort__ConnectionMode
          value: Listen
        - name: DotnetMonitor_DiagnosticPort__EndpointName
          value: /diag/port.sock
        - name: DotnetMonitor_Urls
          value: http://localhost:52323
        - name: DotnetMonitor_Storage__DumpTempFolder
          value: /diag/dumps
        - name: DotnetMonitor_Authentication__AzureAd__ClientId
          value: redacted-info
        - name: DotnetMonitor_Authentication__AzureAd__RequiredRole
          value: localapi
        - name: DotnetMonitor_Storage__DefaultSharedPath
          value: /diag
        - name: DotnetMonitor_Logging__Console__FormatterName
          value: simple
        - name: DotnetMonitor_Egress__AzureBlobStorage__monitorBlob__blobPrefix
          value: SLOT-8/APPLICATION-WEB/2.81.0
        image: mcr.microsoft.com/dotnet/monitor:8.1.1
        imagePullPolicy: Always
        name: slot8-application-web-monitor
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 50m
            memory: 32Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /diag
          name: diagvol
        - mountPath: /etc/dotnet-monitor
          name: dotnet-monitor-config
      dnsPolicy: ClusterFirst
      nodeSelector:
        velocity.target: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
      serviceAccount: application-svcacct
      serviceAccountName: application-svcacct
      terminationGracePeriodSeconds: 60
      volumes:
      - emptyDir: {}
        name: diagvol
      - name: dotnet-monitor-config
        projected:
          defaultMode: 400
          sources:
          - configMap:
              name: slot8-application-web-dotnet-monitor-egress
              optional: false
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2025-01-16T14:01:38Z"
    lastUpdateTime: "2025-07-10T19:25:23Z"
    message: ReplicaSet "slot8-application-web-5f7c47458c" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2025-07-11T11:25:03Z"
    lastUpdateTime: "2025-07-11T11:25:03Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 58
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

Above is our deployment; below is the ConfigMap we are using.

apiVersion: v1
data:
  CollectionRules__UsingGcHeapSizeShortcutExample__Actions__0__Settings__Egress: monitorBlob
  CollectionRules__UsingGcHeapSizeShortcutExample__Actions__0__Settings__Type: Full
  CollectionRules__UsingGcHeapSizeShortcutExample__Actions__0__Type: CollectDump
  CollectionRules__UsingGcHeapSizeShortcutExample__Limits__ActionCount: "1"
  CollectionRules__UsingGcHeapSizeShortcutExample__Limits__ActionCountSlidingWindowDuration: "06:00:00"
  CollectionRules__UsingGcHeapSizeShortcutExample__Trigger__Settings__GreaterThan: "1024"
  CollectionRules__UsingGcHeapSizeShortcutExample__Trigger__Type: GCHeapSize
  CollectionRules__UsingManualEventCounter__Actions__0__Settings__Egress: monitorBlob
  CollectionRules__UsingManualEventCounter__Actions__0__Settings__Type: Full
  CollectionRules__UsingManualEventCounter__Actions__0__Type: CollectDump
  CollectionRules__UsingManualEventCounter__Limits__ActionCount: "1"
  CollectionRules__UsingManualEventCounter__Limits__ActionCountSlidingWindowDuration: "06:00:00"
  CollectionRules__UsingManualEventCounter__Trigger__Settings__CounterName: gc-heap-size
  CollectionRules__UsingManualEventCounter__Trigger__Settings__GreaterThan: "10"
  CollectionRules__UsingManualEventCounter__Trigger__Settings__ProviderName: System.Runtime
  CollectionRules__UsingManualEventCounter__Trigger__Type: EventCounter
  Egress__AzureBlobStorage__monitorBlob__UseWorkloadIdentityFromEnvironment: "true"
  Egress__AzureBlobStorage__monitorBlob__accountUri: https://REDACTED-SA_URL
  Egress__AzureBlobStorage__monitorBlob__containerName: dumps
  Egress__FileSystem__monitorFile__directoryPath: /diag
  Storage__DefaultSharedPath: /diag
  Storage__DumpTempFolder: /diag/dumps
kind: ConfigMap
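
The ConfigMap keys use a double underscore as the hierarchy separator, which is the usual .NET configuration convention for flat key names. As a reading aid, the small Python sketch below expands a few of the keys above into a nested structure so that the trigger, action and limits of one rule can be seen together. The helper nest_keys is hypothetical and is not part of dotnet monitor. It keeps array indices such as 0 as plain dictionary keys rather than building lists.

```python
def nest_keys(flat, separator="__"):
    """Expand 'A__B__C': value entries into nested dictionaries."""
    root = {}
    for key, value in flat.items():
        node = root
        *parents, leaf = key.split(separator)
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return root


# A subset of the keys from the ConfigMap above, copied verbatim.
flat_config = {
    "CollectionRules__UsingManualEventCounter__Trigger__Type": "EventCounter",
    "CollectionRules__UsingManualEventCounter__Trigger__Settings__ProviderName": "System.Runtime",
    "CollectionRules__UsingManualEventCounter__Trigger__Settings__CounterName": "gc-heap-size",
    "CollectionRules__UsingManualEventCounter__Trigger__Settings__GreaterThan": "10",
    "CollectionRules__UsingManualEventCounter__Actions__0__Type": "CollectDump",
    "CollectionRules__UsingManualEventCounter__Limits__ActionCount": "1",
    "CollectionRules__UsingManualEventCounter__Limits__ActionCountSlidingWindowDuration": "06:00:00",
}

rule = nest_keys(flat_config)["CollectionRules"]["UsingManualEventCounter"]
print(rule["Trigger"]["Settings"]["CounterName"])
print(rule["Limits"])
```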

Monitor sidecar logs showing the error

  => TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae CollectionRuleName:UsingManualEventCounter => CollectionRuleTriggerType:EventCounter
  Collection rule 'UsingManualEventCounter' trigger 'EventCounter' started.

2025-07-11T18:18:30.9370121Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.
2025-07-11T18:18:30.9373046Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[44]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
Stopping collection rules.
2025-07-11T18:18:30.9397543Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[45]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
All collection rules have stopped.
2025-07-11T18:18:30.9397777Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[40]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
Starting collection rules.
2025-07-11T18:18:30.9398987Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[29]
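
For anyone checking their own sidecar logs for the same pattern, the sketch below groups log lines into entries and reports each timeout that is followed by a restart of the collection rules. The header pattern is inferred only from the lines pasted in this issue (timestamp, level, category and event ID on one line, message on the following lines), so it may need adjusting for other formatter settings. The function names are hypothetical.

```python
import re

# Header line shape inferred from the excerpt above, for example:
# 2025-07-11T18:18:30.9370121Zwarn: Some.Category[52]
HEADER = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"
    r"(?P<level>[a-z]+): (?P<category>[\w.]+)\[(?P<event_id>\d+)\]$"
)


def parse_entries(text):
    """Group each header line with the message lines that follow it."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            entry = match.groupdict()
            entry["lines"] = []
            entries.append(entry)
        elif line and entries:
            entries[-1]["lines"].append(line)
    return entries


def rule_restarts(entries):
    """Return timeout timestamps that are followed by a rules restart."""
    restarts = []
    timed_out_at = None
    for entry in entries:
        message = " ".join(entry["lines"])
        if "Unexpected timeout from process" in message:
            timed_out_at = entry["time"]
        elif timed_out_at and "Starting collection rules." in message:
            restarts.append(timed_out_at)
            timed_out_at = None
    return restarts


SAMPLE = """
2025-07-11T18:18:30.9370121Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.
2025-07-11T18:18:30.9373046Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[44]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
Stopping collection rules.
2025-07-11T18:18:30.9397543Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[45]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
All collection rules have stopped.
2025-07-11T18:18:30.9397777Zinfo: Microsoft.Diagnostics.Tools.Monitor.CollectionRules.CollectionRuleService[40]
=> TargetProcessId:1 TargetRuntimeInstanceCookie:044e9c48142b46a7bd871ca8414d7bae
Starting collection rules.
"""

entries = parse_entries(SAMPLE)
print(len(entries), "entries")
print(rule_restarts(entries))
```

In the excerpt, the rules are stopped and started again within the same second as the timeout, with the same TargetRuntimeInstanceCookie, so the application process itself does not appear to have restarted.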

Timeout examples with timestamps showing frequency

2025-07-10T21:22:46.9386672Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T00:18:25.9374603Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T04:40:25.9375142Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T07:27:31.9370075Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T08:04:10.9375068Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T11:22:11.9362348Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T11:54:40.9370739Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T17:25:01.9366596Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.

2025-07-11T18:18:30.9370121Zwarn: Microsoft.Diagnostics.Tools.Monitor.ServerEndpointInfoSource[52]
Unexpected timeout from process 1. Process will no longer be monitored.
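
To put the frequency in numbers, the short Python sketch below computes the gaps between the nine timeouts listed above and compares them with the six-hour ActionCountSlidingWindowDuration from the ConfigMap. The timestamps are copied from the log lines; the sub-second fractions are dropped to keep the parsing simple.

```python
from datetime import datetime, timedelta

# Timestamps of the "Unexpected timeout from process 1" warnings listed above.
TIMEOUTS = [
    "2025-07-10T21:22:46.9386672Z",
    "2025-07-11T00:18:25.9374603Z",
    "2025-07-11T04:40:25.9375142Z",
    "2025-07-11T07:27:31.9370075Z",
    "2025-07-11T08:04:10.9375068Z",
    "2025-07-11T11:22:11.9362348Z",
    "2025-07-11T11:54:40.9370739Z",
    "2025-07-11T17:25:01.9366596Z",
    "2025-07-11T18:18:30.9370121Z",
]


def parse(ts):
    # Keep whole seconds only; the 7-digit fraction and trailing 'Z' are dropped.
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S")


times = [parse(ts) for ts in TIMEOUTS]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
window = timedelta(hours=6)  # ActionCountSlidingWindowDuration: "06:00:00"

for ts, gap in zip(TIMEOUTS[1:], gaps):
    print(ts[:19], "gap since previous timeout:", gap)
print("longest gap:", max(gaps))
print("gaps shorter than the window:", sum(g < window for g in gaps), "of", len(gaps))
```

For these nine timeouts the longest gap is 5:30:21, so every timeout arrived less than six hours after the previous one, and each one landed inside the sliding window it reset.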

  • Do you know of any workarounds?
    We can take dumps manually using scripts, but they do not contain the debug information we are after. If we could get the collection rules working without timeouts, that would be awesome, as we could use this solution to capture dumps with the required debug information.

Labels: bug (Something isn't working)
