Skip to content

Conversation

@zawadzkidiana
Copy link
Contributor

@zawadzkidiana zawadzkidiana commented Oct 23, 2025

Problem:
Customer reported cadence-activity-task-failed metric not emitted when activities fail due to heartbeat timeout. Root cause is context.Canceled errors convert to RespondActivityTaskCanceledRequest, which only emits ActivityTaskCanceledCounter.

Solution:
Modified reportActivityComplete() and reportActivityCompleteByID() to emit both ActivityTaskCanceledCounter and ActivityTaskFailedCounter when activities are canceled. Cancellations from timeouts are failures from a monitoring perspective.

Changes:

  • internal/internal_task_pollers.go: Added ActivityTaskFailedCounter.Inc(1) in canceled case for both functions

Testing:

  • go build ./internal/... - compiles successfully
  • go test ./internal - all existing tests pass
  • go test ./internal/common/metrics - metrics infrastructure verified
  • Verified ActivityTaskFailedCounter already defined and used for explicit failures (line 71 constants.go, line 1296 internal_task_pollers.go)

Backward compatible. No breaking changes to server communication.

What changed?

Why?

How did you test it?

Potential risks

Problem:
Customer reported cadence-activity-task-failed metric not emitted when
activities fail due to heartbeat timeout. Root cause is context.Canceled
errors convert to RespondActivityTaskCanceledRequest, which only emits
ActivityTaskCanceledCounter.

Solution:
Modified reportActivityComplete() and reportActivityCompleteByID() to emit
both ActivityTaskCanceledCounter and ActivityTaskFailedCounter when
activities are canceled. Cancellations from timeouts are failures from a
monitoring perspective.

Changes:
- internal/internal_task_pollers.go: Added ActivityTaskFailedCounter.Inc(1)
  in canceled case for both functions

Testing:
- go build ./internal/... - compiles successfully
- go test ./internal - all existing tests pass
- go test ./internal/common/metrics - metrics infrastructure verified
- Verified ActivityTaskFailedCounter already defined and used for explicit
  failures (line 71 constants.go, line 1296 internal_task_pollers.go)

Backward compatible. No breaking changes to server communication.

Signed-off-by: Diana Zawadzki <[email protected]>
Signed-off-by: Diana Zawadzki <[email protected]>
@zawadzkidiana zawadzkidiana force-pushed the draft/dzawadzki/investigate-activity-metric-emission branch from 764033d to 3e501d8 Compare October 24, 2025 20:14
@codecov
Copy link

codecov bot commented Oct 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.76%. Comparing base (f596630) to head (9ec0bc3).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Coverage Δ
internal/internal_task_pollers.go 87.16% <100.00%> (-0.47%) ⬇️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f596630...9ec0bc3. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zawadzkidiana zawadzkidiana changed the title Emit ActivityTaskFailedCounter for canceled activities fix: emit ActivityTaskFailedCounter for canceled activities Oct 30, 2025
@zawadzkidiana
Copy link
Contributor Author

Added "fix:" to title to fix Semantic Pull Request Error

@davidporter-id-au davidporter-id-au merged commit 70ea3b3 into cadence-workflow:master Oct 31, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants