Commit bf53543

Merge branch 'master' into data-converter
2 parents 2172024 + fc64180 commit bf53543

14 files changed: +974 -0 lines changed

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
---
title: "Workflow Diagnostics"

date: 2025-08-06
authors: sankari165
tags:
  - announcement
---

Cadence users, especially new users, often struggle with failed or stuck workflows and cannot tell what is wrong with them. A new on-demand tool now addresses this: it checks a workflow and provides diagnostics with actionable information, linked to clear runbooks that users can follow. The overarching goal is to help Cadence users understand what is wrong with their workflow.

<!-- truncate -->

## Introducing Workflow Diagnostics

Cadence workflow diagnostics fetches the workflow execution history and identifies issues in the workflow, that is, it points out the items that did not work as expected, such as workflow timeouts. Next, for each issue identified, it provides potential root causes by listing the reasons that may have caused it, for example, a tasklist that has no pollers. Lastly, it suggests ways to resolve the issue, since we want Cadence users to have actionable diagnostics. For example, timeouts can occur when the workflow runs on a tasklist without enough workers to start the activities.

## How does it work?

Cadence Workflow Diagnostics is initiated on demand by a user for a given workflow execution in a Cadence domain. The call goes to the cadence-frontend service, which in turn triggers a diagnostics workflow that runs in the cadence-worker service and performs the diagnostics based on the workflow execution history.

Code references:

1. The [invariant interface](https://github.com/cadence-workflow/cadence/tree/master/service/worker/diagnostics/invariant), where each invariant implementation checks and root-causes one specific issue, such as timeouts or failures.

2. The [diagnostics workflow](https://github.com/cadence-workflow/cadence/blob/master/service/worker/diagnostics/workflow.go), which runs as a Cadence workflow with two activities: one identifies issues using the invariant checks, and the other root-causes them. Some invariants may not have a root-cause implementation.

3. The [parent workflow](https://github.com/cadence-workflow/cadence/blob/master/service/worker/diagnostics/parent_workflow.go), which triggers diagnostics as a child workflow and then emits usage logs for observability.
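
The two-step flow described above (identify issues, then root-cause them) can be sketched roughly as follows. This is a minimal illustration, not the code in the Cadence repository: the `Invariant` interface, the `Issue` and `RootCause` types, `timeoutInvariant`, and `diagnose` are all hypothetical names, and the history is reduced to a list of event-type strings.

```go
package main

import "fmt"

// Issue and RootCause are hypothetical result types for this sketch.
type Issue struct {
	Invariant string // which check raised the issue
	Reason    string // what did not work as expected
}

type RootCause struct {
	Issue  Issue
	Reason string // why it probably happened
}

// Invariant is an illustrative stand-in; the real interface in the Cadence
// repository may use different method names and parameters.
type Invariant interface {
	Check(history []string) []Issue
	RootCause(issues []Issue) []RootCause
}

// timeoutInvariant flags workflow timeouts and blames a tasklist without pollers.
type timeoutInvariant struct {
	pollerCount int // assumed to be looked up elsewhere
}

func (t timeoutInvariant) Check(history []string) []Issue {
	var issues []Issue
	for _, event := range history {
		if event == "WorkflowExecutionTimedOut" {
			issues = append(issues, Issue{Invariant: "Timeout", Reason: event})
		}
	}
	return issues
}

func (t timeoutInvariant) RootCause(issues []Issue) []RootCause {
	var causes []RootCause
	for _, issue := range issues {
		if t.pollerCount == 0 {
			causes = append(causes, RootCause{Issue: issue, Reason: "tasklist has no pollers"})
		}
	}
	return causes
}

// diagnose mirrors the two steps: identify issues first, then root-cause them.
func diagnose(invariants []Invariant, history []string) ([]Issue, []RootCause) {
	var issues []Issue
	var causes []RootCause
	for _, inv := range invariants {
		found := inv.Check(history)
		issues = append(issues, found...)
		causes = append(causes, inv.RootCause(found)...)
	}
	return issues, causes
}

func main() {
	history := []string{"WorkflowExecutionStarted", "DecisionTaskScheduled", "WorkflowExecutionTimedOut"}
	issues, causes := diagnose([]Invariant{timeoutInvariant{pollerCount: 0}}, history)
	fmt.Println("issues:", len(issues), "root causes:", len(causes))
}
```

Keeping `Check` and `RootCause` separate lets an invariant report an issue even when it has no root-cause logic yet, which matches the note that some invariants may lack a root-cause implementation.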

## How to use this feature?

1. Use the [Frontend API](https://github.com/cadence-workflow/cadence/blob/master/service/frontend/api/interface.go#L47) or the Cadence CLI to start the diagnostics workflow. The call returns the execution details of the diagnostics workflow.

```bash
cadence --do cadence-sample-domain workflow diag --wid w123 --rid 123
```

The above command starts performing diagnostics via a Cadence workflow and returns its details. Sample output:

```bash
Workflow diagnosis started. Query the diagnostic workflow to get diagnostics report.
============Diagnostic Workflow details============
Domain: cadence-system, Workflow Id: diag123wid, Run Id: diag123rid
```

Use the workflow query command to fetch the results of the diagnostics:

```bash
cadence --do cadence-system workflow query --wid diag123wid --rid diag123rid --qt query-diagnostics-report
```

2. The Cadence web UI will have a diagnostics tab on the workflow execution page that displays the results of running diagnostics on the workflow. It lists the issues identified, the potential root causes, and links to runbooks.

## How to add a new use-case to workflow diagnostics?

1. Define an implementation of the invariant interface. [link](https://github.com/cadence-workflow/cadence/tree/master/service/worker/diagnostics/invariant/failure)

2. Add it to the list of invariants provided on service start-up. [link](https://github.com/cadence-workflow/cadence/blob/master/cmd/server/cadence/server.go#L265)

3. Update the diagnostics workflow so that it can construct the diagnostics result. [link](https://github.com/cadence-workflow/cadence/blob/master/service/worker/diagnostics/workflow.go#L201)

4. Provide a runbook for the issues and root causes, and link it from the diagnostics result.
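
As a rough illustration of the four steps above, the sketch below defines a new check, registers it, builds a result, and attaches a runbook link. Every name here (`Invariant`, `activityFailure`, `registeredInvariants`, `buildReport`, `Result`) and the runbook URL are assumptions made for the example; the real interface and wiring live in the files linked above and may look different.

```go
package main

import "fmt"

// Invariant is a simplified, hypothetical version of the interface.
type Invariant interface {
	Name() string
	Check(history []string) []string // one description per issue found
}

// Step 1: a new invariant implementation (here: activity failures).
type activityFailure struct{}

func (activityFailure) Name() string { return "ActivityFailure" }

func (activityFailure) Check(history []string) []string {
	var issues []string
	for i, event := range history {
		if event == "ActivityTaskFailed" {
			// Event IDs are 1-based, so report i+1.
			issues = append(issues, fmt.Sprintf("activity failed at event %d", i+1))
		}
	}
	return issues
}

// Step 2: the list of invariants handed over at service start-up.
func registeredInvariants() []Invariant {
	return []Invariant{activityFailure{}}
}

// Step 4: runbook links keyed by invariant name (placeholder URL).
var runbooks = map[string]string{
	"ActivityFailure": "https://example.com/runbooks/activity-failure",
}

// Result is what the report exposes for one invariant (step 3).
type Result struct {
	Invariant string
	Issues    []string
	Runbook   string
}

// buildReport runs every registered invariant and keeps those that found issues.
func buildReport(invariants []Invariant, history []string) []Result {
	var report []Result
	for _, inv := range invariants {
		if issues := inv.Check(history); len(issues) > 0 {
			report = append(report, Result{
				Invariant: inv.Name(),
				Issues:    issues,
				Runbook:   runbooks[inv.Name()],
			})
		}
	}
	return report
}

func main() {
	history := []string{"ActivityTaskScheduled", "ActivityTaskStarted", "ActivityTaskFailed"}
	for _, r := range buildReport(registeredInvariants(), history) {
		fmt.Printf("%s: %v (runbook: %s)\n", r.Invariant, r.Issues, r.Runbook)
	}
}
```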

blog/authors.yml

Lines changed: 10 additions & 0 deletions
@@ -28,6 +28,16 @@ jakobht:
     linkedin: https://www.linkedin.com/in/jakob-taankvist/
     github: jakobht

+sankari165:
+  name: Sankari Gopalakrishnan
+  title: Senior Software Engineer @ Uber
+  url: https://www.linkedin.com/in/sankari-gopalakrishnan165/
+  image_url: https://github.com/sankari165.png
+  page: true
+  socials:
+    linkedin: https://www.linkedin.com/in/sankari-gopalakrishnan165/
+    github: sankari165
+
 ibarrajo:
   name: Josué Alexander Ibarra
   title: Developer Advocate @ Uber
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
---
layout: default
title: Grafana Helm Setup
permalink: /docs/concepts/grafana-helm-setup
---

# Grafana Helm Setup

<details>
<summary><h2>Introduction</h2></summary>

This guide explains how to set up Grafana for monitoring Cadence workflows and services using Helm charts. Helm simplifies the deployment and management of Grafana in Kubernetes environments. Pre-configured dashboards for Cadence are available to visualize metrics effectively.

</details>

<details>
<summary><h2>Prerequisites</h2></summary>

Before proceeding, ensure the following:

- A Kubernetes cluster is up and running.
- Helm is installed on your system. Refer to the [Helm installation guide](https://helm.sh/docs/intro/install/).
- You have access to the Cadence Helm charts repository.

</details>

<details>
<summary><h2>Setup Steps</h2></summary>

### Step 1: Add Cadence Helm Repository

```bash
helm repo add cadence-workflow https://cadenceworkflow.github.io/cadence-charts
helm repo update
```

### Step 2: Deploy Prometheus Operator

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-operator prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```

### Step 3: Deploy Cadence with ServiceMonitor

Create a `values.yaml` file to enable ServiceMonitor for automatic metrics scraping:

```yaml
# Enable metrics collection
metrics:
  enabled: true
  port: 9090
  portName: metrics

  serviceMonitor:
    enabled: true
    # Replace with the namespace where Prometheus is deployed
    namespace: "monitoring"
    namespaceSelector:
      # Ensure this matches Prometheus's namespace
      matchNames:
        - monitoring
    scrapeInterval: 10s
    additionalLabels:
      # Ensure this matches Prometheus's Helm release name
      release: prometheus-operator
    annotations: {}
    jobLabel: "app.kubernetes.io/name"
    targetLabels:
      - app.kubernetes.io/name
    relabelings: []
    metricRelabelings: []
```

Deploy Cadence:
```bash
helm install cadence cadence-workflow/cadence \
  --namespace cadence --create-namespace \
  --values values.yaml
```

**Note:** Update the `namespace`, `matchNames`, and `release` values to match your Prometheus deployment.

### Step 4: Access Grafana

Get the Grafana admin password:
```bash
kubectl get secret --namespace monitoring prometheus-operator-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode
```

Forward the Grafana port to your machine:
```bash
kubectl port-forward --namespace monitoring svc/prometheus-operator-grafana 3000:80
```

Open http://localhost:3000 and log in as `admin` with the password retrieved above.

### Step 5: Import Cadence Dashboards

1. **Download the Cadence Grafana Dashboard JSON:**
   ```bash
   curl https://raw.githubusercontent.com/cadence-workflow/cadence/refs/heads/master/docker/grafana/provisioning/dashboards/cadence-server.json -o cadence-server.json
   ```

2. **Import in Grafana:** **Dashboards** → **Import** → Upload JSON files
3. **Select Prometheus** as the data source when prompted
4. Repeat the same steps for the other dashboards

</details>

<details>
<summary><h2>Customization</h2></summary>

The Grafana dashboards can be customized by editing the JSON files or modifying panels directly in Grafana. Additionally, Helm values can be overridden during installation to customize Grafana settings.

### Example: Override Helm Values
Create a `values.yaml` file to customize Grafana settings:
```yaml
grafana:
  adminPassword: "your-password"
  dashboards:
    enabled: true
```

Install Grafana with the custom values:
```bash
helm install grafana cadence/grafana -n cadence-monitoring -f values.yaml
```

</details>

<details>
<summary><h2>Additional Information</h2></summary>

- [Cadence Helm Charts Repository](https://github.com/cadence-workflow/cadence-charts)
- [Grafana Documentation](https://grafana.com/docs/)
- [Helm Documentation](https://helm.sh/docs/)

</details>

docs/05-go-client/21-sleep.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
---
layout: default
title: Sleep
permalink: /docs/go-client/sleep
---

# Workflow Sleep

The `workflow.Sleep` function allows a Cadence workflow to pause its execution for a specified duration. It is similar to `time.Sleep` in Go, but is safe and deterministic for use within Cadence workflows. The workflow is paused and resumed by the Cadence service, and the sleep is durable, meaning the workflow survives worker restarts or failures during the sleep period.

## Example: Sleep for 30 Seconds

Here is a minimal example of using `workflow.Sleep` in a Cadence workflow, as demonstrated in [cadence-samples PR #99](https://github.com/cadence-workflow/cadence-samples/pull/99):

```go
import (
	"time"

	"go.uber.org/cadence/workflow"
	"go.uber.org/zap"
)

// SleepWorkflow pauses for 30 seconds using a durable, deterministic timer.
func SleepWorkflow(ctx workflow.Context) error {
	// workflow.GetLogger returns a *zap.Logger, so extra fields must be zap fields.
	logger := workflow.GetLogger(ctx)
	logger.Info("Workflow started, going to sleep for 30 seconds...")
	// workflow.Sleep returns an error if the workflow context is canceled.
	if err := workflow.Sleep(ctx, 30*time.Second); err != nil {
		logger.Error("Sleep interrupted", zap.Error(err))
		return err
	}
	logger.Info("Woke up after 30 seconds!")
	return nil
}
```

### Key Points
- Use `workflow.Sleep(ctx, duration)` instead of `time.Sleep` inside workflow code.
- The sleep is durable: if the worker crashes or restarts, the workflow resumes sleeping where it left off.
- The workflow does not consume worker resources while sleeping; the state is persisted by Cadence.
- You can use any duration supported by Go's `time.Duration`.

### When to Use
- Delaying workflow progress for a fixed period (e.g., retry with backoff, scheduled reminders, timeouts).
- Waiting out a timeout before proceeding. `workflow.Sleep` itself only waits for the timer or for cancellation; to wait for an external event with a deadline, pair a timer with a signal channel in a selector.

### Limitations
- Do not use `time.Sleep` in workflow code; always use `workflow.Sleep` for determinism and durability.
- Very large numbers of simultaneous timers (sleeps) may impact cluster performance; consider jittering or batching if needed.

For more details and advanced usage, see the [Cadence Go client documentation](https://pkg.go.dev/go.uber.org/cadence/workflow#Sleep).
