add retry to postToSlack #25

Open
msk-nkhr wants to merge 3 commits into szyn:master from msk-nkhr:add-retry-to-postToSlack
@msk-nkhr msk-nkhr commented Nov 21, 2023

issue

In rare cases, a 429 or 503 error occurs and the post to Slack fails, but the slack>: task still succeeds.
The error messages look like this:

java.io.IOException: status: 429, message: {"retry_after":1,"ok":false,"error":"rate_limited"}
java.io.IOException: status: 503, message:
...
(omitted)
...
503 Service Unavailable
 
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

I want to retry when the response code is retryable.
I also want the slack>: task to fail when posting to Slack fails.

modification

how to retry

According to the official OkHttp documentation, interceptors are described as follows.

Interceptors are a powerful mechanism that can monitor, rewrite, and retry calls.

Application interceptors
Permitted to short-circuit and not call Chain.proceed().

I created an Interceptor class that performs the retries and registered it by calling addInterceptor() on OkHttpClient.Builder.
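The retry decision itself can be sketched without OkHttp. The following is a minimal sketch, not the code in this PR: the names RetrySketch, isRetryable, sendWithRetry, postOrFail and MAX_RETRIES are hypothetical, the limit of five retries is an assumption taken from the "Retry count: 5" ceiling in the log below, and the HTTP call is replaced by an IntSupplier that returns a status code. In an OkHttp application interceptor, the call to chain.proceed(request) would sit where the supplier is invoked here. Waiting between attempts is left out to keep the example short.

```java
import java.io.IOException;
import java.util.function.IntSupplier;

final class RetrySketch {
    // Assumption: five retries, matching the highest "Retry count" in this PR's log.
    public static final int MAX_RETRIES = 5;

    // 429 (rate limited) and 503 (service unavailable) are the codes named in this PR.
    public static boolean isRetryable(int status) {
        return status == 429 || status == 503;
    }

    // Calls `attempt` once, then again while the status is retryable and retries remain.
    // `attempt` is a stand-in for the real HTTP call and returns the response status.
    public static int sendWithRetry(IntSupplier attempt) {
        int status = attempt.getAsInt();
        for (int retry = 1; retry <= MAX_RETRIES && isRetryable(status); retry++) {
            // A real implementation would log the retry count and wait here.
            status = attempt.getAsInt();
        }
        return status;
    }

    // Throws when the final status is not 2xx, so the calling task fails as well.
    public static void postOrFail(IntSupplier attempt) throws IOException {
        int status = sendWithRetry(attempt);
        if (status < 200 || status >= 300) {
            throw new IOException("status: " + status);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated server: two retryable failures, then success.
        int[] responses = {503, 429, 200};
        int[] calls = {0};
        postOrFail(() -> responses[calls[0]++]);
        System.out.println("attempts: " + calls[0]); // prints "attempts: 3"
    }
}
```

Throwing from postOrFail once the retries are exhausted is what lets the failure propagate to the task instead of being swallowed.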

test error

commit: 0ba2663
Test of posting to Slack (CircleCI): https://app.circleci.com/pipelines/github/szyn/digdag-slack/43/workflows/9c070197-d48b-4a37-a3be-7e8c37da8d5c/jobs/92

I confirmed that, when posting to Slack fails, the post is retried and the slack>: task then fails.
You can also control the retry settings with the _error parameter.

Log
2024-02-22 10:39:35 +0000 [INFO] (0015@[0:default:1:1]+success+task): slack>: templates/valid.yml
Feb 22, 2024 10:39:36 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 1
Feb 22, 2024 10:39:38 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 2
Feb 22, 2024 10:39:42 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 3
Feb 22, 2024 10:39:50 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 4
Feb 22, 2024 10:40:06 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 5
2024-02-22 10:40:38 +0000 [ERROR] (0015@[0:default:1:1]+success+task): Task failed, retrying
io.digdag.spi.TaskExecutionException: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.spi.TaskExecutionException.ofNextPollingWithCause(TaskExecutionException.java:85)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:56)
        at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:399)
        at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:709)
        at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:308)
        at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:152)
        at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
        at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:150)
        at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:133)
        at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:691)
        at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:102)
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.runTask(SlackOperatorFactory.java:77)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
        ... 14 common frames omitted
Caused by: java.io.IOException: status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:98)
        ... 16 common frames omitted
Feb 22, 2024 10:40:40 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 1
Feb 22, 2024 10:40:42 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 2
Feb 22, 2024 10:40:46 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 3
Feb 22, 2024 10:40:54 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 4
Feb 22, 2024 10:41:10 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 5
2024-02-22 10:41:42 +0000 [ERROR] (0015@[0:default:1:1]+success+task): Task failed, retrying
io.digdag.spi.TaskExecutionException: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.spi.TaskExecutionException.ofNextPollingWithCause(TaskExecutionException.java:85)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:56)
        at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:399)
        at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:709)
        at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:308)
        at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:152)
        at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
        at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:150)
        at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:133)
        at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:691)
        at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:102)
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.runTask(SlackOperatorFactory.java:77)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
        ... 14 common frames omitted
Caused by: java.io.IOException: status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:98)
        ... 16 common frames omitted
Feb 22, 2024 10:41:45 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 1
Feb 22, 2024 10:41:47 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 2
Feb 22, 2024 10:41:51 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 3
Feb 22, 2024 10:41:59 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 4
Feb 22, 2024 10:42:15 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 5
2024-02-22 10:42:47 +0000 [ERROR] (0015@[0:default:1:1]+success+task): Task failed, retrying
io.digdag.spi.TaskExecutionException: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.spi.TaskExecutionException.ofNextPollingWithCause(TaskExecutionException.java:85)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:56)
        at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:399)
        at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:709)
        at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:308)
        at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:152)
        at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
        at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:150)
        at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:133)
        at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:691)
        at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:102)
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.runTask(SlackOperatorFactory.java:77)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
        ... 14 common frames omitted
Caused by: java.io.IOException: status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:98)
        ... 16 common frames omitted
Feb 22, 2024 10:42:54 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 1
Feb 22, 2024 10:42:56 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 2
Feb 22, 2024 10:43:00 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 3
Feb 22, 2024 10:43:08 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 4
Feb 22, 2024 10:43:24 AM io.digdag.plugin.slack.SlackOperatorFactory$RetryInterceptor intercept
INFO: Retry count: 5
2024-02-22 10:43:56 +0000 [ERROR] (0015@[0:default:1:1]+success+task): Task failed with unexpected error: Failed to send to Slack. status: 503, message: {}
java.lang.RuntimeException: Failed to send to Slack. status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:102)
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.runTask(SlackOperatorFactory.java:77)
        at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
        at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:399)
        at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:709)
        at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:308)
        at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:152)
        at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
        at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:150)
        at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:133)
        at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:691)
        at io.digdag.core.agent.MultiThreadAgent.lambda$run$0(MultiThreadAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: status: 503, message: {}
        at io.digdag.plugin.slack.SlackOperatorFactory$SlackOperator.postToSlack(SlackOperatorFactory.java:98)
        ... 16 common frames omitted
2024-02-22 10:43:57 +0000 [INFO] (0015@[0:default:1:1]+success^failure-alert): type: notify
error: 
  * +success+task:
    Failed to send to Slack. status: 503, message: {} (runtime)
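
The timestamps in the log above are consistent with a wait that doubles after each "Retry count: n" line: roughly 2, 4, 8, 16 and 32 seconds. The following is a minimal sketch of such a schedule, inferred from those timestamps only and not taken from the PR's code; BackoffSketch, delaySeconds and schedule are hypothetical names, and the 2^n formula is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSketch {
    // Assumed wait, in seconds, after retry number `retryCount` is logged: 2^retryCount.
    public static long delaySeconds(int retryCount) {
        return 1L << retryCount;
    }

    // Delays for retries 1..maxRetries, in order.
    public static List<Long> schedule(int maxRetries) {
        List<Long> delays = new ArrayList<>();
        for (int retry = 1; retry <= maxRetries; retry++) {
            delays.add(delaySeconds(retry));
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Long> delays = schedule(5);
        long total = delays.stream().mapToLong(Long::longValue).sum();
        System.out.println(delays + " total=" + total + "s"); // prints "[2, 4, 8, 16, 32] total=62s"
    }
}
```

The 62-second total matches the gap between the first "Retry count: 1" line (10:39:36) and the first "Task failed, retrying" line (10:40:38). The measured gaps also include request time, so the match is approximate.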

other

CircleCI

Let's Encrypt certificates were added to openjdk:8u141-jdk ( https://www.java.com/download/help/release_changes.html ).
However, that CA certificate has since expired, so the error below occurs ( https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/ ).

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

I changed the CircleCI image from openjdk:8u141-jdk to the latest cimg/openjdk 8 image (cimg/openjdk:8.0.402).

@msk-nkhr msk-nkhr force-pushed the add-retry-to-postToSlack branch 7 times, most recently from 8c74536 to c43f3bd Compare February 20, 2024 12:01
  working_directory: ~/digdag-slack
  docker:
-   - image: openjdk:8u141-jdk
+   - image: cimg/openjdk:8.0.402

@msk-nkhr msk-nkhr force-pushed the add-retry-to-postToSlack branch 5 times, most recently from 6643a04 to bc03ffc Compare February 20, 2024 15:21
@msk-nkhr msk-nkhr force-pushed the add-retry-to-postToSlack branch from 3adc448 to d3d4354 Compare February 22, 2024 10:50
@msk-nkhr msk-nkhr marked this pull request as ready for review February 27, 2024 05:47
@msk-nkhr
@szyn
I would like to request a review 🙏
