@zimeg zimeg commented Jun 19, 2025

Changelog

We fixed a bug where the run command exited with an error if the activity logs failed to stream. The command now retries fetching those missed logs instead.

Summary

This PR avoids exiting on an error when streaming the activity logs for the run command, which works around or fixes #128! 👾

Preview

The following video shows a disconnected and reconnected internet connection and a healing socket connection:

  • 0:07: The app has started and connected
  • 0:19: The internet connection is turned off
  • 0:52: A new web socket attempts to connect
  • 1:15: Activity logs begin to fail retries - New
  • 1:37: The internet connection is turned on
  • 1:40: A single socket mode connection is created with streaming logs
disconnect.mov

Notes

  • This change doesn't fix the underlying cause of processes persisting with a cancelled context, or of exits without an error. Instead, we avoid both cases so that stopping the command requires an interrupt or a process exit from the app!
  • Following the note above, the "Closing due to inactivity. Au revoir!" message might need to include an error instead. Perhaps we save this, or canceling the context on exits, for a follow-up PR? 🤖

Requirements

@zimeg zimeg added this to the Next Release milestone Jun 19, 2025
@zimeg zimeg self-assigned this Jun 19, 2025
@zimeg zimeg requested a review from a team as a code owner June 19, 2025 06:12
@zimeg zimeg added the bug, changelog, and semver:patch labels Jun 19, 2025
codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.57%. Comparing base (4d342d7) to head (583bea7).
Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
internal/api/activity.go | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #132      +/-   ##
==========================================
+ Coverage   63.48%   63.57%   +0.09%     
==========================================
  Files         212      212              
  Lines       22345    22348       +3     
==========================================
+ Hits        14185    14208      +23     
+ Misses       7078     7057      -21     
- Partials     1082     1083       +1     

@zimeg zimeg mentioned this pull request Jun 19, 2025
3 tasks
@zimeg zimeg left a comment

📝 A few thoughts on some of these changes for the most kind reviewers!

  b, err := c.get(ctx, url, token, "")
  if err != nil {
-     return ActivityResult{}, errHTTPRequestFailed.WithRootCause(err)
+     return ActivityResult{}, slackerror.New(slackerror.ErrHTTPRequestFailed).WithRootCause(err)

📣 We create a new error here to avoid appending different repeated root causes to this same error:

HTTP request failed (http_request_failed)
   Get "https://slack.com/api/apps.activities.list?app_id=A0923PDU39B&limit=100&min_log_level=info&min_date_created=1750312064888018": dial tcp: lookup slack.com: no such host

   Get "https://slack.com/api/apps.activities.list?app_id=A0923PDU39B&limit=100&min_log_level=info&min_date_created=1750312064888018": dial tcp: lookup slack.com: no such host

   Get "https://slack.com/api/apps.activities.list?app_id=A0923PDU39B&limit=100&min_log_level=info&min_date_created=1750312064888018": dial tcp: lookup slack.com: no such host

   Get "https://slack.com/api/apps.activities.list?app_id=A0923PDU39B&limit=100&min_log_level=info&min_date_created=1750312064888018": dial tcp: lookup slack.com: no such host

Comment on lines +94 to 97
if err != nil {
    return err
}
return nil

🔍 Small preference to make the nil return more clear instead of returning err in both cases, even if it too is nil.

Member

Thanks, I prefer this clarity as well!

cm.API.AssertNumberOfCalls(t, "Activity", 1)
},
},
"should return nil if TailArg is set and activity request fails while polling": {

🧪 The added nil values for ExpectedError above are included to make these cases more explicit instead of preferring default behavior.

For this test I think it's useful, but I'm of course open to reverting this!

Member

No need to revert, I like this as well!

@zimeg zimeg changed the title from "fix: avoid exiting on error when streaming activity logs" to "fix: avoid exiting on disconnected api errors when streaming activity logs" Jun 19, 2025
@mwbrooks mwbrooks changed the title from "fix: avoid exiting on disconnected api errors when streaming activity logs" to "fix: 'run' exiting on disconnected api errors when streaming activity logs" Jun 23, 2025
@mwbrooks mwbrooks left a comment

✅ Nice improvement @zimeg!

🧪 During testing, I noticed that a few times I'd receive a connection reset by peer error when re-connecting the wifi network. However, it doesn't always happen.

image

🎥 Below is a video for anyone who wants to see the original error that this PR fixes:

2025-06-23-run-disconnect.mov

zimeg commented Jun 25, 2025

@mwbrooks Thanks so much for testing this! 🎁

> I noticed that a few times I'd receive a connection reset by peer error when re-connecting the wifi network. However, it doesn't always happen.

Whoa, this might be happening for Deno apps with the CLI-managed server in place of a delegated connection? 👾

The retries of @slack/socket-mode have been persistent in my testing but we should explore that error soon:

r.clients.IO.PrintDebug(ctx, "LocalServer.Start closing websocket TCP connection")

The before video was also forgotten in these hacks - thank you for sharing this too 📸

I'll merge this now to find out if I can keep a connection going for the next several hours and will report back 🫡

@zimeg zimeg merged commit 0faf469 into main Jun 25, 2025
6 checks passed
@zimeg zimeg deleted the zimeg-fix-activity-tail-disconnect branch June 25, 2025 01:35
zimeg commented Jun 25, 2025

🔍 A CLI error has since appeared for me after running the app for a while. It repeats every few seconds but without exiting, which I think is expected:

[2025-06-24 18:39:00] Setting up a ticker to poll activity on a 3s interval
...
[INFO]  bolt-app ⚡️ Bolt app is running!
[2025-06-24 22:10:18] HTTP Request: GET https://slack.com/api/apps.activities.list?app_id=A093KR40YEL&limit=100&min_log_level=info&min_date_created=1750815539904263 HTTP/1.1
[2025-06-24 22:10:18] HTTP Request Body:
[2025-06-24 22:10:18] <no body>
[2025-06-24 22:10:18] HTTP Response Status: HTTP/1.1 200 OK
[2025-06-24 22:10:18] HTTP Response Body:
[2025-06-24 22:10:18] <no body>
[2025-06-24 22:10:18] The following error was returned by the apps.activities.list Slack API method
[2025-06-24 22:10:18]
[2025-06-24 22:10:18] 🚫 Your access token has expired (token_expired)
[2025-06-24 22:10:18]
[2025-06-24 22:10:18] 💡 Suggestion
[2025-06-24 22:10:18]    Use the command `lack login` to authenticate again
...
[2025-06-24 22:14:18] HTTP Request: GET https://slack.com/api/apps.activities.list?app_id=A093KR40YEL&limit=100&min_log_level=info&min_date_created=1750815539904263 HTTP/1.1
[2025-06-24 22:14:18] HTTP Request Body:
[2025-06-24 22:14:18] <no body>
[2025-06-24 22:14:18] HTTP Response Status: HTTP/1.1 200 OK
[2025-06-24 22:14:18] HTTP Response Body:
[2025-06-24 22:14:18] <no body>
[2025-06-24 22:14:18] The following error was returned by the apps.activities.list Slack API method
[2025-06-24 22:14:18]
[2025-06-24 22:14:18] 🚫 Your access token has expired (token_expired)
[2025-06-24 22:14:18]
[2025-06-24 22:14:18] 💡 Suggestion
[2025-06-24 22:14:18]    Use the command `lack login` to authenticate again
[WARN]  socket-mode:SlackWebSocket:1 A pong wasn't received from the server before the timeout of 5000ms!
[WARN]  socket-mode:SlackWebSocket:1 A ping wasn't received from the server before the timeout of 30000ms!
[DEBUG]  socket-mode:SlackWebSocket:1 Sending close frame (status=1000).
[DEBUG]  socket-mode:SlackWebSocket:1 Sending close frame (status=1000).
[DEBUG]  socket-mode:SlackWebSocket:1 WebSocket close frame received (code: 1006, reason: )
[DEBUG]  socket-mode:SlackWebSocket:1 Terminating WebSocket (close frame received).
[DEBUG]  socket-mode:SocketModeClient:0 Before trying to reconnect, this client will wait for 5000 milliseconds

I'm wondering if an expired token for apps.activities.list isn't being rotated or used in new requests? 👾

Development

Successfully merging this pull request may close these issues.

bug: unexpected daemon process

3 participants