Skip to content

Conversation

@stuqdog
Copy link
Member

@stuqdog stuqdog commented Jun 2, 2025

No description provided.

}
if (!connected) {
// NOLINTNEXTLINE
close();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lia-viam here's somewhere that your C++ expertise would be greatly appreciated! close seemed better than just exiting to ensure graceful shutdown (though we do ultimately want to exit if we've reached this point), but I'm not totally confident that calling close here, from within this thread, is safe behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue calling close from here is that joinable will return false because it checks thread.get_id() == this_thread::id(). we should probably be storing false in the should_check_connection_ thing if we establish that we've been disconnected, so that even if we're not calling join we can be sure that the thread will have exited

another thought would be to have a flag variable that says if we've been disconnected, and have impl::client_helper check that the same way it always checks if impl is non-null. That might exit with an exception throw, but I'm not sure if we're too pressed about "graceful" shutdown in the highly ungraceful case of the process being orphaned by a kill -9

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also maybe cc @acmorrow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably be storing false in the should_check_connection_ thing if we establish that we've been disconnected, so that even if we're not calling join we can be sure that the thread will have exited

I'm not sure I entirely understand this, but I'm a little concerned about it. We'd need to at some point reset the value to true (or else we stop checking connection after the first failure), but I could imagine a case where someone triggers a close (which sets should_check_connection_ to false) and then immediately after, this loop resets it to true and so we just continue this loop forever and fail to shutdown properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, the python SDK at this point just calls sys.exit(), so if we're not too concerned about gracefulness (and this is a case where the module is an orphan process anyway and nothing it logs or does will be trivially inspectable), we could just exit(0) and call it a day.

Copy link
Collaborator

@lia-viam lia-viam Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per offline discussion--this is actually fine bc when the check_connection thread calls close it will not join itself, but the later destructor call will. for transparency it may be good to add break; after close(), but also close() sets the while loop flag variable to false so the function will return immediately after close() returns

should_check_connection_.store(false);
}
bool connected(true);
while (should_check_connection_) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lia-viam another ignorant C++ question, should_check_connection_ is set to false when we want to shutdown (makes sense!) but also above when check_every and reconnect_every are set to zero. In such a case this thread will return pretty much immediately, but the client code in such a case could be running for an indefinite amount of time. Are there any safety concerns here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless i'm missing something that should be fine--the thread would return and we would just have the same behavior as pre-this PR

@stuqdog stuqdog marked this pull request as ready for review June 2, 2025 19:44
@stuqdog stuqdog requested a review from a team as a code owner June 2, 2025 19:44
@stuqdog stuqdog requested review from lia-viam and njooma and removed request for a team June 2, 2025 19:44
Comment on lines 138 to 148
/// @brief How often to verify connectivity to the robot, in seconds. If set to 0, will not
/// check, will default to the `reconnect_every_interval_` value. Defaults to 0.
/// @note Setting to a non-zero value is useful in modules but may result in delays shutting
/// down client code
unsigned int check_every_interval_ = 0;

/// @brief How often to attempt to reconnect to the robot when disconnected. If set to 0,
/// will not attempt to reconnect. Defaults to 0.
/// @note Setting to a non-zero value is useful in modules but may result in delays shutting
/// down client code
unsigned int reconnect_every_interval_ = 0;
Copy link
Member Author

@stuqdog stuqdog Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to set these both to 0 by default. It leads to no meaningful behavioral change for existing clients, and sidesteps the issue of having to wait for the check_connection loop to finish before shutting down client code. Also if a client is having issues connecting, they probably can figure out what they want to do and stop the process as necessary by interacting directly, without us making any assumptions.

We could put these into constructors, but I chose not to because I didn't want any breaking changes or constructor bloat (what if a user only wants to set one?), and at any rate maintaining the existing constructor to avoid breaking existing code means that we still have to be opinionated about a default value.

the default is overridden for modules in modules/service.cpp, to ensure that modules don't get orphaned.

/// check, will default to the `reconnect_every_interval_` value. Defaults to 0.
/// @note Setting to a non-zero value is useful in modules but may result in delays shutting
/// down client code
unsigned int check_every_interval_ = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are new let's change them to std::chrono::seconds which is what I had done to refresh_interval_ on the dial direct branch

const char* ViamChannel::get_channel_addr() const {
return uri_;
}
Options& Options::set_check_every_interval(unsigned int interval) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with the other comment let's have these be std::chrono::seconds

should_check_connection_.store(false);
}
bool connected(true);
while (should_check_connection_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless i'm missing something that should be fine--the thread would return and we would just have the same behavior as pre-this PR

}
bool connected(true);
while (should_check_connection_) {
std::exception connection_error;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll want to do this with std::exception_ptr otherwise the polymorphic exception object could get sliced when it's copied into std::exception

auto impl =
std::make_unique<RobotClient::impl>(RobotService::NewStub(channel.channel()));
impl_.reset();
impl_.swap(impl);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can all just be impl_ = std::make_unique<...>(...)

}
if (!connected) {
// NOLINTNEXTLINE
close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue calling close from here is that joinable will return false because it checks thread.get_id() == this_thread::id(). we should probably be storing false in the should_check_connection_ thing if we establish that we've been disconnected, so that even if we're not calling join we can be sure that the thread will have exited

another thought would be to have a flag variable that says if we've been disconnected, and have impl::client_helper check that the same way it always checks if impl is non-null. That might exit with an exception throw, but I'm not sure if we're too pressed about "graceful" shutdown in the highly ungraceful case of the process being orphaned by a kill -9

}
if (!connected) {
// NOLINTNEXTLINE
close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also maybe cc @acmorrow

@stuqdog stuqdog merged commit 7f03834 into viamrobotics:main Jun 3, 2025
4 checks passed
@stuqdog stuqdog deleted the RSDK-10842-shut-down-cpp-modules-when-zombied branch June 6, 2025 15:22
@atlassian
Copy link

atlassian bot commented Sep 22, 2025

🔗 Link your GitHub account to Atlassian

To enable Code Reviewer, please link your GitHub account to your Atlassian account.

Click here to connect your accounts

This is a one-time setup that takes less than a minute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants