Commit 97ff102

crimson/osd: fix Watch::connect() behaviour on reconnect.
It's perfectly legal for a client to reconnect to a particular `Watch` using a different socket / `Connection` than the original one. Reconnecting must therefore handle the watch timer properly, which is currently broken: we don't cancel the timer when reconnecting. This led to the following crash at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-09-02_07:44:51-rados-master-distro-basic-smithi/6372357$ less ./remote/smithi183/log/ceph-osd.4.log.gz
...
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - client_request(id=12, detail=m=[osd_op(client.5087.0:93 7.1e 7:7c7084bd:::repobj:head {watch reconnect cookie 94478891024832 gen 1} snapc 0={} ondisk+write+known_if_redirected e40) v8]): got obc lock
...
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - do_op_watch
INFO  2021-09-02 08:10:45,462 [shard 0] osd - found existing watch by client.5087
DEBUG 2021-09-02 08:10:45,462 [shard 0] osd - do_op_watch_subop_watch
INFO  2021-09-02 08:10:45,462 [shard 0] osd - found existing watch watch(cookie 94478891024832 30s 172.21.15.150:0/3544196211) by client.5087
...
INFO  2021-09-02 08:10:45,462 [shard 0] osd - op_effect: found existing watcher: 94478891024832,client.5087
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-7406-g9d30203c/rpm/el8/BUILD/ceph-17.0.0-7406-g9d30203c/src/seastar/include/seastar/core/timer.hh:95: void seastar::timer<Clock>::arm_state(seastar::timer<Clock>::time_point, std::optional<typename Clock::duration>) [with Clock = seastar::lowres_clock; seastar::timer<Clock>::time_point = std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long int, std::ratio<1, 1000> > >; typename Clock::duration = std::chrono::duration<long int, std::ratio<1, 1000> >]: Assertion `!_armed' failed.
Aborting on shard 0.
Backtrace:
 0# 0x000055CC052CF0B6 in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const&) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FA58349FB20 in /lib64/libpthread.so.0
 4# gsignal in /lib64/libc.so.6
 5# abort in /lib64/libc.so.6
 6# 0x00007FA581A98C89 in /lib64/libc.so.6
 7# 0x00007FA581AA6A76 in /lib64/libc.so.6
 8# 0x000055CC0BEEE9DD in ceph-osd
 9# crimson::osd::Watch::connect(seastar::shared_ptr<crimson::net::Connection>, bool) in ceph-osd
10# 0x000055CC00B1D246 in ceph-osd
11# 0x000055CBFFEF01AE in ceph-osd
...
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
1 parent f51c0c7 commit 97ff102

1 file changed: +2 -2 lines changed

src/crimson/osd/watch.cc

Lines changed: 2 additions & 2 deletions
@@ -81,9 +81,9 @@ seastar::future<> Watch::connect(crimson::net::ConnectionRef conn, bool)
 {
   if (this->conn == conn) {
     logger().debug("conn={} already connected", conn);
-    timeout_timer.cancel();
+    return seastar::now();
   }
-
+  timeout_timer.cancel();
   timeout_timer.arm(std::chrono::seconds{winfo.timeout_seconds});
   this->conn = std::move(conn);
   return seastar::now();
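
The control flow after this change can be illustrated without Seastar. The sketch below is not the Crimson code: `MockTimer`, `MockConnection` and `MockWatch` are hypothetical stand-ins introduced only for illustration. `MockTimer` models just the invariant visible in the crash (arming an already-armed timer trips an assertion), and `MockWatch::connect()` mirrors the patched logic: return early for the same connection, otherwise cancel before re-arming.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <utility>

// Hypothetical stand-in for a timer with the invariant seen in the crash:
// arm() must not be called while the timer is already armed.
class MockTimer {
public:
  void arm(std::chrono::seconds /*delay*/) {
    assert(!armed_ && "arm() called on an already-armed timer");
    armed_ = true;
  }
  // Disarms the timer; returns true if it was armed, false otherwise.
  bool cancel() { return std::exchange(armed_, false); }
  bool armed() const { return armed_; }

private:
  bool armed_ = false;
};

// Hypothetical connection type; only pointer identity matters here.
struct MockConnection {};
using MockConnectionRef = std::shared_ptr<MockConnection>;

class MockWatch {
public:
  explicit MockWatch(std::chrono::seconds timeout) : timeout_(timeout) {}

  // Mirrors the patched flow of the diff above.
  void connect(MockConnectionRef new_conn) {
    if (conn_ == new_conn) {
      return;  // same connection: leave the timer untouched
    }
    timer_.cancel();      // safe even if the timer is not armed
    timer_.arm(timeout_); // cannot trip the assertion after cancel()
    conn_ = std::move(new_conn);
  }

  bool timer_armed() const { return timer_.armed(); }
  const MockConnectionRef& connection() const { return conn_; }

private:
  std::chrono::seconds timeout_;
  MockTimer timer_;
  MockConnectionRef conn_;
};
```

With the pre-patch ordering (cancel only in the same-connection branch, then an unconditional arm), a second `connect()` call with a different connection would reach `arm()` on an armed timer, which is the condition the `!_armed` assertion guards against.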
