Test that Rust and C++ ipc_bridge example work with each other#67
Test that Rust and C++ ipc_bridge example work with each other#67saksham-chawla wants to merge 5 commits intocodex/replay-20260319023913-pr-39-basefrom
Conversation
Using `--help` lead definitely to a crash, which should not happen, because `--help` is normal behavior. Also `main()` was missing a return statement which is ok in C/C++, but we have no control of the return value in this case. Now it is always 0, when terminating without errors.
CI should check that C++ and Rust code can communicate via mw::com to avoid having surprise regressions.
There is no Windows specific devcontainer anymore and all can use the same devcontainer.
There was a problem hiding this comment.
Reliability issues in the new test script need fixes for deterministic binary selection and cleanup, plus minor logging/comment tweaks.
Status: Changes Suggested | Risk: Medium | Readiness: 5/10 (Needs Polish)
Issues Identified & Suggestions
- Harden test script binary discovery; avoid globbed find and add guard.
test_ipc_bridge.sh - Ensure sender cleanup under set -e via trap and set +e.
test_ipc_bridge.sh - Delay temp log deletion until after sender checks.
test_ipc_bridge.sh - Correct retry interval log and SIGTERM comment.
ipc_bridge.rs,test_ipc_bridge.sh
Review Details
📁 7 files reviewed | 💬 7 comments
👍 / 👎 individual comments to help improve reviews for you
|
|
||
| # this cannot work if both sender and receiver expect 20 samples | ||
| # the sender has to send infinite samples and the receiver has to terminate after having received some | ||
| CPP_EXAMPLE_CMD="$(find * -name ipc_bridge_cpp) -s $MANIFEST_LOCATION --cycle-time 10" |
There was a problem hiding this comment.
[Reliability] find * is unquoted and glob-expanded before find runs, so if the working directory contains no matching entries or multiple matches, CPP_EXAMPLE_CMD / RUST_EXAMPLE_CMD will either be empty (causing a cryptic error later) or contain multiple space-separated paths (running the wrong binary). With set -e the script would fail, but the error message would be unhelpful.
Add a guard after the find:
CPP_BINARY="$(find . -name ipc_bridge_cpp -type f | head -1)"
if [[ -z "$CPP_BINARY" ]]; then
echo "ERROR: ipc_bridge_cpp binary not found" >&2
exit 1
fi
CPP_EXAMPLE_CMD="$CPP_BINARY -s $MANIFEST_LOCATION --cycle-time 10"Context for Agents
`find *` is unquoted and glob-expanded before `find` runs, so if the working directory contains no matching entries or multiple matches, `CPP_EXAMPLE_CMD` / `RUST_EXAMPLE_CMD` will either be empty (causing a cryptic error later) or contain multiple space-separated paths (running the wrong binary). With `set -e` the script would fail, but the error message would be unhelpful.
Add a guard after the `find`:
```bash
CPP_BINARY="$(find . -name ipc_bridge_cpp -type f | head -1)"
if [[ -z "$CPP_BINARY" ]]; then
echo "ERROR: ipc_bridge_cpp binary not found" >&2
exit 1
fi
CPP_EXAMPLE_CMD="$CPP_BINARY -s $MANIFEST_LOCATION --cycle-time 10"
```
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 8Chain of Thought
- Validation Decision: accepted
- Confidence: 0.82
- Likelihood Score: 3/5
- Likelihood Reason: The scenario is plausible: if the binary isn't built, the test is run from the wrong directory, or the build system places artifacts differently,
findwould return empty output. Sincefindreturns exit code 0 even when no files match,set -ewon't catch this — the script would fail later with a cryptic error when trying to execute an invalid command string. - Suggested Action: keep
- Tool Calls Used: 2
Reasoning
The comment identifies a real issue in newly introduced code (this is an entirely new file).
Technical analysis of find *:
- The
*glob is expanded by bash beforefindruns — this is correct. - If
findmatches nothing, the command substitution$(find * -name ipc_bridge_cpp)returns an empty string. Sincefindreturns exit code 0 even when no files match,set -edoes NOT catch this. CPP_EXAMPLE_CMDwould become-s /tmp/mw_com_lola_service_manifest.json --cycle-time 10, and when used unquoted on lines 110/112 ($EXAMPLE_CMD_SEND --num-cycles 0 ...), it would try to run-sas a command, producing a cryptic error.- If
findreturns multiple paths, they'd be space-concatenated and again produce unexpected behavior.
The comment correctly identifies that set -e would catch the eventual failure but the error message would be unhelpful. The suggested fix — switching to find . with a -z guard — is actionable, concrete, and would make test failures much easier to diagnose.
This is entirely new code introduced by this PR, so it's squarely within diff scope.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
1: #!/bin/bash
2: set -euxo pipefail
3:
4: MANIFEST_LOCATION="/tmp/mw_com_lola_service_manifest.json"
5:
6: # this cannot work if both sender and receiver expect 20 samples
7: # the sender has to send infinite samples and the receiver has to terminate after having received some
8: CPP_EXAMPLE_CMD="$(find * -name ipc_bridge_cpp) -s $MANIFEST_LOCATION --cycle-time 10"
9: RUST_EXAMPLE_CMD="$(find * -name ipc_bridge_rs) -s $MANIFEST_LOCATION --cycle-time 10"
10:
...
|
|
||
| # this cannot work if both sender and receiver expect 20 samples | ||
| # the sender has to send infinite samples and the receiver has to terminate after having received some | ||
| CPP_EXAMPLE_CMD="$(find * -name ipc_bridge_cpp) -s $MANIFEST_LOCATION --cycle-time 10" |
There was a problem hiding this comment.
[Reliability] Using find * -name ipc_bridge_cpp is non-deterministic in Bazel runfiles: if more than one ipc_bridge_cpp exists (e.g., binary + runfiles copy), the variable will contain multiple paths and the command will fail or pass an extra argument. Use a deterministic runfiles path (e.g., $TEST_SRCDIR/$TEST_WORKSPACE/score/mw/com/example/ipc_bridge/ipc_bridge_cpp) or restrict find to a single executable (-type f -perm -111 | head -n1).
Context for Agents
Using `find * -name ipc_bridge_cpp` is non-deterministic in Bazel runfiles: if more than one `ipc_bridge_cpp` exists (e.g., binary + runfiles copy), the variable will contain multiple paths and the command will fail or pass an extra argument. Use a deterministic runfiles path (e.g., `$TEST_SRCDIR/$TEST_WORKSPACE/score/mw/com/example/ipc_bridge/ipc_bridge_cpp`) or restrict `find` to a single executable (`-type f -perm -111 | head -n1`).
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 8Chain of Thought
- Validation Decision: accepted
- Confidence: 0.82
- Likelihood Score: 3/5
- Likelihood Reason: In typical Bazel setups, there's usually one copy of the binary in runfiles, so
findoften returns one result. However, platform-specific runfiles layouts, symlink trees vs manifest-based runfiles, or future changes adding similarly-named targets could cause multiple matches. The issue is plausible, especially as the codebase evolves. - Suggested Action: keep
- Tool Calls Used: 7
Reasoning
The comment identifies a real fragility in newly introduced code (this PR adds the entire file). The find * -name ipc_bridge_cpp pattern in a Bazel sh_test is non-idiomatic and fragile because:
- Non-determinism:
findcould return multiple paths if the runfiles tree contains duplicates (symlinks, copies, or different configurations). The command substitution would then produce a variable with multiple space-separated paths. - Downstream failure mode: On line 110,
$EXAMPLE_CMD_SEND --num-cycles 0 --mode sendwould try to execute the first path with the second path as an argument, causing confusing failures. - Standard practice: Bazel provides
$TEST_SRCDIRand$TEST_WORKSPACEenvironment variables specifically for locating runfiles deterministically insh_testrules. The BUILD file at line 100-107 shows the test is declared assh_testwith specificdatadependencies at known paths.
The comment is actionable with two concrete suggestions:
- Use
$TEST_SRCDIR/$TEST_WORKSPACE/score/mw/com/example/ipc_bridge/ipc_bridge_cpp - Or restrict
findwith-type f -perm -111 | head -n1
This is newly introduced code (the file didn't exist before this PR), so this is not legacy debt.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/BUILD
1: # *******************************************************************************
2: # Copyright (c) 2025 Contributors to the Eclipse Foundation
3: #
4: # See the NOTICE file(s) distributed with this work for additional
5: # information regarding copyright ownership.
6: #
7: # This program and the accompanying materials are made available under the
8: # terms of the Apache License Version 2.0 which is available at
9: # https://www.apache.org/licenses/LICENSE-2.0
10: #
...
2. File Lookup
- Filename:
sh_test - Max Results:
5
No files found matching 'sh_test'
3. File Lookup
- Filename:
test_ipc_bridge.sh
No files found matching 'test_ipc_bridge.sh'
| cleanup_lola | ||
|
|
||
| # Run examples | ||
| $EXAMPLE_CMD_SEND --num-cycles 0 --mode send > "$tempdir/send.log" 2>&1 & |
There was a problem hiding this comment.
[Reliability] The sender processes started here run with --num-cycles 0, so they never terminate on their own. Because the script runs with set -e, any failure before the kill $sender_pid block (receiver crashes, grep misses the expected text, etc.) aborts the script immediately without ever killing the background sender. The orphaned sender keeps publishing indefinitely, polluting /dev/shm and can cause the test (or subsequent invocations) to hang until Bazel times out.
Install a trap right after the background process is launched so that the sender is always terminated, even when the function exits early, e.g.
$EXAMPLE_CMD_SEND --num-cycles 0 --mode send >"$tempdir/send.log" 2>&1 &
sender_pid=$!
trap 'kill ${sender_pid} 2>/dev/null; rm -f '"$MANIFEST_LOCATION" EXIT
...
kill ${sender_pid}
wait ${sender_pid} || true
trap "rm -f $MANIFEST_LOCATION" EXITNote: a new trap ... EXIT replaces the previous one (set at line 97 for manifest cleanup), so both the sender kill and the manifest removal should be combined in the trap handler to avoid losing the manifest cleanup.
Context for Agents
The sender processes started here run with `--num-cycles 0`, so they never terminate on their own. Because the script runs with `set -e`, any failure before the `kill $sender_pid` block (receiver crashes, `grep` misses the expected text, etc.) aborts the script immediately without ever killing the background sender. The orphaned sender keeps publishing indefinitely, polluting `/dev/shm` and can cause the test (or subsequent invocations) to hang until Bazel times out.
Install a trap right after the background process is launched so that the sender is always terminated, even when the function exits early, e.g.
```bash
$EXAMPLE_CMD_SEND --num-cycles 0 --mode send >"$tempdir/send.log" 2>&1 &
sender_pid=$!
trap 'kill ${sender_pid} 2>/dev/null; rm -f '"$MANIFEST_LOCATION" EXIT
...
kill ${sender_pid}
wait ${sender_pid} || true
trap "rm -f $MANIFEST_LOCATION" EXIT
```
Note: a new `trap ... EXIT` replaces the previous one (set at line 97 for manifest cleanup), so both the sender kill and the manifest removal should be combined in the trap handler to avoid losing the manifest cleanup.
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 110Chain of Thought
- Validation Decision: accepted
- Confidence: 0.95
- Likelihood Score: 4/5
- Likelihood Reason: Any test failure (receiver crash, grep mismatch) between starting the sender and the
killcommand will leave an orphaned infinite sender process. Given this is a test with multiple failure modes (IPC communication between Rust/C++, pattern matching on logs), failures are reasonably likely during development and CI, making orphaned processes a practical concern. - Suggested Action: modify
- Tool Calls Used: 2
Reasoning
The comment accurately identifies a real resource leak bug in newly introduced code:
set -eis active (line 2), so any command failure exits the script immediately.- The sender runs forever with
--num-cycles 0in background (line 110). - Multiple commands can fail between starting the sender (line 110) and killing it (line 120): the receiver command (line 112), two
grepcommands (lines 115-116). Any failure exits without reachingkill $sender_pid. - Bash does not kill background processes on exit - they get reparented to init and continue running indefinitely.
- This is entirely new code introduced by this PR.
- The suggested fix is concrete and correct: using a
trap ... EXITimmediately after starting the background process ensures cleanup regardless of how the function exits.
One minor note: the existing trap at line 97 (trap "rm -f $MANIFEST_LOCATION" EXIT) would be overwritten by the suggested new trap. A production-quality fix might combine both cleanups. But the core issue and solution direction are sound.
The comment is actionable, accurate, and addresses a real issue introduced by this PR.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
1: #!/bin/bash
2: set -euxo pipefail
3:
4: MANIFEST_LOCATION="/tmp/mw_com_lola_service_manifest.json"
5:
6: # this cannot work if both sender and receiver expect 20 samples
7: # the sender has to send infinite samples and the receiver has to terminate after having received some
8: CPP_EXAMPLE_CMD="$(find * -name ipc_bridge_cpp) -s $MANIFEST_LOCATION --cycle-time 10"
9: RUST_EXAMPLE_CMD="$(find * -name ipc_bridge_rs) -s $MANIFEST_LOCATION --cycle-time 10"
10:
...
| rm -rf "$tempdir" | ||
|
|
||
| # Kill sender and check its return code | ||
| kill $sender_pid |
There was a problem hiding this comment.
[Reliability] kill $sender_pid executes while set -e is still active (it was set on line 2 and is not disabled until set +e on line 122). If the sender process has already exited (e.g., it crashed), kill returns a non-zero exit code, causing the script to abort immediately under set -e — skipping the wait, the return-code check, and the final cleanup_lola. This leaves shared-memory artifacts behind and produces a misleading test failure.
The set +e on line 122 should be moved before the kill on line 120:
set +e
kill $sender_pid
wait $sender_pid
...
Context for Agents
[**Bug**] `kill $sender_pid` executes while `set -e` is still active (it was set on line 2 and is not disabled until `set +e` on line 122). If the sender process has already exited (e.g., it crashed), `kill` returns a non-zero exit code, causing the script to abort immediately under `set -e` — skipping the `wait`, the return-code check, and the final `cleanup_lola`. This leaves shared-memory artifacts behind and produces a misleading test failure.
The `set +e` on line 122 should be moved **before** the `kill` on line 120:
```
set +e
kill $sender_pid
wait $sender_pid
...
```
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 120Chain of Thought
- Validation Decision: accepted
- Confidence: 0.95
- Likelihood Score: 3/5
- Likelihood Reason: The sender runs with infinite cycles, so under normal conditions it's still alive when
killis reached. However, if the sender crashes (a plausible scenario in IPC testing with shared memory),killfails andset -eaborts the script, skipping cleanup. This is a real robustness issue for a test script. - Suggested Action: keep
- Tool Calls Used: 2
Reasoning
The comment correctly identifies a real bug in the newly added test script:
set -euxo pipefailon line 2 enables exit-on-error for the entire script.kill $sender_pidon line 120 is executed whileset -eis still active.set +eon line 122 comes AFTER thekill, not before.
If the sender process has already exited (crashed, was killed by the OS, etc.), kill returns non-zero, and set -e causes the script to terminate immediately. This skips:
wait $sender_pid(line 123)- Return code checking (lines 124-126)
cleanup_lola(line 129) — leaving shared memory artifacts
The fix is concrete, minimal, and correct: move set +e to before the kill command.
This is introduced by this PR (the entire file is new). The comment is accurate, actionable, and identifies a genuine robustness issue.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/test_ipc_bridge.sh - Lines:
1-130
1: #!/bin/bash
2: set -euxo pipefail
3:
4: MANIFEST_LOCATION="/tmp/mw_com_lola_service_manifest.json"
5:
6: # this cannot work if both sender and receiver expect 20 samples
7: # the sender has to send infinite samples and the receiver has to terminate after having received some
8: CPP_EXAMPLE_CMD="$(find * -name ipc_bridge_cpp) -s $MANIFEST_LOCATION --cycle-time 10"
9: RUST_EXAMPLE_CMD="$(find * -name ipc_bridge_rs) -s $MANIFEST_LOCATION --cycle-time 10"
10:
...
| # Check if the receiver received the message | ||
| grep -q "Subscrib" "$tempdir/recv.log" | ||
| grep -q "Received sample" "$tempdir/recv.log" | ||
| rm -rf "$tempdir" |
There was a problem hiding this comment.
[Maintainability] The temporary directory containing logs is removed before all checks on the sender process are complete. If the script fails on the wait or the exit code check (lines 123-126), the logs that could help debug the failure will have already been deleted. This rm command should be moved to the end of the function to ensure logs are preserved until all checks pass.
Context for Agents
The temporary directory containing logs is removed before all checks on the sender process are complete. If the script fails on the `wait` or the exit code check (lines 123-126), the logs that could help debug the failure will have already been deleted. This `rm` command should be moved to the end of the function to ensure logs are preserved until all checks pass.
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 117Chain of Thought
- Validation Decision: accepted
- Confidence: 0.95
- Likelihood Score: 4/5
- Likelihood Reason: When the sender fails with an unexpected exit code (not 143), developers will have no logs to debug. This is a test script specifically designed to catch IPC interop issues between Rust and C++, so failures during development are expected. The missing logs will actively hinder debugging.
- Suggested Action: keep
- Tool Calls Used: 2
Reasoning
The comment is accurate and actionable:
-
Factual accuracy: Line 117 (
rm -rf "$tempdir") deletes the temp directory containingsend.logandrecv.logbefore the sender's exit code verification on lines 120-126. This is plainly visible in the code. -
Real impact: The
send.logfile is the primary artifact for debugging sender-side failures. Ifkill $sender_pidfails (line 120), orwaitreturns an unexpected code, or the exit code check[[ "143" == "$sender_return_code" ]]fails (line 126), the script aborts (due toset -e) and both log files are already deleted. -
Introduced by this PR: The file is entirely new (confirmed not present in base), so this is not legacy debt.
-
Clear fix: Move
rm -rf "$tempdir"to after line 126 (or to the end of the function at line 129). This preserves logs for debugging while still cleaning up on success. -
Actionable: The comment specifies exactly what to move and where, with a concrete rationale.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/test_ipc_bridge.sh - Lines:
100-130
100: function run_receiver_sender() {
101: EXAMPLE_CMD_RECV="$1"
102: EXAMPLE_CMD_SEND="$2"
103:
104: tempdir=$(mktemp -d /tmp/ipc_bridge.XXXXXX)
105:
106: # Ensure we start with a clean state
107: cleanup_lola
108:
109: # Run examples
...
| break handles; | ||
| } else { | ||
| println!("No service found, retrying in 1 second"); | ||
| println!("No service found, retrying in 0.5 seconds"); |
There was a problem hiding this comment.
[Documentation] The log message states that the retry will happen in "0.5 seconds", but the actual sleep duration defined by SERVICE_DISCOVERY_SLEEP_DURATION is 10 milliseconds. This discrepancy can be misleading during debugging. The log message should be updated to reflect the correct retry interval.
| println!("No service found, retrying in 0.5 seconds"); | |
| println!("No service found, retrying in {}ms", SERVICE_DISCOVERY_SLEEP_DURATION.as_millis()); |
Context for Agents
The log message states that the retry will happen in "0.5 seconds", but the actual sleep duration defined by `SERVICE_DISCOVERY_SLEEP_DURATION` is 10 milliseconds. This discrepancy can be misleading during debugging. The log message should be updated to reflect the correct retry interval.
```suggestion
println!("No service found, retrying in {}ms", SERVICE_DISCOVERY_SLEEP_DURATION.as_millis());
```
File: score/mw/com/example/ipc_bridge/ipc_bridge.rs
Line: 97Chain of Thought
- Validation Decision: accepted
- Confidence: 0.98
- Likelihood Score: 5/5
- Likelihood Reason: The log message will inevitably print "0.5 seconds" every time the retry loop fires, while the actual sleep is 10ms. This will mislead anyone debugging service discovery timing.
- Suggested Action: keep
- Tool Calls Used: 0
Reasoning
The comment correctly identifies a discrepancy introduced by this PR:
- The PR changes
SERVICE_DISCOVERY_SLEEP_DURATIONfromDuration::from_secs(1)(line 46 BASE) toDuration::from_millis(10)(line 60 HEAD) — that's 10 milliseconds. - The PR changes the log message from "retrying in 1 second" (line 79 BASE) to "retrying in 0.5 seconds" (line 97 HEAD) — that's 500 milliseconds.
- 10ms ≠ 500ms. The log message is wrong by a factor of 50x.
Both the constant and the log message are newly introduced/modified in this PR, so this is not pre-existing debt. The suggestion to use SERVICE_DISCOVERY_SLEEP_DURATION.as_millis() in the format string is a good, concrete fix that keeps the message in sync with the actual value.
| set -e | ||
| [[ "143" == "$sender_return_code" ]] | ||
|
|
||
| # Cleanup due to SIGINT |
There was a problem hiding this comment.
[Documentation] Misleading comment: kill (line 120) sends SIGTERM (signal 15, exit code 143), not SIGINT (signal 2, exit code 130). The comment should say "Cleanup due to SIGTERM".
| # Cleanup due to SIGINT | |
| # Cleanup due to SIGTERM |
Context for Agents
Misleading comment: `kill` (line 120) sends `SIGTERM` (signal 15, exit code 143), not `SIGINT` (signal 2, exit code 130). The comment should say "Cleanup due to SIGTERM".
```suggestion
# Cleanup due to SIGTERM
```
File: score/mw/com/example/ipc_bridge/test_ipc_bridge.sh
Line: 128Chain of Thought
- Validation Decision: accepted
- Confidence: 0.98
- Likelihood Score: 5/5
- Likelihood Reason: The comment on line 128 is definitively wrong.
killwithout a signal argument sends SIGTERM (signal 15, exit code 128+15=143), not SIGINT (signal 2, exit code 128+2=130). The code itself already confirms this: line 126 checks for exit code 143, which corresponds to SIGTERM. The misleading comment will confuse anyone reading this script. - Suggested Action: keep
- Tool Calls Used: 1
Reasoning
The review comment is factually correct:
- Line 120:
kill $sender_pid— thekillcommand without a-signalargument sends SIGTERM (signal 15) by default. - Line 126:
[[ "143" == "$sender_return_code" ]]— 143 = 128 + 15, which is the exit code for a process terminated by SIGTERM. This confirms the script author understood the signal is SIGTERM. - Line 128:
# Cleanup due to SIGINT— SIGINT is signal 2 (exit code 130 = 128 + 2). This comment is incorrect; it should say SIGTERM.
The code and the return code check are both consistent with SIGTERM, but the comment incorrectly says SIGINT. This is newly introduced code (the entire file is new in this PR), so the issue is introduced by this PR. The fix is trivial and the comment is actionable with a concrete suggestion.
Evidence Summary
1. File Snapshot (HEAD)
- File:
score/mw/com/example/ipc_bridge/test_ipc_bridge.sh - Lines:
118-130
118:
119: # Kill sender and check its return code
120: kill $sender_pid
121:
122: set +e
123: wait $sender_pid
124: sender_return_code=$?
125: set -e
126: [[ "143" == "$sender_return_code" ]]
127:
...
CI should check that C++ and Rust code can communicate via mw::com. To simplify the tests the CLI interface of the Rust code has been more aligned with the C++ interface.
Fixes #38
The examples were also adjusted to make cross-language testing more robust (including support for long-running send cycles and consistent exit codes), and a new shell script now orchestrates these Rust/C++ IPC tests as part of the CI pipeline.
Architecture Diagram:
flowchart TD subgraph Test_Layer A["test_ipc_bridge.sh (+144 lines)"] end subgraph Application_Layer B["ipc_bridge_rs (+25/-16 lines)"] C["ipc_bridge_cpp (+1 line)"] end subgraph Middleware_Layer D["sample_sender_receiver.cpp (+1/-1 lines)"] E["mw::com (Shared Memory)"] end A -->|runs| B A -->|runs| C B -->|uses| E C -->|uses| E C -->|implements| DThis summary was automatically generated by @propel-code-bot