Skip to content

Commit b066d59

Browse files
authored
Merge pull request #6 from linksplatform/issue-5-bdec6bf473d2
Fix benchmark CI false success and Transaction isolation
2 parents 5134f36 + 7304747 commit b066d59

File tree

15 files changed

+1079
-39
lines changed

15 files changed

+1079
-39
lines changed

.github/workflows/rust.yml

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,14 @@
11
name: Benchmark Rust version
22

3-
on: [push, pull_request]
3+
on:
4+
push:
5+
branches:
6+
- main
7+
- master
8+
pull_request:
9+
branches:
10+
- main
11+
- master
412

513
env:
614
toolchain: nightly-2024-01-01
@@ -57,9 +65,12 @@ jobs:
5765
NEO4J_URI: bolt://localhost:7687
5866
NEO4J_USER: neo4j
5967
NEO4J_PASSWORD: password
60-
run: cargo bench --bench bench -- --output-format bencher | tee out.txt
68+
run: |
69+
set -o pipefail
70+
cargo bench --bench bench -- --output-format bencher | tee out.txt
6171
6272
- name: Prepare benchmark results
73+
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/master')
6374
run: |
6475
git config --global user.email "[email protected]"
6576
git config --global user.name "LinksPlatformBencher"
Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
# Case Study: Issue #5 - Benchmark is Broken
2+
3+
## Executive Summary
4+
5+
The benchmark workflow reports false success even when the benchmark fails with a panic. This case study analyzes the root causes, timeline of events, and proposes solutions.
6+
7+
**Primary Issue:** GitHub Actions workflow incorrectly reports success despite `cargo bench` failing with a panic.
8+
9+
**Root Causes Identified:**
10+
1. **Pipeline Exit Code Masking:** Using `| tee out.txt` in the benchmark command masks the exit code
11+
2. **Benchmark Logic Bug:** The delete benchmark attempts to delete links that don't exist due to shared state between benchmarks
12+
13+
## Timeline of Events
14+
15+
| Timestamp (UTC) | Event |
16+
|-----------------|-------|
17+
| 2025-12-22 19:49:32 | PR #4 merged: "Fix benchmark failure: Transaction implementation was a broken stub" |
18+
| 2025-12-22 19:49:39 | CI run #20442351221 starts |
19+
| 2025-12-22 19:55:19 | `cargo bench` begins execution |
20+
| 2025-12-22 20:18:16 | Create/Neo4j_NonTransaction completes |
21+
| 2025-12-22 20:39:35 | Create/Neo4j_Transaction completes |
22+
| 2025-12-22 20:40:26 | All Doublets Create benchmarks complete |
23+
| 2025-12-22 21:05:20 | Delete/Neo4j_NonTransaction completes (1988ms/iter) |
24+
| 2025-12-22 21:05:47 | **PANIC:** `NotExists(4000)` at `delete.rs:23` |
25+
| 2025-12-22 21:05:47 | `error: bench failed` message displayed |
26+
| 2025-12-22 21:05:56 | "Prepare benchmark results" step runs (despite failure) |
27+
| 2025-12-22 21:05:56 | Commit 5134f36: "Update benchmark results" pushed to main |
28+
| 2025-12-22 21:05:57 | CI run completes with **SUCCESS** status |
29+
30+
## Root Cause Analysis
31+
32+
### Root Cause #1: Pipeline Exit Code Masking
33+
34+
**Location:** `.github/workflows/rust.yml:60`
35+
36+
```yaml
37+
run: cargo bench --bench bench -- --output-format bencher | tee out.txt
38+
```
39+
40+
**Problem:** When a command is piped to `tee`, the exit code of the entire pipeline is determined by the last command (`tee`), not the first command (`cargo bench`). Since `tee` successfully writes output to the file, it returns exit code 0 even though `cargo bench` returned exit code 101.
41+
42+
**Evidence:**
43+
- CI logs show `error: bench failed` message
44+
- The step "Run benchmark" shows `conclusion: success` in GitHub Actions API
45+
- Subsequent steps (Prepare benchmark results) execute as if nothing failed
46+
47+
**Solution:** Use `set -o pipefail` to ensure the pipeline returns the exit code of the first failing command:
48+
49+
```yaml
50+
run: |
51+
set -o pipefail
52+
cargo bench --bench bench -- --output-format bencher | tee out.txt
53+
```
54+
55+
Or use bash-specific syntax:
56+
```yaml
57+
shell: bash
58+
run: |
59+
set -o pipefail
60+
cargo bench --bench bench -- --output-format bencher | tee out.txt
61+
```
62+
63+
### Root Cause #2: Benchmark Logic Bug (NotExists Error)
64+
65+
**Location:** `rust/benches/benchmarks/delete.rs:23-28`
66+
67+
```rust
68+
for id in (BACKGROUND_LINKS..=BACKGROUND_LINKS + 1_000).rev() {
69+
let _ = elapsed! {fork.delete(id)?};
70+
}
71+
```
72+
73+
**Problem:** The delete benchmark tries to delete links with IDs from 3000 to 4001, but the Transaction benchmark shares state with the Client benchmark. Here's the sequence:
74+
75+
1. `Delete/Neo4j_NonTransaction` runs successfully, creating links with IDs 1-4001 and deleting them
76+
2. `Delete/Neo4j_Transaction` starts - but it uses a shared `Client` that was initialized during `create` benchmarks
77+
3. The client's `next_id` counter has been modified by previous benchmarks
78+
4. When trying to delete ID 4000, it doesn't exist (was never created or was deleted)
79+
80+
**Evidence from logs:**
81+
```
82+
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NotExists(4000)', benches/benchmarks/delete.rs:23:9
83+
```
84+
85+
**Solution Options:**
86+
1. Ensure Transaction cleanup properly resets state between benchmarks
87+
2. Use proper test isolation with separate database instances
88+
3. Handle the `NotExists` error gracefully in benchmarks
89+
90+
### Root Cause #3: Flawed Transaction Implementation
91+
92+
**Location:** `rust/src/benched.rs:94-108`
93+
94+
```rust
95+
impl<'a, T: LinkType> Benched for Exclusive<Transaction<'a, T>> {
96+
// ...
97+
fn fork(&mut self) -> Fork<Self> {
98+
Fork(self) // Does not call create_table()!
99+
}
100+
101+
unsafe fn unfork(&mut self) {
102+
// Transaction cleanup handled by client
103+
}
104+
}
105+
```
106+
107+
**Problem:** Unlike the `Client` implementation which calls `create_table()` in `fork()`, the `Transaction` implementation does not set up a clean state. This means:
108+
- Data from previous benchmarks persists
109+
- The `next_id` counter is not reset
110+
- Links created by previous tests affect the Transaction benchmark
111+
112+
**Contrast with Client implementation:**
113+
```rust
114+
impl<T: LinkType> Benched for Exclusive<Client<T>> {
115+
fn fork(&mut self) -> Fork<Self> {
116+
let _ = self.create_table(); // Proper setup
117+
Fork(self)
118+
}
119+
120+
unsafe fn unfork(&mut self) {
121+
let _ = self.drop_table(); // Proper cleanup
122+
}
123+
}
124+
```
125+
126+
## Proposed Solutions
127+
128+
### Solution 1: Fix Pipeline Exit Code (Critical)
129+
130+
**File:** `.github/workflows/rust.yml`
131+
132+
Change:
133+
```yaml
134+
- name: Run benchmark
135+
working-directory: rust
136+
env:
137+
NEO4J_URI: bolt://localhost:7687
138+
NEO4J_USER: neo4j
139+
NEO4J_PASSWORD: password
140+
run: cargo bench --bench bench -- --output-format bencher | tee out.txt
141+
```
142+
143+
To:
144+
```yaml
145+
- name: Run benchmark
146+
working-directory: rust
147+
env:
148+
NEO4J_URI: bolt://localhost:7687
149+
NEO4J_USER: neo4j
150+
NEO4J_PASSWORD: password
151+
run: |
152+
set -o pipefail
153+
cargo bench --bench bench -- --output-format bencher | tee out.txt
154+
```
155+
156+
### Solution 2: Fix Transaction Benchmark Isolation
157+
158+
**File:** `rust/src/benched.rs`
159+
160+
Update the Transaction `fork()` method to properly clean state:
161+
162+
```rust
163+
impl<'a, T: LinkType> Benched for Exclusive<Transaction<'a, T>> {
164+
// ...
165+
fn fork(&mut self) -> Fork<Self> {
166+
let _ = self.drop_table(); // Clean previous state
167+
Fork(self)
168+
}
169+
170+
unsafe fn unfork(&mut self) {
171+
let _ = self.drop_table(); // Clean after benchmark
172+
}
173+
}
174+
```
175+
176+
### Solution 3: Make Benchmarks Independent (Long-term)
177+
178+
Consider restructuring the benchmarks to:
179+
1. Each benchmark creates its own isolated database/table
180+
2. Use unique prefixes or namespaces for each benchmark run
181+
3. Add proper cleanup verification before each benchmark
182+
183+
## Impact Assessment
184+
185+
| Impact Area | Severity | Description |
186+
|-------------|----------|-------------|
187+
| CI Reliability | High | False success leads to pushing broken code |
188+
| Data Integrity | Medium | Incorrect benchmark results published to README |
189+
| Developer Trust | High | Can't rely on CI to catch benchmark failures |
190+
191+
## Files Changed in This Incident
192+
193+
1. `Docs/bench_rust.png` - Updated with partial results
194+
2. `Docs/bench_rust_log_scale.png` - Updated with partial results
195+
3. `README.md` - Results table updated with incomplete data showing N/A for failed benchmarks
196+
197+
## References
198+
199+
- [GitHub Actions: Handling Step Errors](https://www.kenmuse.com/blog/how-to-handle-step-and-job-errors-in-github-actions/)
200+
- [Bash pipefail option](https://www.gnu.org/software/bash/manual/bash.html#The-Set-Builtin)
201+
- [cargo bench documentation](https://doc.rust-lang.org/cargo/commands/cargo-bench.html)
202+
- [CI Run #20442351221](https://github.com/linksplatform/Comparisons.Neo4jVSDoublets/actions/runs/20442351221/job/58738054640)
203+
- [Commit 5134f36 (pushed with broken benchmarks)](https://github.com/linksplatform/Comparisons.Neo4jVSDoublets/commit/5134f36730caa5a6b571692e4b67b1e3cb523e3e)
204+
205+
## Appendix: Full Error Log Extract
206+
207+
```
208+
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 1450.3s, or reduce sample count to 10.
209+
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NotExists(4000)', benches/benchmarks/delete.rs:23:9
210+
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
211+
test Delete/Neo4j_Transaction ...
212+
error: bench failed
213+
```

0 commit comments

Comments
 (0)