apollo_network_benchmark: draft#12726
apollo_network_benchmark: draft#12726sirandreww-starkware wants to merge 1 commit into02-18-apollo_network_benchmark_moved_peerid_creation_filesfrom
Conversation
|
Artifacts upload workflows: |
| [security] | ||
| disable_initial_admin_creation = true | ||
| cookie_secure = false | ||
| cookie_samesite = lax |
There was a problem hiding this comment.
Duplicate INI section causes config split
Medium Severity
The grafana.ini file has two separate [security] sections — one at line 5 setting admin_user, admin_password, allow_sign_up, and disable_gravatar, and another at line 53 setting disable_initial_admin_creation, cookie_secure, and cookie_samesite. Depending on the INI parser behavior, the first section's keys may be silently dropped or the sections may merge unpredictably. These settings belong in a single [security] block.
Additional Locations (1)
|
|
||
| delete_namespace(namespace_name, true)?; | ||
| create_namespace(namespace_name)?; | ||
| delete_namespace(namespace_name, false)?; |
There was a problem hiding this comment.
Stop function unnecessarily creates namespace before deleting
Medium Severity
The cluster_stop::run function performs a delete-create-delete sequence on the Kubernetes namespace. The create_namespace call between the two deletes is problematic: if the first delete_namespace puts the namespace in a Terminating state (common with stuck finalizers or slow resource cleanup), create_namespace will fail because Kubernetes doesn't allow creating a namespace while one with the same name is terminating. This causes the entire stop operation to error out, leaving resources uncleaned.
crates/apollo_network_benchmark/src/bin/apollo_network_benchmark_run/args.rs
Show resolved
Hide resolved
d9248de to
41712e4
Compare
5306714 to
2371fa1
Compare
crates/apollo_network_benchmark/src/bin/apollo_network_benchmark_run/cluster_start.rs
Show resolved
Hide resolved
2371fa1 to
71a3230
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| (m(&BROADCAST_MESSAGE_THROUGHPUT), "binBps"), | ||
| (rate(&RECEIVE_MESSAGE_BYTES_SUM, "20s"), "binBps"), | ||
| (m(&RECEIVE_MESSAGE_PENDING_COUNT), "short"), | ||
| ("receive_message_negative_delay_seconds_count".into(), "short"), |
There was a problem hiding this comment.
Dashboard references metrics that don't exist in codebase
Medium Severity
The Grafana dashboard configuration references receive_message_negative_delay_seconds_count and receive_message_negative_delay_seconds_bucket in multiple panels, but these metrics are never defined in metrics.rs or registered anywhere. These dashboard panels will permanently show "No data," giving operators a misleading view of clock-skew behavior.



Note
Medium Risk
Adds substantial new orchestration code that builds Docker images, runs shell commands, and generates Kubernetes manifests, so failures could impact developer workflows and cluster resources (namespaces, jobs, port-forwards). No production runtime or security-critical logic appears to be modified beyond the benchmark tooling.
Overview
Introduces a new
apollo_network_benchmark_runorchestrator binary to spin up multi-node network stress tests either locally (Docker Compose) or on GKE (Indexed Job), including commands to start/stop deployments, fetch pod logs, and port-forward Grafana/Prometheus.Adds end-to-end observability packaging for the benchmark: generated Grafana dashboard JSON (covering network, Propeller, and Tokio runtime metrics), Prometheus scrape config generation, and checked-in Grafana/Prometheus/K8s config templates.
Adds Docker runtime assets (
Dockerfile.fast/Dockerfile.slow,entrypoint.sh) to run the node with optional traffic shaping viatc(LATENCY/THROUGHPUT) and deterministic node IDs/hostnames for cluster runs, plus new crate dependencies (anyhow,chrono,serde_json,tokio-stream,apollo_propeller, etc.).Written by Cursor Bugbot for commit 71a3230. This will update automatically on new commits. Configure here.