-
Notifications
You must be signed in to change notification settings - Fork 9
feat(test): Add MariaDB docker tasks; Add storage-related tests in GitHub workflows. #267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds MariaDB-backed test infrastructure: dynamic MariaDB provisioning (start/stop), DB init scripts, env-driven storage URL resolution in tests and helpers, CI/workflow and taskfile wiring to start MariaDB before tests, linting scope update, and removal of the old SQL init script. Changes
Sequence Diagram(s)sequenceDiagram
participant TaskRunner as Task (taskfiles/test.yaml)
participant StartScript as start-mariadb
participant Docker as Docker Engine
participant InitDB as init_db.py
participant Tests as Test Runner
TaskRunner->>StartScript: invoke start-mariadb with container name, port, creds
StartScript->>Docker: docker pull/run container (expose hostPort->3306)
StartScript->>Docker: docker exec healthcheck.sh (poll until ready)
StartScript-->>TaskRunner: return success and set SPIDER_STORAGE_URL
TaskRunner->>InitDB: invoke init_db.py with DB connection params
InitDB-->>TaskRunner: return success after creating schema
TaskRunner->>Tests: run tests with SPIDER_STORAGE_URL env
TaskRunner->>StopScript: defer stop-mariadb cleanup
StopScript->>Docker: docker stop/remove container
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @tools/scripts/storage/init_db.py:
- Around line 15-231: The CREATE TABLE statements incorrectly use ON UPDATE
CURRENT_TIMESTAMP for columns that should be immutable after creation; remove
the ON UPDATE CURRENT_TIMESTAMP clause from `creation_time` in the jobs table
(jobs -> `creation_time`), from `start_time` in the task_instances table
(task_instances -> `start_time`), and from `lease_time` in the scheduler_leases
table (scheduler_leases -> `lease_time`) while keeping DEFAULT CURRENT_TIMESTAMP
so the timestamp is set at insertion; leave the `heartbeat` column (drivers ->
`heartbeat`) as-is with ON UPDATE CURRENT_TIMESTAMP.
In @tools/scripts/storage/start.py:
- Line 12: The _MARIADB_IMAGE constant is pinned to "mariadb:latest", which is
non-reproducible; change _MARIADB_IMAGE to a specific, supported MariaDB tag
(e.g., "mariadb:10.11" or another project-approved version) or make it
configurable (read from an environment variable or script argument with a stable
default) so builds are deterministic; update any documentation or defaults to
reflect the chosen pinned version.
In @tools/scripts/storage/stop.py:
- Line 25: The argparse parser description contains a typo: change the string
passed to the ArgumentParser constructor (the description argument used when
creating parser) from "Stop MairaDB Docker container." to "Stop MariaDB Docker
container." so the CLI help shows the correct database name; update the
description in the parser instantiation where parser =
argparse.ArgumentParser(...) is defined.
- Around line 44-48: The variable name localstack_stop_cmd is misleading because
it targets a MariaDB container; rename it to mariadb_stop_cmd wherever defined
and referenced (the list currently built with "docker", "stop", args.name) so
the variable name matches its purpose—update the declaration and any subsequent
use of localstack_stop_cmd to mariadb_stop_cmd in the stop.py script.
🧹 Nitpick comments (5)
.github/workflows/tests.yaml (1)
1-1: Consider updating the workflow name to reflect broader test scope.The workflow name "unit-tests" no longer accurately reflects its scope, as it now runs integration tests (line 60). Consider renaming to "tests" or "all-tests" for clarity.
📝 Suggested rename
-name: "unit-tests" +name: "tests"tools/scripts/storage/get_free_port.py (1)
24-25: Remove the unused noqa directive.The static analysis tool correctly identifies that the
noqa: T201directive is unnecessary because the T201 rule (print statement detection) is not enabled in your Ruff configuration.♻️ Proposed fix
- # ruff: noqa: T201 print(port)tools/scripts/storage/init_db.py (3)
97-97: Consider rationalizing VARBINARY and VARCHAR size limits.The value size limit of 999 bytes appears in multiple columns (
data.value,task_outputs.type,task_outputs.value,task_inputs.type,task_inputs.value,client_kv_data.value,task_kv_data.value). This specific limit seems arbitrary.Common choices are powers of 2 (1024, 2048) or clean decimal values (1000). Consider whether 999 was chosen for a specific reason or if it should be normalized.
Also applies to: 108-109, 123-123, 214-215, 224-224
271-284: Add error handling and success logging.The script lacks error handling and logging for the database initialization process. If table creation fails, the error won't be clearly communicated to the user.
♻️ Proposed improvement
+ logger.info("Connecting to MariaDB at 127.0.0.1:%d...", args.port) with ( mariadb.connect( host="127.0.0.1", port=args.port, user=args.username, password=args.password, database=args.database, ) as conn, conn.cursor() as cursor, ): + logger.info("Creating database tables...") for table_creator in _TABLE_CREATORS: cursor.execute(table_creator) conn.commit() + logger.info("Database initialization completed successfully.") return 0
273-273: Hardcoded host limits flexibility.The host is hardcoded to
"127.0.0.1". If MariaDB needs to run on a different host (e.g., Docker host networking scenarios), this won't be configurable.Consider adding a
--hostargument with"127.0.0.1"as the default, similar to the other connection parameters.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
.github/workflows/tests.yamlpython/spider-py/tests/client/test_driver.pypython/spider-py/tests/storage/test_mariadb.pytaskfiles/lint.yamltaskfiles/test.yamltests/integration/client.pytests/integration/test_client.pytests/integration/test_scheduler_worker.pytests/integration/test_signal.pytests/storage/StorageTestHelper.hpptools/scripts/storage/get_free_port.pytools/scripts/storage/init_db.pytools/scripts/storage/init_db.sqltools/scripts/storage/start.pytools/scripts/storage/stop.pytools/scripts/storage/wait_for_db.py
💤 Files with no reviewable changes (1)
- tools/scripts/storage/init_db.sql
🧰 Additional context used
🧬 Code graph analysis (5)
tools/scripts/storage/start.py (2)
tools/scripts/storage/init_db.py (1)
main(242-285)tools/scripts/storage/wait_for_db.py (1)
main(21-65)
tools/scripts/storage/wait_for_db.py (2)
tools/scripts/storage/init_db.py (1)
main(242-285)tools/scripts/storage/start.py (1)
main(22-114)
python/spider-py/tests/client/test_driver.py (1)
python/spider-py/src/spider_py/client/driver.py (1)
Driver(19-60)
python/spider-py/tests/storage/test_mariadb.py (1)
python/spider-py/src/spider_py/storage/jdbc_url.py (1)
parse_jdbc_url(22-55)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
get_storage_url(116-121)storage(125-134)
🪛 Ruff (0.14.10)
tools/scripts/storage/start.py
61-61: subprocess call: check for execution of untrusted input
(S603)
73-73: subprocess call: check for execution of untrusted input
(S603)
104-104: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/wait_for_db.py
45-45: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/storage/get_free_port.py
24-24: Unused noqa directive (non-enabled: T201)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (27)
.github/workflows/tests.yaml (2)
20-20: LGTM!The job rename from
non-storage-unit-teststotestsappropriately reflects the expanded test scope that now includes storage-related tests.
58-62: No changes required. The test tasks (test:cpp-unit-tests,test:cpp-integration-tests,test:spider-py-unit-tests) are self-contained and properly handle the full MariaDB lifecycle:
- Storage setup: Each task calls the
start-storagesubtask, which starts the MariaDB container, waits for readiness, and initialises the database. TheSPIDER_STORAGE_URLenvironment variable is set correctly.- Cleanup on failure: The
defer:keyword guarantees cleanup runs even if tests fail, preventing resource leaks.- CI compatibility: Dynamic port allocation via
get_free_port.pyand unique container naming viauuidgenensure portability across environments including GitHub Actions.python/spider-py/tests/client/test_driver.py (1)
16-17: LGTM!The environment-driven storage URL configuration is correctly implemented with an appropriate fallback to the default test URL.
python/spider-py/tests/storage/test_mariadb.py (1)
19-20: LGTM!The fixture correctly retrieves the storage URL from the environment variable and parses it for MariaDB connection parameters.
tools/scripts/storage/wait_for_db.py (1)
21-66: LGTM!The wait logic correctly polls the MariaDB container health status with appropriate timeout handling and sleep intervals between attempts.
Note: The static analysis warning (S603) about untrusted subprocess input is a false positive in this context, as both
docker_executableandargs.nameare controlled inputs.tests/storage/StorageTestHelper.hpp (3)
17-17: Good refactoring: constexpr string_view for constant URL.Changing from
std::string consttoconstexpr std::string_viewis more efficient and appropriate for a compile-time constant.
24-32: LGTM!The
get_storage_url()function correctly reads theSPIDER_STORAGE_URLenvironment variable and falls back to the default value when not set.
36-37: LGTM!The factory creation now uses the environment-driven storage URL, aligning with the broader refactoring to support dynamic storage configuration.
taskfiles/lint.yaml (1)
139-146: LGTM!Extending linting to cover the newly introduced storage scripts is appropriate and maintains consistency with the existing linting configuration.
tests/integration/test_signal.py (2)
15-27: LGTM!The import changes correctly align with the refactored public API in
tests/integration/client.py, switching from direct constant access to the newget_storage_url()function.
87-87: LGTM!The fixture correctly uses
get_storage_url()to obtain the runtime-configured storage URL instead of the previous hardcoded constant.tests/integration/test_scheduler_worker.py (2)
13-30: LGTM!The import changes correctly reflect the refactored API, bringing in
get_storage_url()andstoragefrom the updatedintegration.clientmodule.
93-93: LGTM!The fixture correctly uses
get_storage_url()for runtime storage URL configuration, consistent with the environment-driven approach introduced in this PR.tools/scripts/storage/get_free_port.py (1)
11-18: Be aware of the inherent TOCTOU race condition.The approach of binding to port 0 to obtain a free port has a time-of-check-time-of-use (TOCTOU) race condition: between the time this script returns the port number and when the MariaDB container actually binds to it, another process could claim the port. This is an inherent limitation of this port allocation pattern and is generally acceptable for test scenarios where the window is small.
If port conflicts occur in CI/CD environments, consider implementing retry logic in the MariaDB startup script or using Docker's dynamic port allocation features instead.
tests/integration/client.py (3)
114-114: LGTM!Renaming to uppercase
G_STORAGE_URLfollows Python naming conventions for module-level constants. The hardcoded credentials in the default URL are acceptable for test scenarios.
116-121: LGTM!The new
get_storage_url()function provides a clean abstraction for environment-driven storage URL configuration, falling back to the default whenSPIDER_STORAGE_URLis not set.
132-132: LGTM!The fixture correctly uses
get_storage_url()to support runtime storage configuration via the environment variable.tools/scripts/storage/init_db.py (1)
1-14: LGTM! Clean script setup.The shebang, dependency declaration, and imports are well-structured. Using
uv run --scriptwith inline dependencies is an excellent choice for this utility script.tools/scripts/storage/start.py (3)
1-20: LGTM! Clean script structure.The imports, logging configuration, and image constant are well-organized.
61-69: Clarify the return code when container already exists.Returning
1when the container already exists (line 69) typically indicates an error, but this is more of a no-op situation. Consider whether this should return0for success or use a different approach.The current behavior might be intentional if calling scripts need to distinguish between "started fresh" and "already running". Please verify the expected behavior in the task orchestration context.
57-66: Static analysis warnings are false positives.The Ruff S603 warnings about untrusted input in
subprocesscalls are false positives. Thedocker_executableis a hardcoded string (line 59), and all other arguments are either literals or validated by argparse.Also applies to: 73-82, 104-114
tests/integration/test_client.py (1)
10-10: LGTM! Proper migration to environment-driven storage URL.The changes correctly replace the hardcoded
g_storage_urlglobal with theget_storage_url()accessor function. This enables runtime configuration via theSPIDER_STORAGE_URLenvironment variable, which aligns with the MariaDB Docker lifecycle management introduced in this PR.The
storagefixture import on line 10 provides proper lifecycle management through pytest's fixture dependency mechanism.Also applies to: 66-66, 87-87
taskfiles/test.yaml (5)
7-9: LGTM! Clean configuration variables.The MariaDB configuration is properly centralized in global variables, making it easy to adjust defaults across all test tasks.
24-26: Excellent cross-platform UUID normalization.The UUID generation and normalization logic handles the case difference between macOS (uppercase) and Linux (lowercase), ensuring consistent container naming across platforms. The
spider-mariadb-prefix provides clear identification.Also applies to: 48-50, 89-91, 145-147, 178-180
27-28: Be aware of potential port allocation race condition.Dynamic port allocation via
get_free_port.pyis a good approach, but there's a time-of-check-to-time-of-use (TOCTOU) race condition: between obtaining the free port and starting the container, another process could bind to that port.This is unlikely in typical CI environments but could occur in heavily loaded systems or when running multiple test suites concurrently.
Consider whether the test infrastructure needs additional retry logic or port conflict handling. In practice, this may be acceptable given the short time window and typical usage patterns.
Also applies to: 51-52, 92-93, 148-149, 181-182
196-220: LGTM! Well-structured storage lifecycle management.The
start-storageinternal task properly orchestrates the three-step process:
- Start the MariaDB container
- Wait for readiness
- Initialize the database schema
The sequential execution ensures each step completes before the next begins, and the
internal: trueflag appropriately restricts this task to internal use only.
37-39: No action needed. Thestop.pyscript already handles non-existent containers gracefully by checking container existence withdocker inspect(line 35) and exiting with code 0 while logging a warning message if the container doesn't exist or isn't running (lines 40-42). Thedeferblocks will safely execute even ifstart-storagefails, as the stop script is designed to handle this scenario without producing confusing error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/src/dev-docs/testing.md:
- Around line 26-27: The sentence referencing the GitHub workflow
"[tests.yaml][gh-workflow-tests]" contains a grammatical error—remove the
redundant "the" so it reads "runs all unit tests and integration tests"; update
the sentence in the docs text where the phrase "runs the all unit tests and
integration tests" appears to "runs all unit tests and integration tests" to
correct the grammar.
🧹 Nitpick comments (4)
tests/integration/test_signal.py (1)
21-21: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
- storage, # noqa: F401 + storage,tests/integration/test_scheduler_worker.py (1)
24-24: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
- storage, # noqa: F401 + storage,tests/integration/test_client.py (1)
10-10: Remove unnecessarynoqadirective.The
# noqa: F401directive is unnecessary here. Ruff understands thatstorageis a pytest fixture and won't flag it as an unused import.♻️ Proposed fix
-from integration.client import get_storage_url, SQLConnection, storage # noqa: F401 +from integration.client import get_storage_url, SQLConnection, storagetools/scripts/storage/init_db.py (1)
242-286: Consider adding explicit error handling for better diagnostics.The connection and execution logic is correct and uses proper context management. However, there's no explicit error handling—any connection failure or SQL execution error will raise an unhandled exception. While fail-fast behaviour may be intentional for a setup script, wrapping the connection block in a try/except with structured logging would improve diagnostics.
♻️ Optional: Add error handling
def main() -> int: """Main.""" parser = argparse.ArgumentParser(description="Initialize the database tables for Spider.") # ... argument parsing ... args = parser.parse_args() try: with ( mariadb.connect( host="127.0.0.1", port=args.port, user=args.username, password=args.password, database=args.database, ) as conn, conn.cursor() as cursor, ): for table_creator in _TABLE_CREATORS: cursor.execute(table_creator) conn.commit() logger.info("Database tables initialized successfully.") return 0 except mariadb.Error as e: logger.error("Failed to initialize database: %s", e) return 1
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.github/workflows/tests.yamldocs/src/dev-docs/testing.mdpython/spider-py/README.mdtests/integration/client.pytests/integration/test_client.pytests/integration/test_scheduler_worker.pytests/integration/test_signal.pytools/scripts/storage/init_db.pytools/scripts/storage/stop.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration/client.py
- .github/workflows/tests.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-09T17:15:24.552Z
Learnt from: davidlion
Repo: y-scope/spider PR: 100
File: src/spider/worker/worker.cpp:205-230
Timestamp: 2025-04-09T17:15:24.552Z
Learning: Documentation should be added to new functions in the spider codebase, as already discussed with the user.
Applied to files:
python/spider-py/README.md
🧬 Code graph analysis (4)
tools/scripts/storage/init_db.py (2)
tools/scripts/storage/start.py (1)
main(22-114)tools/scripts/storage/wait_for_db.py (1)
main(21-65)
tests/integration/test_client.py (1)
tests/integration/client.py (2)
get_storage_url(118-123)storage(127-136)
tests/integration/test_scheduler_worker.py (1)
tests/integration/client.py (2)
get_storage_url(118-123)storage(127-136)
tests/integration/test_signal.py (1)
tests/integration/client.py (5)
get_storage_url(118-123)get_task_outputs(198-223)get_task_state(226-240)remove_job(243-253)storage(127-136)
🪛 Ruff (0.14.10)
tests/integration/test_client.py
10-10: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tests/integration/test_scheduler_worker.py
24-24: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tests/integration/test_signal.py
21-21: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tools/scripts/storage/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-24.04)
- GitHub Check: tests (ubuntu-22.04)
🔇 Additional comments (10)
python/spider-py/README.md (1)
27-33: Verify that the simplified Testing section adequately documents the new storage-enabled test workflow.The Testing section has been significantly simplified to a single task command. However, the PR introduces a new
SPIDER_STORAGE_URLenvironment variable and MariaDB-backed storage integration. The README no longer documents:
- How developers should configure or set up storage for local testing (e.g., whether storage-creation task runs automatically or requires manual invocation)
- The role of the
SPIDER_STORAGE_URLenvironment variable in the test workflow- Whether developers need specific prerequisites (e.g., Docker, MariaDB) installed locally
- How the unified
test:spider-py-unit-teststask interacts with the storage setup introduced in this PRConsider whether the README should include additional context about storage configuration for developers setting up tests locally, or link to more detailed testing documentation if it exists elsewhere.
tests/integration/test_signal.py (2)
76-76: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 87-89
103-103: LGTM! Consistent storage fixture usage.The test methods correctly use the
storagefixture with appropriatenoqaannotations for parameter redefinition.Also applies to: 180-180
tests/integration/test_scheduler_worker.py (2)
82-82: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 93-93
284-284: LGTM! Consistent storage fixture usage.The test methods correctly use the
storagefixture with appropriatenoqaannotations for parameter redefinition.Also applies to: 314-314, 331-331, 351-351
tests/integration/test_client.py (2)
55-55: LGTM! Environment-driven storage configuration.The fixture correctly uses
get_storage_url()to retrieve the storage URL from the environment variable, enabling flexible test configuration.Also applies to: 66-66
87-87: LGTM! Consistent storage URL usage.The test correctly passes the environment-driven storage URL to the
client_testprogram.tools/scripts/storage/stop.py (2)
44-55: Stop logic is correct.The container stop logic properly handles errors and logs appropriate messages. Past review comments about the variable name have been addressed (
mariadb_stop_cmdis correctly used).
20-42: Use thedocker_executablevariable consistently.The
docker_executablevariable is defined on line 23 with a comment stating it's to silence Ruff S607, but line 35 uses the hardcoded string"docker"instead of the variable, making the variable and comment misleading.🔧 Proposed fix
result = subprocess.run( - [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name], + [docker_executable, "inspect", "-f", "{{.State.Running}}", args.name], capture_output=True, text=True, check=False, )Note: Line 35 already shows
docker_executablein the diff above, but verify line 45 also usesdocker_executableinstead of"docker".Likely an incorrect or invalid review comment.
tools/scripts/storage/init_db.py (1)
15-231: Schema looks solid; past timestamp issues resolved.The SQL schema is comprehensive and correctly configured. Past review comments regarding incorrect
ON UPDATE CURRENT_TIMESTAMPusage oncreation_time,start_time, andlease_timehave been addressed—these columns now correctly use onlyDEFAULT CURRENT_TIMESTAMP. Theheartbeatcolumn appropriately retainsON UPDATE CURRENT_TIMESTAMPfor auto-updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
taskfiles/test.yaml (3)
7-9: Consider documenting that these are test-only credentials.While the hardcoded credentials are acceptable for local testing, consider adding a comment clarifying that these are for development/testing purposes only and should not be used in production environments.
111-113: Simplify pytest invocation.Line 112 activates the virtual environment, but line 113 uses the full path to pytest, making the activation redundant.
♻️ Simplify to use pytest directly
- - |- - . ../test-venv/bin/activate - ../test-venv/bin/pytest tests/integration + - "../test-venv/bin/pytest tests/integration"
18-204: Consider reducing code duplication across storage-dependent tasks.The MariaDB setup pattern (container name generation, port allocation, storage URL construction, start-storage invocation, and defer cleanup) is duplicated across five tasks. This creates maintenance burden—changes to the storage lifecycle require updates in multiple places.
While YAML anchors have limitations with Taskfile's variable interpolation, consider whether a wrapper task or shared configuration could reduce this duplication in a future refactor.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
taskfiles/test.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
taskfiles/test.yaml (3)
44-68: Clarify the test task strategy.The presence of three tasks (cpp-unit-tests, cpp-storage-unit-tests, cpp-non-storage-unit-tests) creates ambiguity about which task developers should use. Consider documenting the intended workflow:
- Is cpp-unit-tests the primary entry point for running all tests?
- When should developers use the split versions?
206-230: Well-designed internal task for storage lifecycle.The start-storage task properly encapsulates the three-phase MariaDB setup (start → wait → init) with sensible defaults. Making it internal ensures it's only invoked by other tasks.
18-42: Well-structured storage test lifecycle.The task properly implements the MariaDB container lifecycle with dynamic port allocation, UUID-based container naming, and cleanup via defer. The cross-platform UUID normalization is a good defensive measure. All referenced scripts and variables are properly defined and in place.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the scripts. Can we restructure the tool scripts as the following:
tool/scripts/
get_free_port.py
mariadb/
start.py
stop.py
wolf/
init.py
huntsman/
init.py
get_free_port.pyseems to be a general helper- Using mariadb directly instead of storage, in case we need other storage backend in the future
- Move init scripts into the dedicated namespace
This should make the structure cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @taskfiles/test.yaml:
- Around line 25-31: The MARIADB_PORT var (and other similar vars using
tools/scripts/get_free_port.py) uses a relative path that breaks when tasks run
from non-root directories; update the sh value for MARIADB_PORT and the other
instances (the vars/commands at the other noted blocks) to invoke the script via
the repository root variable (i.e., prefix with {{.ROOT_DIR}}) so the path
resolves regardless of task dir:, ensuring you change every occurrence that
currently is "tools/scripts/get_free_port.py".
- Around line 21-25: The SPIDER_STORAGE_URL (and the same patterns at the other
occurrences) currently interpolates G_MARIADB_USERNAME and G_MARIADB_PASSWORD
directly into the JDBC URL which will break if credentials contain characters
like @ or &; update the template to percent-encode (URL-encode)
G_MARIADB_USERNAME and G_MARIADB_PASSWORD before embedding them into
SPIDER_STORAGE_URL (or, alternatively, move user and password out of the query
string and supply them as separate JDBC connection properties), and apply the
same fix to the other identical URL templates referenced (the blocks around
lines 47–51, 90–94, 149–152, 184–187) so all credential interpolations are
safely encoded or passed as properties.
In @tools/scripts/get_free_port.py:
- Around line 21-26: The inline Ruff suppression "# ruff: noqa: T201" in the
main function is unused and should be removed; edit the main() implementation
(which calls get_free_port() and prints the port) to delete that comment so Ruff
will report T201 as intended, or alternatively keep the comment but instead add
T201 to your project's Ruff config to explicitly allow it—pick one approach and
apply it consistently.
In @tools/scripts/mariadb/start.py:
- Around line 68-77: The code only treats a running container case but ignores
when docker inspect returns exit code 0 and stdout == "false" (a stopped
container with the same name), which will cause docker run --name ... to fail
with a name conflict; update the check in the block that runs subprocess.run
(using docker_executable and args.name and the result variable) to detect
result.returncode == 0 and result.stdout.rstrip("\n") == "false", then log a
clear error like "Container <name> exists but is stopped; remove or rename it
before running" and return a non-zero exit code (or raise/exit) instead of
proceeding; apply the same explicit stopped-container handling to the other
inspect/check sequence around the 91-121 region so both places consistently
detect and fail with a helpful message when a name conflict with a stopped
container exists.
In @tools/scripts/mariadb/stop.py:
- Around line 20-55: The code incorrectly hardcodes "docker" when building
mariadb_stop_cmd and emits a misleading warning; update mariadb_stop_cmd to use
the docker_executable variable instead of the literal "docker", and change the
logger.warning in main (the inspect check) to say the container "is not running
or does not exist" (or similar) since the inspect result covers both cases; keep
the existing return behavior and error handling for subprocess.run as-is.
In @tools/scripts/mariadb/wolf/init_db.py:
- Around line 225-269: The main() function currently lets mariadb.connect and
cursor.execute raise unstructured exceptions; add explicit try/except around the
connect/execute/commit block in main() to catch mariadb.Error (or Exception),
log a clear error message including exception details, and return a non-zero
exit code on failure; also add an optional --host argparse option (default
"127.0.0.1") and use args.host when calling mariadb.connect; refer to main(),
_TABLE_CREATORS, mariadb.connect, cursor.execute, conn.commit to locate the
changes.
🧹 Nitpick comments (3)
tools/scripts/mariadb/start.py (1)
68-73: Add subprocess timeouts to avoid hung CI jobs.
docker pull/docker execcan hang indefinitely; considertimeout=...onsubprocess.run(...)(and log the timeout distinctly).Also applies to: 80-85, 111-116, 124-136
taskfiles/test.yaml (2)
206-227: Consider using{{.ROOT_DIR}}for MariaDB scripts too (defensive against cwd changes).Even if Task runs internal tasks from a predictable cwd, using absolute-from-root paths here makes the lifecycle more robust.
Proposed diff
- tools/scripts/mariadb/start.py \ + {{.ROOT_DIR}}/tools/scripts/mariadb/start.py \ @@ - tools/scripts/mariadb/wolf/init_db.py \ + {{.ROOT_DIR}}/tools/scripts/mariadb/wolf/init_db.py \
18-43: Optional: reduce duplication by wrapping “with-mariadb” lifecycle once.The repeated blocks (env + vars + defer + start-mariadb) across test tasks are easy to drift; consider a single internal task that takes “test command” as a var, or a shared included Taskfile.
Also applies to: 44-68, 77-114, 144-170, 179-205
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
taskfiles/lint.yamltaskfiles/test.yamltools/scripts/get_free_port.pytools/scripts/mariadb/start.pytools/scripts/mariadb/stop.pytools/scripts/mariadb/wolf/init_db.py
🚧 Files skipped from review as they are similar to previous changes (1)
- taskfiles/lint.yaml
🧰 Additional context used
🧬 Code graph analysis (2)
tools/scripts/mariadb/stop.py (1)
tools/scripts/mariadb/start.py (1)
main(23-144)
tools/scripts/mariadb/start.py (3)
tools/scripts/get_free_port.py (1)
main(21-26)tools/scripts/mariadb/stop.py (1)
main(20-55)tools/scripts/mariadb/wolf/init_db.py (1)
main(225-268)
🪛 Ruff (0.14.10)
tools/scripts/get_free_port.py
24-24: Unused noqa directive (non-enabled: T201)
Remove unused noqa directive
(RUF100)
tools/scripts/mariadb/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
tools/scripts/mariadb/start.py
68-68: subprocess call: check for execution of untrusted input
(S603)
80-80: subprocess call: check for execution of untrusted input
(S603)
111-111: subprocess call: check for execution of untrusted input
(S603)
124-124: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: tests (ubuntu-22.04)
- GitHub Check: tests (ubuntu-24.04)
🔇 Additional comments (3)
tools/scripts/get_free_port.py (1)
11-18: Be aware: “free port” selection is TOCTOU and can flake under concurrency.This can race (port is freed immediately after Line 18). If you see sporadic “address already in use” from docker, consider adding a retry strategy around container start (or move to docker-assigned random host ports and then query the mapped port).
tools/scripts/mariadb/stop.py (1)
34-39: Subprocess S603: looks fine (no shell), but confirm container name input is trusted in your usage.Since
args.nameis CLI-provided, confirm your CI/task usage doesn’t pass attacker-controlled values (even though this is a list-form call, not shell).Also applies to: 50-50
tools/scripts/mariadb/start.py (1)
68-68: Subprocess S603 warnings: likely acceptable (no shell), but confirm this is intended.If you want to silence these, add targeted
# noqa: S603on the specific calls (or document why this is safe in-repo).Also applies to: 80-80, 111-111, 124-124
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tools/scripts/mariadb/stop.py (2)
25-31: Consider clarifying the help text.The help text refers to "the name of the started MariaDB container", which is slightly awkward phrasing for a stop script. Consider simplifying to "the name of the MariaDB container".
♻️ Suggested refinement
parser.add_argument( "--name", type=str, default="mariadb-spider-dev", - help="The name of the started MariaDB container (default: %(default)s)", + help="The name of the MariaDB container to stop (default: %(default)s)", )
40-42: Improve the warning message accuracy.The warning message "Container '%s' doesn't exist" is misleading when
returncode == 0but the container is stopped (stdout != "true"). In that case, the container exists but is not running. Consider making the message more accurate to cover both scenarios.♻️ Suggested refinement
if result.returncode != 0 or result.stdout.rstrip("\n") != "true": - logger.warning("Container '%s' doesn't exist. Exit peacefully.", args.name) + logger.warning("Container '%s' is not running or doesn't exist. Nothing to stop.", args.name) return 0
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tools/scripts/mariadb/stop.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/scripts/mariadb/stop.py (2)
tools/scripts/mariadb/wolf/init_db.py (1)
main(225-268)tools/scripts/mariadb/start.py (1)
main(23-144)
🪛 Ruff (0.14.10)
tools/scripts/mariadb/stop.py
34-34: subprocess call: check for execution of untrusted input
(S603)
50-50: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (5)
tools/scripts/mariadb/stop.py (5)
1-5: LGTM!The shebang and script metadata follow the project conventions and are consistent with the companion
start.pyscript.
7-17: LGTM!The imports and logging configuration are appropriate for this script's functionality.
44-55: LGTM!The container stopping logic is well-implemented with proper error handling and clear logging.
58-59: LGTM!The entry point follows standard Python conventions.
34-55: Static analysis warnings are false positives.The Ruff S603 warnings about subprocess calls are false positives in this context. The
docker_executableis hardcoded (line 23), and commands are constructed as lists rather than shell strings. Theargs.nameparameter is expected user input for specifying which container to stop, which is the intended functionality of this script.
Description
This PR:
SPIDER_STORAGE_URLinstead of hardcoded urlspider-pyREADME and test doc.Note
Need to change the required workflow for PR from
non-storage-unit-teststotests.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.