feat: add Gen3 SDK vs CDIS download performance testing suite#317
feat: add Gen3 SDK vs CDIS download performance testing suite#317Dhiren-Mhatre wants to merge 5 commits intouc-cdis:masterfrom
Conversation
Signed-off-by: Dhiren-Mhatre <kp064669@gmail.com>
|
|
||
| gen3sdk_path = os.path.expanduser( | ||
| "~/path/to/gen3sdk-python" | ||
| ) # TODO: Update this path |
There was a problem hiding this comment.
we can't have anything in here that requires updating the code and you absolutely do not want to be doing path modifications in here. Every dev is going to have diff python versions, manage venv differently, you can't hard-code this kind of path manipulation.
Instead, we can allow installing gen3 from a branch in this venv and remove all this code from line 44-60.
Instead update the docs to specify that you can test a branch of gen3 by updating the pyproject.toml to something like:
gen3 = {git = "https://github.com/Dhiren-Mhatre/gen3sdk-python", branch = "feat/multiple-download-performance-testing"}
and then poetry lock / poetry install
There was a problem hiding this comment.
try it this way and adjust the code to work like that. We want to avoid any paths or manipulation in this code. You should just inform the tester to update their pyproject.toml to install the gen3 SDK version they want to test
| args = parser.parse_args() | ||
|
|
||
| # Set up logging | ||
| logging.basicConfig( |
There was a problem hiding this comment.
this command fails if the directory doesn't exist. You need this line before:
os.makedirs(f"{args.results_dir}", exist_ok=True)
There was a problem hiding this comment.
these files either need to be adjusted to conform to the pytest framework setup here or we need a new folder to put this and the README into. Something like gen3-code-vigil/gen3-load-tests/sdk_testing
| ) | ||
|
|
||
| gen3_client_path: str = "gen3-client" | ||
| gen3sdk_path: str = "~/path/to/gen3sdk-python" # TODO: Update this path |
There was a problem hiding this comment.
we should not rely on this path, refactor to remove this requirement. rely on gen3 being available and imported appropriately
Signed-off-by: Dhiren-Mhatre <kp064669@gmail.com>
|
|
||
| ## Features | ||
|
|
||
| - **Performance Testing**: Comprehensive async and CDIS performance comparison |
There was a problem hiding this comment.
Let's say "Comprehensive Gen3 Python SDK/CLI and Golang Client performance comparison" instead
| ### Basic Usage | ||
|
|
||
| ```bash | ||
| python download_performance_test.py --manifest test_data/sample_manifest.json |
There was a problem hiding this comment.
let's replace this with what we typically expect to run. Something like this:
poetry run python download_performance_test.py \
--manifest ../test_data/sample_manifest.json \
--num-runs 3 \
--max-concurrent-async 32 \
--num-workers-cdis 4 \
--test-methods "async,cdis" \
--endpoint "https://data.midrc.org" \
--credentials "~/.gen3/MIDRC_prod.json" \
--gen3-client-path "gen3-client" \
--download-dir "downloads" \
--results-dir "download_performance_results"
| # Strategy 7: Handle Gen3 SDK directory structure (dg.MD1R/) | ||
| if "dg.MD1R" in file_path and guid: | ||
| if guid.lower() in file_path.lower(): | ||
| score += 30 | ||
|
|
||
| # Strategy 8: Check for common MIDRC file patterns | ||
| if expected_filename and any( | ||
| ext in expected_filename.lower() for ext in [".nii.gz", ".nii", ".dcm"] | ||
| ): | ||
| if any( | ||
| ext in file_basename.lower() for ext in [".nii.gz", ".nii", ".dcm"] | ||
| ): | ||
| score += 15 | ||
|
|
||
| if score > best_score: | ||
| best_score = score | ||
| best_match = file_path |
There was a problem hiding this comment.
We don't want any MIDRC specific stuff in here, let's remove this if possible.
| f"Looking for file with GUID: {guid}, expected filename: {expected_filename}" | ||
| ) | ||
|
|
||
| # Look for the best match using improved GUID-based scoring |
There was a problem hiding this comment.
why is this section necessary? Does the client not consistently provide a naming convention? Or is this to support if you renamed files in diff ways? I'd rather make an assumption about how we run the gen3-client and SDK to ensure a naming convention in this test and not have to have all this code for matching. e.g. if we can force the output to a known format, just assume that format here
|
|
||
| timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S") | ||
|
|
||
| html_content = f""" |
There was a problem hiding this comment.
let's move all this out to a separate file and load it in here. You can use a templating language to dump in data (jinga is a well known Python one)
There was a problem hiding this comment.
We don't want a bunch of HTML mingled in with Python code, we want the HTML is separate files and template in the data we need here. I'm okay with some minimal HTML, but this is too large to allow staying in this file.
| } | ||
| ) | ||
|
|
||
| if "async" in config.test_methods and GEN3_SDK_AVAILABLE: |
There was a problem hiding this comment.
let's move this block to happen before "cdis" (so if you're testing a branch it's easier to see if there's a mistake more quickly). Fail fast
| logger.warning( | ||
| f"⚠️ {tool_name} Run {run_number} had issues: " | ||
| f"return_code={result.returncode}, " | ||
| f"stderr='{result.stderr[:500]}...'" |
There was a problem hiding this comment.
don't cut off log information. Just always dump all of it:
if result.returncode != 0 or result.stderr:
logger.warning(
f"⚠️ {tool_name} Run {run_number} had issues: "
f"return_code={result.returncode}, "
f"stderr='{result.stderr}...'"
)
| logger.warning( | ||
| f"⚠️ {tool_name} Run {run_number} stdout indicates failures: " | ||
| f"'{result.stdout[:500]}...'" | ||
| if len(result.stdout) > 500 |
| cmd = [ | ||
| "python", | ||
| "-m", | ||
| "gen3.cli", |
There was a problem hiding this comment.
you need to change this, do:
poetry run gen3 and replace python -m gen3.cli b/c otherwise you are potentially getting the Global gen3 installation instead of the one in this virtualenv
There was a problem hiding this comment.
and once you fix the issues in the SDK branch I commented on, you'll need to update this command:
cmd = [
"poetry",
"run",
"gen3",
"--auth",
os.path.abspath(self.config.credentials_path),
"--endpoint",
self.config.endpoint,
"download-multiple", # NOTE: name change here
"--manifest",
os.path.abspath(manifest_path),
"--download-path",
os.path.abspath(download_dir),
"--max-concurrent-requests",
str(self.config.max_concurrent_requests_async),
"--filename-format",
"guid",
"--no-prompt",
]
New Features
Breaking Changes
Bug Fixes
Dependency updates
Deployment changes