switching over to synference-download, cleaning up files #25
Conversation
Caution: Review failed. The pull request is closed.

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)

📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synference/library.py (1)
5355-5376: Restore the dust.emission import
getattr(dem, dust_emission_model_name, None) now receives the package module, but classes like Greybody live in synthesizer.emission_models.dust.emission, not on the package itself. This change makes getattr return None, raising the ValueError branch and breaking from_library for any saved model that references dust emission templates. Please switch the import back to the .dust.emission submodule so the expected symbols are available.

-import synthesizer.emission_models.dust as dem
+import synthesizer.emission_models.dust.emission as dem
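As a quick illustration of the failure mode, a sketch assuming the synthesizer package layout described above (submodule contents are not attributes of the parent package unless re-exported):

import synthesizer.emission_models.dust as dem
print(getattr(dem, "Greybody", None))  # None: the class is not an attribute of the package

import synthesizer.emission_models.dust.emission as dem
print(getattr(dem, "Greybody", None))  # the Greybody class, as from_library expects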
🧹 Nitpick comments (2)
src/synference/utils.py (2)
26-29: Consider more robust fallback for missing synthesizer dependency.

The fallback lambda returns os.path.dirname(__file__), which would point to the src/synference/ directory. This may not align with user expectations for a "data directory." Consider using a more appropriate default location like ~/.synference/data or documenting this fallback behavior clearly.

 try:
     from synthesizer import get_data_dir
 except ImportError:
-    get_data_dir = lambda: os.path.dirname(__file__)  # noqa: E731
+    # Fallback to user's home directory if synthesizer is not available
+    import pathlib
+    get_data_dir = lambda: pathlib.Path.home() / ".synference" / "data"  # noqa: E731
2802-2802: Consider lazy initialization of test_data_dir.

The test_data_dir is assigned at module import time, but the directory may not exist until download_test_data() is called. This could lead to confusion if users try to use test_data_dir before downloading the test data. Consider:

- Adding a docstring comment explaining users should call download_test_data() first
- Or implementing lazy initialization with a getter function
- Or checking directory existence and providing a helpful error message

-test_data_dir = get_data_dir() / "synference"
+# Test data directory path. Call download_test_data() to ensure it exists.
+test_data_dir = get_data_dir() / "synference"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/source/gfx/SBI_SED.png is excluded by !**/*.png
📒 Files selected for processing (29)
- .github/workflows/docs.yml (1 hunks)
- .github/workflows/python-app.yml (1 hunks)
- docs/source/advanced_topics/simformer.ipynb (1 hunks)
- docs/source/library_gen/complex_library_generation.ipynb (17 hunks)
- docs/source/library_gen/synthesizer_crash_course.ipynb (4 hunks)
- docs/source/posterior_inference/catalogue_fitting.ipynb (3 hunks)
- docs/source/posterior_inference/intro.rst (1 hunks)
- docs/source/posterior_inference/sed_recovery.ipynb (1 hunks)
- docs/source/posterior_inference/using_your_model.ipynb (0 hunks)
- docs/source/sbi_train/basic_sbi_model.ipynb (2 hunks)
- docs/source/sbi_train/complex_sbi_model.ipynb (1 hunks)
- docs/source/sbi_train/feature_array.ipynb (1 hunks)
- docs/source/sbi_train/model_optimization.ipynb (1 hunks)
- docs/source/sbi_train/online_training.ipynb (2 hunks)
- docs/source/sbi_train/validation_sampling.ipynb (1 hunks)
- examples/sbi/notebooks/evidence_networks_model_comparison.ipynb (0 hunks)
- examples/sbi/notebooks/model_comparison.ipynb (0 hunks)
- examples/sbi/notebooks/nle.ipynb (0 hunks)
- examples/sbi/notebooks/test_missing_photometry.ipynb (0 hunks)
- examples/sbi/notebooks/train_model_ensemble.ipynb (0 hunks)
- examples/sbi/notebooks/validate_model.ipynb (0 hunks)
- paper/bagpipes_comparison.ipynb (4 hunks)
- paper/model_testing.ipynb (3 hunks)
- paper/obs.ipynb (2 hunks)
- pyproject.toml (2 hunks)
- src/synference/__init__.py (2 hunks)
- src/synference/library.py (1 hunks)
- src/synference/utils.py (2 hunks)
- tests/conftest.py (2 hunks)
💤 Files with no reviewable changes (7)
- examples/sbi/notebooks/validate_model.ipynb
- examples/sbi/notebooks/test_missing_photometry.ipynb
- examples/sbi/notebooks/evidence_networks_model_comparison.ipynb
- docs/source/posterior_inference/using_your_model.ipynb
- examples/sbi/notebooks/model_comparison.ipynb
- examples/sbi/notebooks/nle.ipynb
- examples/sbi/notebooks/train_model_ensemble.ipynb
🧰 Additional context used
🧬 Code graph analysis (1)
src/synference/__init__.py (1)
src/synference/utils.py (1)
download_test_data(2788-2799)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (3.13)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build (3.10)
🔇 Additional comments (17)
docs/source/sbi_train/complex_sbi_model.ipynb (1)
18-22: LGTM! Clean transition to centralized test data paths.

The import and path construction changes align well with the PR's objective to centralize test data directory handling.
tests/conftest.py (2)
24-24: LGTM! Import addition supports centralized test data access.
35-37: Verify affected tests pass with the updated fixture.

The fixture change from library_dir(test_dir) to library_dir() also changes the path resolution from test_dir / "test_libraries" to test_data_dir. While the fixture signature change removes the dependency on test_dir, the path change affects tests that depend on test_sbi_library:

- tests/test_sbi.py: test_init_sbifitter_from_library, test_sbifitter_feature_array_creation
- tests/test_library.py: test_init_sbifitter_from_library, test_sbifitter_feature_array_creation

Ensure these tests pass with the new external data directory path.
docs/source/sbi_train/basic_sbi_model.ipynb (1)
30-51: LGTM! Consistent path centralization.

The changes follow the same pattern as other notebooks in this PR, improving maintainability through centralized test data paths.
docs/source/sbi_train/online_training.ipynb (1)
22-47: LGTM! Comprehensive path updates.

Both the hdf5_path and override_library_path now use test_data_dir, ensuring consistent test data resolution throughout the notebook.

.github/workflows/python-app.yml (2)

34-34: Verify that removing WITH_OPENMP=1 doesn't break compilation.

The WITH_OPENMP=1 environment variable was removed from the pip install command. Ensure this doesn't break any OpenMP-dependent compilation during package installation.

37-37: LGTM! Clean integration of test data download.

The Python one-liner approach to downloading test data aligns with the PR's centralization strategy and is more maintainable than CLI invocations.
docs/source/advanced_topics/simformer.ipynb (1)
34-37: LGTM! Consistent with other notebook updates.

The changes follow the established pattern for centralizing test data paths across documentation notebooks.
.github/workflows/docs.yml (1)
54-54: LGTM! Consistent workflow modernization.

Switching from CLI to direct Python function calls improves maintainability and aligns with the changes in the python-app.yml workflow.
pyproject.toml (2)
112-113: LGTM! New CLI entry point supports data download workflows.

The synference-download-data script entry point provides a convenient CLI interface for downloading test data, complementing the direct Python function calls used in the workflows.

55-55: No issues found—"tarp" dependency is actively used.

Verification confirms that the tarp library is imported in src/synference/custom_runner.py and src/synference/sbi_runner.py, and is actively called via tarp.get_tarp_coverage() for computing accuracy coverage metrics in the evaluation pipeline. The dependency is necessary and not extraneous.

docs/source/posterior_inference/intro.rst (1)
7-8: LGTM! Toctree restructured correctly.

The toctree now leads with catalogue_fitting and removes the duplicate entry, aligning with the broader documentation restructuring in this PR.
30-34: LGTM! Dynamic path resolution implemented correctly.

The notebook now imports test_data_dir and constructs the HDF5 path dynamically using an f-string, enabling consistent test data access across the documentation.

docs/source/posterior_inference/catalogue_fitting.ipynb (1)
87-96: Verify model_file parameter consistency.

Similar to validation_sampling.ipynb, the model_file parameter at line 92 is set to f"{test_data_dir}" (directory path only). Ensure this aligns with the expected signature of SBI_Fitter.load_saved_model().

docs/source/library_gen/synthesizer_crash_course.ipynb (1)

31-31: LGTM! Documentation improved.

The updated text now mentions that variants with nebular emission post-processing are available, which provides clearer guidance to users.
docs/source/posterior_inference/sed_recovery.ipynb (1)
64-75: Consistent path resolution pattern applied.

The notebook follows the same pattern as others in adopting test_data_dir. However, the same concern about model_file pointing to a directory applies here as well.

paper/model_testing.ipynb (1)
311-311: LGTM! Label corrections improve clarity and accuracy.
- Line 311: Plot text updated from "SBC" to "Coverage" for better readability
- Line 325: LaTeX axis label corrected from yr to Myr for accurate unit representation

Also applies to: 325-325
| "from synference import SBI_Fitter, load_unc_model_from_hdf5, test_data_dir\n", | ||
| "\n", | ||
| "library_path = \"../example_models/BPASS_DB_v4/grid_BPASS_Chab_DenseBasis_SFH_0.01_z_14_logN_2.7_Calzetti_v3_multinode.hdf5\" # noqa: E501\n", | ||
| "library_path = (\n", | ||
| " f\"{test_data_dir}/grid_BPASS_Chab_DenseBasis_SFH_0.01_z_14_logN_2.7_Calzetti_v3_multinode.hdf5\" # noqa: E501\n", | ||
| ")\n", | ||
| "\n", | ||
| "fitter = SBI_Fitter.load_saved_model(\n", | ||
| " model_file=\"../example_models/BPASS_DB_v4\", library_path=library_path, device=\"cpu\"\n", | ||
| " model_file=f\"{test_data_dir}\", library_path=library_path, device=\"cpu\"\n", | ||
| ")\n", | ||
| "\n", | ||
| "nm_path = (\n", | ||
| " \"../example_models/BPASS_DB_v4/BPASS_DenseBasis_v4_final_nsf_0_params_empirical_noise_models.h5\"\n", | ||
| ")\n", | ||
| "nm_path = f\"{test_data_dir}/BPASS_DenseBasis_v4_final_nsf_0_params_empirical_noise_models.h5\"\n", | ||
| "noise_models = load_unc_model_from_hdf5(nm_path)\n", |
🧩 Analysis chain
Verify model_file parameter points to correct path.
The model_file parameter is set to f"{test_data_dir}" which points to the directory itself rather than a specific model file. Please verify this is intentional for SBI_Fitter.load_saved_model(). If the method expects a file path (e.g., a .pkl file), this should be updated to include the filename.
🏁 Script executed:
#!/bin/bash
# Description: Check how load_saved_model uses the model_file parameter
# Search for the load_saved_model method definition
ast-grep --pattern $'def load_saved_model($$$):
$$$'
# Also search for model_file parameter usage
rg -n -A 5 'def load_saved_model' --type=py

Length of output: 6650
Correct model_file parameter to include the model filename.
The model_file parameter in SBI_Fitter.load_saved_model() must point to a specific model file, not a directory. The parameter is documented as "Path to the saved model file", and the implementation calls load_model_from_pkl() with this path. Update the code to include the filename (likely a .pkl file) rather than passing f"{test_data_dir}" alone.
🤖 Prompt for AI Agents
In docs/source/sbi_train/validation_sampling.ipynb around lines 24 to 35, the
call to SBI_Fitter.load_saved_model passes model_file=f"{test_data_dir}" (a
directory) but must be the full path to the saved model file; update the
model_file argument to include the model filename (e.g.
model_file=f"{test_data_dir}/<saved_model_name>.pkl") so load_model_from_pkl
receives a file path, and verify the filename matches the actual saved model in
test_data_dir.
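For illustration, the corrected call would take this shape; the .pkl filename below is hypothetical and should be replaced with the actual saved model file present in test_data_dir:

fitter = SBI_Fitter.load_saved_model(
    model_file=f"{test_data_dir}/saved_model.pkl",  # hypothetical filename, for illustration only
    library_path=library_path,
    device="cpu",
)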
| " # 2. Calculate density for main plot\n", | ||
| " xy = np.vstack([x_filt, y_filt])\n", | ||
| " try:\n", | ||
| " kde = gaussian_kde(xy)\n", | ||
| " density = kde(xy)\n", | ||
| " except np.linalg.LinAlgError:\n", | ||
| " # Handle cases with too few points\n", | ||
| " density = np.ones_like(x_filt)\n", | ||
| "\n", | ||
| " # 3. Sort points by density (so dense points are plotted last/on top)\n", | ||
| " idx = density.argsort()\n", | ||
| " x_plot, y_plot, dens_plot = x_filt[idx], y_filt[idx], density[idx]\n", | ||
| "\n", | ||
| " # 4. Plot all points, colored by density\n", | ||
| " ax1.scatter(\n", | ||
| " x_plot,\n", | ||
| " y_plot,\n", | ||
| " c=dens_plot,\n", | ||
| " cmap=cmap,\n", | ||
| " s=8,\n", | ||
| " alpha=0.7,\n", | ||
| " edgecolor=\"none\",\n", | ||
| " zorder=10,\n", | ||
| " )\n", | ||
| " # Optional: Add faint error bars if essential\n", | ||
| " # ax1.errorbar(\n", | ||
| " # x_filt, y_filt, xerr=xerr_filt, yerr=yerr_filt,\n", | ||
| " # alpha=0.1, linestyle=\"none\", marker=\"none\", color=\"grey\", zorder=5\n", | ||
| " # )\n", | ||
| "\n", | ||
| " # 1:1 Line\n", | ||
| " ax1.plot(\n", | ||
| " [np.nanmin(x_filt), np.nanmax(x_filt)],\n", | ||
| " [np.nanmin(x_filt), np.nanmax(x_filt)],\n", | ||
| " label=\"1:1 Line\",\n", | ||
| " linestyle=\"--\",\n", | ||
| " color=\"black\",\n", | ||
| " linewidth=2,\n", | ||
| " zorder=15,\n", | ||
| " )\n", | ||
| "\n", | ||
| " # --- Panel 2: Residuals ---\n", | ||
| " residuals = y_filt - x_filt\n", | ||
| "\n", | ||
| " # 1. Calculate density for residuals\n", | ||
| " xy_res = np.vstack([x_filt, residuals])\n", | ||
| " try:\n", | ||
| " kde_res = gaussian_kde(xy_res)\n", | ||
| " density_res = kde_res(xy_res)\n", | ||
| " except np.linalg.LinAlgError:\n", | ||
| " density_res = np.ones_like(x_filt)\n", | ||
| "\n", | ||
| " # 2. Sort and plot residuals\n", | ||
| " idx_res = density_res.argsort()\n", | ||
| " x_res_plot, res_plot, dens_res_plot = x_filt[idx_res], residuals[idx_res], density_res[idx_res]\n", | ||
| "\n", | ||
| " ax2.scatter(\n", | ||
| " x_res_plot,\n", | ||
| " res_plot,\n", | ||
| " c=dens_res_plot,\n", | ||
| " cmap=cmap,\n", | ||
| " s=8,\n", | ||
| " alpha=0.7,\n", | ||
| " edgecolor=\"none\",\n", | ||
| " zorder=10,\n", | ||
| " )\n", | ||
| " ax2.axhline(0, color=\"black\", linestyle=\"--\", linewidth=2, zorder=15)\n", |
Handle small-sample KDE fallback
scipy.stats.gaussian_kde raises a ValueError when the filtered dataset has fewer than two points or when dimensionality exceeds the sample count. Right now we only catch np.linalg.LinAlgError, so any thin slice of the data (e.g., after masking out invalid logs) will crash this cell instead of falling back to the constant-density path. Please guard on x_filt.size and catch ValueError alongside LinAlgError for both the main scatter and residual KDEs so the helper degrades gracefully. (aidoczh.com)
-    try:
-        kde = gaussian_kde(xy)
-        density = kde(xy)
-    except np.linalg.LinAlgError:
-        # Handle cases with too few points
-        density = np.ones_like(x_filt)
+    if x_filt.size < 2:
+        density = np.ones_like(x_filt)
+    else:
+        try:
+            kde = gaussian_kde(xy)
+            density = kde(xy)
+        except (np.linalg.LinAlgError, ValueError):
+            density = np.ones_like(x_filt)
…
-    try:
-        kde_res = gaussian_kde(xy_res)
-        density_res = kde_res(xy_res)
-    except np.linalg.LinAlgError:
-        density_res = np.ones_like(x_filt)
+    if x_filt.size < 2:
+        density_res = np.ones_like(x_filt)
+    else:
+        try:
+            kde_res = gaussian_kde(xy_res)
+            density_res = kde_res(xy_res)
+        except (np.linalg.LinAlgError, ValueError):
+            density_res = np.ones_like(x_filt)

🤖 Prompt for AI Agents
In paper/bagpipes_comparison.ipynb around lines 2029 to 2095, the gaussian_kde
calls only catch np.linalg.LinAlgError and will still crash for very small
filtered samples (or when dimensionality exceeds sample count); update both KDE
sections to first check if x_filt.size < 2 (or generally insufficient samples)
and, in the try/except, catch both np.linalg.LinAlgError and ValueError so you
fall back to density = np.ones_like(x_filt) (and density_res =
np.ones_like(x_filt) for residuals) when KDE cannot be constructed; apply the
same guard and exception handling to the residuals block as well.
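For reference, a minimal self-contained sketch of the guarded pattern described above; the helper name point_density and the toy inputs are illustrative, not taken from the notebook:

import numpy as np
from scipy.stats import gaussian_kde


def point_density(x, y):
    """Per-point KDE density that degrades gracefully on degenerate input."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2:
        # gaussian_kde requires at least two samples
        return np.ones_like(x)
    xy = np.vstack([x, y])
    try:
        return gaussian_kde(xy)(xy)
    except (np.linalg.LinAlgError, ValueError):
        # Singular covariance (e.g. identical points) or otherwise degenerate data
        return np.ones_like(x)


print(point_density([1.0], [2.0]))            # [1.]  (too few points)
print(point_density([1.0, 1.0], [2.0, 2.0]))  # [1. 1.]  (singular covariance)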
def download_test_data():
    """Downloads test data for Synference using the synference-download CLI tool."""
    import subprocess

    """Downloads test data for Synference."""

    data_dir = get_data_dir() / "synference"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Download the test data
    subprocess.run(["synference-download", "--test", "-d", str(data_dir)])
    print(f"Test data downloaded to {data_dir}")
Add error handling and remove duplicate docstring.
The download_test_data() function has several issues:

- Duplicate docstring on lines 2789 and 2792 - remove one
- No error handling for subprocess.run() - the function should check the return code and raise an exception if the download fails
- Missing return value - consider returning the path for convenience
 def download_test_data():
     """Downloads test data for Synference using the synference-download CLI tool."""
     import subprocess

-    """Downloads test data for Synference."""
-
     data_dir = get_data_dir() / "synference"
     data_dir.mkdir(parents=True, exist_ok=True)

     # Download the test data
-    subprocess.run(["synference-download", "--test", "-d", str(data_dir)])
+    result = subprocess.run(
+        ["synference-download", "--test", "-d", str(data_dir)],
+        check=True,
+        capture_output=True,
+        text=True
+    )
     print(f"Test data downloaded to {data_dir}")
+    return data_dir

🤖 Prompt for AI Agents
In src/synference/utils.py around lines 2788 to 2800, remove the duplicate
docstring so the function has a single docstring, add error handling around the
subprocess.run call by using subprocess.run(..., check=False) or capturing the
CompletedProcess and verifying returncode, and raise a descriptive exception
(including stderr if captured) when the download fails; also return the data_dir
Path at the end for convenience. Ensure to capture stdout/stderr (e.g.,
text=True) when running the command so the error message can be included in the
raised exception.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/synference/utils.py (1)
2788-2803: Improve error handling and return the data directory.

The implementation has improved since the previous review but still has some gaps:

- While check=True will raise CalledProcessError on failure, there's no try/except block to catch it and provide a helpful error message including the captured stderr
- The function doesn't return data_dir for convenience (as suggested in the previous review)

Apply this diff to address the remaining concerns:

 def download_test_data():
-    """Downloads test data for Synference using the synference-download CLI tool."""
+    """Downloads test data for Synference using the synference-download CLI tool.
+
+    Returns:
+        Path: The directory where test data was downloaded.
+
+    Raises:
+        RuntimeError: If the download fails.
+    """
     import subprocess

     data_dir = get_data_dir() / "synference"
     data_dir.mkdir(parents=True, exist_ok=True)

     # Download the test data
-    subprocess.run(
-        ["synference-download", "--test", "-d", str(data_dir)],
-        check=True,
-        capture_output=True,
-        text=True,
-    )
+    try:
+        subprocess.run(
+            ["synference-download", "--test", "-d", str(data_dir)],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
+    except subprocess.CalledProcessError as e:
+        raise RuntimeError(
+            f"Failed to download test data: {e}\n"
+            f"stdout: {e.stdout}\n"
+            f"stderr: {e.stderr}"
+        ) from e
     print(f"Test data downloaded to {data_dir}")
+    return data_dir

Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/docs.yml (1 hunks)
- .github/workflows/python-app.yml (1 hunks)
- paper/model_testing.ipynb (4 hunks)
- src/synference/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/python-app.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (3.12)
- GitHub Check: build (3.10)
- GitHub Check: build (3.13)
- GitHub Check: build (3.11)
🔇 Additional comments (7)
paper/model_testing.ipynb (4)
67-70: LGTM! Consistent parameter naming.

Both load_saved_model calls now consistently use library_path (lines 69 and 1548), improving API clarity.

311-311: LGTM! More accurate plot label.

The label change from "SBC" to "Coverage" better describes the rank-based coverage diagnostic being plotted.

325-325: Verify the units change is correct.

The axis label units changed from years to megayears (Myr), which represents a factor of 10^6 difference. Please confirm that log10_mass_weighted_age is indeed in units of log10(Myr) rather than log10(yr) to ensure the label accurately represents the data.

1546-1549: Parameter migration complete and verified.

All grid_path references have been successfully removed and replaced with library_path throughout the notebook. The second load_saved_model call at lines 1546-1549 properly uses the library_path parameter, maintaining consistency with the rest of the codebase.

.github/workflows/docs.yml (3)
46-46: LGTM! The dense_basis Git URL update looks correct.

48-48: LGTM! Adding the synthesizer dependency is appropriate given the new import in src/synference/utils.py.

54-55: Both download commands appear intentional but require developer confirmation of actual usage.

The workflow structure indicates both datasets are deliberately configured:

- synthesizer-download --test-grids --dust-grid downloads synthesizer project grids (unique to docs workflow)
- download_test_data() downloads synference data (shared across workflows)
- The step is named "Download test data" (plural) with an explicit mkdir -p data/libraries/ setup
- Both execute sequentially before the Sphinx build

However, to fully confirm both datasets are necessary for documentation generation, verify that the notebooks or documentation source code actually reference and use both the synthesizer grids and synference test data. If either dataset is unused, remove the corresponding download command.
try:
    from synthesizer import get_data_dir
except ImportError:
    get_data_dir = lambda: os.path.dirname(__file__)  # noqa: E731
Fallback data directory may not be appropriate.
The fallback lambda: os.path.dirname(__file__) returns the package source directory (src/synference/), which is typically not writable in installed packages and not an appropriate location for downloaded data files.
Consider using a proper user data directory as fallback:
 try:
     from synthesizer import get_data_dir
 except ImportError:
-    get_data_dir = lambda: os.path.dirname(__file__)  # noqa: E731
+    from pathlib import Path
+
+    def get_data_dir():
+        """Fallback data directory when synthesizer is not available."""
+        # Use the user's cache directory
+        cache_dir = Path.home() / ".cache" / "synference"
+        cache_dir.mkdir(parents=True, exist_ok=True)
+        return cache_dir
In src/synference/utils.py around lines 26 to 29, the current fallback
get_data_dir returns the package source directory (os.path.dirname(__file__)),
which is not suitable for storing downloaded data; replace the fallback to
return a proper per-user data directory (e.g., use
platformdirs.user_data_dir('synference') or appdirs equivalent), ensure the
directory is created if missing and writable, and keep using the
synthesizer.get_data_dir when available.
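As one concrete possibility, a sketch of the platformdirs-based fallback the prompt suggests (assumes the optional platformdirs package is installed; untested against the synference codebase):

from pathlib import Path

try:
    from synthesizer import get_data_dir
except ImportError:
    from platformdirs import user_data_dir

    def get_data_dir():
        """Fallback: a per-user, writable data directory for synference."""
        data_dir = Path(user_data_dir("synference"))
        data_dir.mkdir(parents=True, exist_ok=True)
        return data_dir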
| print(f"Test data downloaded to {data_dir}") | ||
|
|
||
|
|
||
| test_data_dir = get_data_dir() / "synference" |
Module-level call may fail when synthesizer is not available.
The module-level assignment test_data_dir = get_data_dir() / "synference" has two issues:
- Type incompatibility: If the fallback lambda is used, get_data_dir() returns a string (os.path.dirname(__file__)), but the / operator requires a Path object, causing a TypeError
- Eager execution: The directory resolution happens at import time even if test_data_dir is never used, which could cause issues in environments where data directories aren't accessible
Consider making this a lazy property or function:
-test_data_dir = get_data_dir() / "synference"
+def get_test_data_dir():
+    """Get the test data directory path.
+
+    Returns:
+        Path: The test data directory.
+    """
+    from pathlib import Path
+    return Path(get_data_dir()) / "synference"

Then update references to use get_test_data_dir() instead of test_data_dir.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/synference/utils.py around line 2805, the module-level assignment
test_data_dir = get_data_dir() / "synference" is unsafe because get_data_dir()
can return a str (fallback) causing a TypeError with the / operator and it
executes at import time; replace this with a lazy accessor function
get_test_data_dir() that calls get_data_dir(), converts its result to a
pathlib.Path (Path(get_data_dir())) and returns Path(...) / "synference", and
update all module references to call get_test_data_dir() instead of using the
module-level variable.
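A memoized variant of the suggested accessor, shown as a sketch (assumes get_data_dir as defined earlier in the module; the caching choice is illustrative, not part of the original suggestion):

from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def get_test_data_dir() -> Path:
    """Resolve the synference test data directory once, on first use."""
    return Path(get_data_dir()) / "synference"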
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/python-app.yml (1)
38-38: Extract test data download into a separate workflow step with error handling.

Embedding the download command as an inline Python one-liner makes it difficult to troubleshoot failures and provides no visibility into download progress or errors. If the download fails, the error may be masked or unclear in the log output.

Refactor to use a dedicated workflow step for clarity, observability, and proper error handling:

   - name: Install dependencies
     run: |
       python -m pip install --upgrade pip
       pip install .
       pip install ruff pytest
       pip install "ltu_ili@git+https://github.com/maho3/ltu-ili.git"
       pip install "cosmos-synthesizer@git+https://github.com/synthesizer-project/synthesizer.git#egg=synthesizer"
-      python -c "from synference.utils import download_test_data; download_test_data()"
+
+  - name: Download test data
+    run: |
+      python -c "from synference.utils import download_test_data; download_test_data()"

Alternatively, if a console script synference-download-data is available in pyproject.toml (as mentioned in the enriched summary), consider using that directly:

+  - name: Download test data
+    run: |
+      synference-download-data

Additionally, verify that download_test_data() is idempotent and handles network errors gracefully.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/docs.yml (1 hunks)
- .github/workflows/python-app.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/docs.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (3.12)
- GitHub Check: build (3.13)
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
🔇 Additional comments (2)
.github/workflows/python-app.yml (2)
34-34: Verify that removing WITH_OPENMP=1 doesn't break builds or tests.

The WITH_OPENMP=1 environment variable was removed from the pip install step. Confirm this doesn't impact builds on systems where OpenMP is needed or expected, and verify that no performance-critical code paths rely on this flag.

37-37: No issues found—dependency configuration is correct.

The GitHub URL is valid (the 301 redirect is standard GitHub behavior), the egg name synthesizer correctly maps to the actual Python package directory inside the repo, and there are no circular or conflicting dependencies. The synference project lists cosmos-synthesizer as a dependency without version constraints, and no reverse dependency exists. The configuration is sound.
Summary by CodeRabbit
- New Features
- Documentation
- Dependencies
- Chores