Conversation

tHarvey303 (Collaborator) commented Nov 10, 2025

Summary by CodeRabbit

  • New Features

    • Added a test-data download utility and a centralized test-data path for consistent access across the project.
  • Documentation

    • Updated many notebooks/examples to use the centralized data path; removed several outdated example notebooks.
  • Dependencies

    • Introduced an additional runtime dependency and added automated test-data retrieval during install/setup.
  • Chores

    • Streamlined install/setup steps to fetch test data as part of the workflow.


coderabbitai bot commented Nov 10, 2025

Caution

Review failed

The pull request is closed.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main changes: switching to synference-download for test data handling and removing unused notebook files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6164c76 and 20e6549.

📒 Files selected for processing (2)
  • .github/workflows/docs.yml (1 hunks)
  • docs/source/library_gen/synthesizer_crash_course.ipynb (1 hunks)


coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synference/library.py (1)

5355-5376: Restore the dust.emission import

getattr(dem, dust_emission_model_name, None) now receives the package module, but classes like Greybody live in synthesizer.emission_models.dust.emission, not on the package itself. As a result, getattr returns None, execution falls into the ValueError branch, and from_library breaks for any saved model that references dust emission templates. Please switch the import back to the .dust.emission submodule so the expected symbols are available.

-import synthesizer.emission_models.dust as dem
+import synthesizer.emission_models.dust.emission as dem
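A minimal sketch of the failure mode, assuming (as stated above) that the dust package does not re-export its emission classes:

# Sketch only: getattr on the package vs. the submodule. Assumes Greybody
# is defined in .dust.emission and not re-exported by the dust package.
import synthesizer.emission_models.dust as dem_pkg
import synthesizer.emission_models.dust.emission as dem_sub

print(getattr(dem_pkg, "Greybody", None))  # None, so from_library hits the ValueError branch
print(getattr(dem_sub, "Greybody", None))  # the Greybody class, as expected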
🧹 Nitpick comments (2)
src/synference/utils.py (2)

26-29: Consider more robust fallback for missing synthesizer dependency.

The fallback lambda returns os.path.dirname(__file__), which would point to the src/synference/ directory. This may not align with user expectations for a "data directory." Consider using a more appropriate default location like ~/.synference/data or documenting this fallback behavior clearly.

 try:
     from synthesizer import get_data_dir
 except ImportError:
-    get_data_dir = lambda: os.path.dirname(__file__)  # noqa: E731
+    # Fallback to user's home directory if synthesizer is not available
+    import pathlib
+    get_data_dir = lambda: pathlib.Path.home() / ".synference" / "data"  # noqa: E731

2802-2802: Consider lazy initialization of test_data_dir.

The test_data_dir is assigned at module import time, but the directory may not exist until download_test_data() is called. This could lead to confusion if users try to use test_data_dir before downloading the test data. Consider:

  1. Adding a docstring comment explaining users should call download_test_data() first
  2. Or implementing lazy initialization with a getter function
  3. Or checking directory existence and providing a helpful error message
 
-test_data_dir = get_data_dir() / "synference"
+# Test data directory path. Call download_test_data() to ensure it exists.
+test_data_dir = get_data_dir() / "synference"
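If the lazy route is taken, one possible shape for it, sketched here with an assumed cache and error message rather than code from this PR:

from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=1)
def get_test_data_dir() -> Path:
    """Resolve the test-data directory on first use, with a clear error
    if the data has not been downloaded yet."""
    path = Path(get_data_dir()) / "synference"
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} does not exist; call download_test_data() first."
        )
    return path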
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d2464d and b148232.

⛔ Files ignored due to path filters (1)
  • docs/source/gfx/SBI_SED.png is excluded by !**/*.png
📒 Files selected for processing (29)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/python-app.yml (1 hunks)
  • docs/source/advanced_topics/simformer.ipynb (1 hunks)
  • docs/source/library_gen/complex_library_generation.ipynb (17 hunks)
  • docs/source/library_gen/synthesizer_crash_course.ipynb (4 hunks)
  • docs/source/posterior_inference/catalogue_fitting.ipynb (3 hunks)
  • docs/source/posterior_inference/intro.rst (1 hunks)
  • docs/source/posterior_inference/sed_recovery.ipynb (1 hunks)
  • docs/source/posterior_inference/using_your_model.ipynb (0 hunks)
  • docs/source/sbi_train/basic_sbi_model.ipynb (2 hunks)
  • docs/source/sbi_train/complex_sbi_model.ipynb (1 hunks)
  • docs/source/sbi_train/feature_array.ipynb (1 hunks)
  • docs/source/sbi_train/model_optimization.ipynb (1 hunks)
  • docs/source/sbi_train/online_training.ipynb (2 hunks)
  • docs/source/sbi_train/validation_sampling.ipynb (1 hunks)
  • examples/sbi/notebooks/evidence_networks_model_comparison.ipynb (0 hunks)
  • examples/sbi/notebooks/model_comparison.ipynb (0 hunks)
  • examples/sbi/notebooks/nle.ipynb (0 hunks)
  • examples/sbi/notebooks/test_missing_photometry.ipynb (0 hunks)
  • examples/sbi/notebooks/train_model_ensemble.ipynb (0 hunks)
  • examples/sbi/notebooks/validate_model.ipynb (0 hunks)
  • paper/bagpipes_comparison.ipynb (4 hunks)
  • paper/model_testing.ipynb (3 hunks)
  • paper/obs.ipynb (2 hunks)
  • pyproject.toml (2 hunks)
  • src/synference/__init__.py (2 hunks)
  • src/synference/library.py (1 hunks)
  • src/synference/utils.py (2 hunks)
  • tests/conftest.py (2 hunks)
💤 Files with no reviewable changes (7)
  • examples/sbi/notebooks/validate_model.ipynb
  • examples/sbi/notebooks/test_missing_photometry.ipynb
  • examples/sbi/notebooks/evidence_networks_model_comparison.ipynb
  • docs/source/posterior_inference/using_your_model.ipynb
  • examples/sbi/notebooks/model_comparison.ipynb
  • examples/sbi/notebooks/nle.ipynb
  • examples/sbi/notebooks/train_model_ensemble.ipynb
🧰 Additional context used
🧬 Code graph analysis (1)
src/synference/__init__.py (1)
src/synference/utils.py (1)
  • download_test_data (2788-2799)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
🔇 Additional comments (17)
docs/source/sbi_train/complex_sbi_model.ipynb (1)

18-22: LGTM! Clean transition to centralized test data paths.

The import and path construction changes align well with the PR's objective to centralize test data directory handling.

tests/conftest.py (2)

24-24: LGTM! Import addition supports centralized test data access.


35-37: Verify affected tests pass with the updated fixture.

The fixture change from library_dir(test_dir) to library_dir() also changes the path resolution from test_dir / "test_libraries" to test_data_dir. While the fixture signature change removes the dependency on test_dir, the path change affects tests that depend on test_sbi_library:

  • tests/test_sbi.py: test_init_sbifitter_from_library, test_sbifitter_feature_array_creation
  • tests/test_library.py: test_init_sbifitter_from_library, test_sbifitter_feature_array_creation

Ensure these tests pass with the new external data directory path.

docs/source/sbi_train/basic_sbi_model.ipynb (1)

30-51: LGTM! Consistent path centralization.

The changes follow the same pattern as other notebooks in this PR, improving maintainability through centralized test data paths.

docs/source/sbi_train/online_training.ipynb (1)

22-47: LGTM! Comprehensive path updates.

Both the hdf5_path and override_library_path now use test_data_dir, ensuring consistent test data resolution throughout the notebook.

.github/workflows/python-app.yml (2)

34-34: Verify that removing WITH_OPENMP=1 doesn't break compilation.

The WITH_OPENMP=1 environment variable was removed from the pip install command. Ensure this doesn't break any OpenMP-dependent compilation during package installation.


37-37: LGTM! Clean integration of test data download.

The Python one-liner approach to downloading test data aligns with the PR's centralization strategy and is more maintainable than CLI invocations.

docs/source/advanced_topics/simformer.ipynb (1)

34-37: LGTM! Consistent with other notebook updates.

The changes follow the established pattern for centralizing test data paths across documentation notebooks.

.github/workflows/docs.yml (1)

54-54: LGTM! Consistent workflow modernization.

Switching from CLI to direct Python function calls improves maintainability and aligns with the changes in the python-app.yml workflow.

pyproject.toml (2)

112-113: LGTM! New CLI entry point supports data download workflows.

The synference-download-data script entry point provides a convenient CLI interface for downloading test data, complementing the direct Python function calls used in the workflows.


55-55: No issues found—"tarp" dependency is actively used.

Verification confirms that the tarp library is imported in src/synference/custom_runner.py and src/synference/sbi_runner.py, and is actively called via tarp.get_tarp_coverage() for computing accuracy coverage metrics in the evaluation pipeline. The dependency is necessary and not extraneous.
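For context, a minimal sketch of a get_tarp_coverage call; the array shapes and return values reflect tarp's public API as we understand it, not code from this PR:

import numpy as np
from tarp import get_tarp_coverage

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 100, 3))  # (n_samples, n_sims, n_dims), assumed layout
theta = rng.normal(size=(100, 3))         # true parameters, one row per simulation

# Expected coverage probability vs. credibility level, for calibration checks.
ecp, alpha = get_tarp_coverage(samples, theta)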

docs/source/posterior_inference/intro.rst (1)

7-8: LGTM! Toctree restructured correctly.

The toctree now leads with catalogue_fitting and removes the duplicate entry, aligning with the broader documentation restructuring in this PR.

docs/source/sbi_train/model_optimization.ipynb (1)

30-34: LGTM! Dynamic path resolution implemented correctly.

The notebook now imports test_data_dir and constructs the HDF5 path dynamically using an f-string, enabling consistent test data access across the documentation.

docs/source/posterior_inference/catalogue_fitting.ipynb (1)

87-96: Verify model_file parameter consistency.

Similar to validation_sampling.ipynb, the model_file parameter at line 92 is set to f"{test_data_dir}" (directory path only). Ensure this aligns with the expected signature of SBI_Fitter.load_saved_model().

docs/source/library_gen/synthesizer_crash_course.ipynb (1)

31-31: LGTM! Documentation improved.

The updated text now mentions that variants with nebular emission post-processing are available, which provides clearer guidance to users.

docs/source/posterior_inference/sed_recovery.ipynb (1)

64-75: Consistent path resolution pattern applied.

The notebook follows the same pattern as others in adopting test_data_dir. However, the same concern about model_file pointing to a directory applies here as well.

paper/model_testing.ipynb (1)

311-311: LGTM! Label corrections improve clarity and accuracy.

  • Line 311: Plot text updated from "SBC" to "Coverage" for better readability
  • Line 325: LaTeX axis label corrected from yr to Myr for accurate unit representation

Also applies to: 325-325

Comment on lines +24 to 35
"from synference import SBI_Fitter, load_unc_model_from_hdf5, test_data_dir\n",
"\n",
"library_path = \"../example_models/BPASS_DB_v4/grid_BPASS_Chab_DenseBasis_SFH_0.01_z_14_logN_2.7_Calzetti_v3_multinode.hdf5\" # noqa: E501\n",
"library_path = (\n",
" f\"{test_data_dir}/grid_BPASS_Chab_DenseBasis_SFH_0.01_z_14_logN_2.7_Calzetti_v3_multinode.hdf5\" # noqa: E501\n",
")\n",
"\n",
"fitter = SBI_Fitter.load_saved_model(\n",
" model_file=\"../example_models/BPASS_DB_v4\", library_path=library_path, device=\"cpu\"\n",
" model_file=f\"{test_data_dir}\", library_path=library_path, device=\"cpu\"\n",
")\n",
"\n",
"nm_path = (\n",
" \"../example_models/BPASS_DB_v4/BPASS_DenseBasis_v4_final_nsf_0_params_empirical_noise_models.h5\"\n",
")\n",
"nm_path = f\"{test_data_dir}/BPASS_DenseBasis_v4_final_nsf_0_params_empirical_noise_models.h5\"\n",
"noise_models = load_unc_model_from_hdf5(nm_path)\n",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify model_file parameter points to correct path.

The model_file parameter is set to f"{test_data_dir}" which points to the directory itself rather than a specific model file. Please verify this is intentional for SBI_Fitter.load_saved_model(). If the method expects a file path (e.g., a .pkl file), this should be updated to include the filename.


🏁 Script executed:

#!/bin/bash
# Description: Check how load_saved_model uses the model_file parameter

# Search for the load_saved_model method definition
ast-grep --pattern $'def load_saved_model($$$):
  $$$'

# Also search for model_file parameter usage
rg -n -A 5 'def load_saved_model' --type=py

Length of output: 6650


Correct model_file parameter to include the model filename.

The model_file parameter in SBI_Fitter.load_saved_model() must point to a specific model file, not a directory. The parameter is documented as "Path to the saved model file", and the implementation calls load_model_from_pkl() with this path. Update the code to include the filename (likely a .pkl file) rather than passing f"{test_data_dir}" alone.

🤖 Prompt for AI Agents
In docs/source/sbi_train/validation_sampling.ipynb around lines 24 to 35, the
call to SBI_Fitter.load_saved_model passes model_file=f"{test_data_dir}" (a
directory) but must be the full path to the saved model file; update the
model_file argument to include the model filename (e.g.
model_file=f"{test_data_dir}/<saved_model_name>.pkl") so load_model_from_pkl
receives a file path, and verify the filename matches the actual saved model in
test_data_dir.

Comment on lines +2029 to +2095
" # 2. Calculate density for main plot\n",
" xy = np.vstack([x_filt, y_filt])\n",
" try:\n",
" kde = gaussian_kde(xy)\n",
" density = kde(xy)\n",
" except np.linalg.LinAlgError:\n",
" # Handle cases with too few points\n",
" density = np.ones_like(x_filt)\n",
"\n",
" # 3. Sort points by density (so dense points are plotted last/on top)\n",
" idx = density.argsort()\n",
" x_plot, y_plot, dens_plot = x_filt[idx], y_filt[idx], density[idx]\n",
"\n",
" # 4. Plot all points, colored by density\n",
" ax1.scatter(\n",
" x_plot,\n",
" y_plot,\n",
" c=dens_plot,\n",
" cmap=cmap,\n",
" s=8,\n",
" alpha=0.7,\n",
" edgecolor=\"none\",\n",
" zorder=10,\n",
" )\n",
" # Optional: Add faint error bars if essential\n",
" # ax1.errorbar(\n",
" # x_filt, y_filt, xerr=xerr_filt, yerr=yerr_filt,\n",
" # alpha=0.1, linestyle=\"none\", marker=\"none\", color=\"grey\", zorder=5\n",
" # )\n",
"\n",
" # 1:1 Line\n",
" ax1.plot(\n",
" [np.nanmin(x_filt), np.nanmax(x_filt)],\n",
" [np.nanmin(x_filt), np.nanmax(x_filt)],\n",
" label=\"1:1 Line\",\n",
" linestyle=\"--\",\n",
" color=\"black\",\n",
" linewidth=2,\n",
" zorder=15,\n",
" )\n",
"\n",
" # --- Panel 2: Residuals ---\n",
" residuals = y_filt - x_filt\n",
"\n",
" # 1. Calculate density for residuals\n",
" xy_res = np.vstack([x_filt, residuals])\n",
" try:\n",
" kde_res = gaussian_kde(xy_res)\n",
" density_res = kde_res(xy_res)\n",
" except np.linalg.LinAlgError:\n",
" density_res = np.ones_like(x_filt)\n",
"\n",
" # 2. Sort and plot residuals\n",
" idx_res = density_res.argsort()\n",
" x_res_plot, res_plot, dens_res_plot = x_filt[idx_res], residuals[idx_res], density_res[idx_res]\n",
"\n",
" ax2.scatter(\n",
" x_res_plot,\n",
" res_plot,\n",
" c=dens_res_plot,\n",
" cmap=cmap,\n",
" s=8,\n",
" alpha=0.7,\n",
" edgecolor=\"none\",\n",
" zorder=10,\n",
" )\n",
" ax2.axhline(0, color=\"black\", linestyle=\"--\", linewidth=2, zorder=15)\n",

⚠️ Potential issue | 🟠 Major

Handle small-sample KDE fallback

scipy.stats.gaussian_kde raises a ValueError when the filtered dataset has fewer than two points or when dimensionality exceeds the sample count. Right now we only catch np.linalg.LinAlgError, so any thin slice of the data (e.g., after masking out invalid logs) will crash this cell instead of falling back to the constant-density path. Please guard on x_filt.size and catch ValueError alongside LinAlgError for both the main scatter and residual KDEs so the helper degrades gracefully. (aidoczh.com)

-    try:
-        kde = gaussian_kde(xy)
-        density = kde(xy)
-    except np.linalg.LinAlgError:
-        # Handle cases with too few points
-        density = np.ones_like(x_filt)
+    if x_filt.size < 2:
+        density = np.ones_like(x_filt)
+    else:
+        try:
+            kde = gaussian_kde(xy)
+            density = kde(xy)
+        except (np.linalg.LinAlgError, ValueError):
+            density = np.ones_like(x_filt)

-    try:
-        kde_res = gaussian_kde(xy_res)
-        density_res = kde_res(xy_res)
-    except np.linalg.LinAlgError:
-        density_res = np.ones_like(x_filt)
+    if x_filt.size < 2:
+        density_res = np.ones_like(x_filt)
+    else:
+        try:
+            kde_res = gaussian_kde(xy_res)
+            density_res = kde_res(xy_res)
+        except (np.linalg.LinAlgError, ValueError):
+            density_res = np.ones_like(x_filt)
🤖 Prompt for AI Agents
In paper/bagpipes_comparison.ipynb around lines 2029 to 2095, the gaussian_kde
calls only catch np.linalg.LinAlgError and will still crash for very small
filtered samples (or when dimensionality exceeds sample count); update both KDE
sections to first check if x_filt.size < 2 (or generally insufficient samples)
and, in the try/except, catch both np.linalg.LinAlgError and ValueError so you
fall back to density = np.ones_like(x_filt) (and density_res =
np.ones_like(x_filt) for residuals) when KDE cannot be constructed; apply the
same guard and exception handling to the residuals block as well.
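For reference, the guarded logic from the diff assembled as a self-contained helper (a sketch, not code in the PR):

import numpy as np
from scipy.stats import gaussian_kde

def safe_kde_density(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-point KDE density, falling back to uniform weights when the
    sample is too small or degenerate for gaussian_kde."""
    if x.size < 2:
        return np.ones_like(x)
    xy = np.vstack([x, y])
    try:
        return gaussian_kde(xy)(xy)
    except (np.linalg.LinAlgError, ValueError):
        return np.ones_like(x)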

Comment on lines 2788 to 2800
def download_test_data():
"""Downloads test data for Synference using the synference-download CLI tool."""
import subprocess

"""Downloads test data for Synference."""

data_dir = get_data_dir() / "synference"
data_dir.mkdir(parents=True, exist_ok=True)

# Download the test data
subprocess.run(["synference-download", "--test", "-d", str(data_dir)])
print(f"Test data downloaded to {data_dir}")


⚠️ Potential issue | 🟠 Major

Add error handling and remove duplicate docstring.

The download_test_data() function has several issues:

  1. Duplicate docstring on lines 2789 and 2792 - remove one
  2. No error handling for subprocess.run() - the function should check the return code and raise an exception if the download fails
  3. Missing return value - consider returning the path for convenience
 def download_test_data():
     """Downloads test data for Synference using the synference-download CLI tool."""
     import subprocess
 
-    """Downloads test data for Synference."""
-
     data_dir = get_data_dir() / "synference"
     data_dir.mkdir(parents=True, exist_ok=True)
 
     # Download the test data
-    subprocess.run(["synference-download", "--test", "-d", str(data_dir)])
+    result = subprocess.run(
+        ["synference-download", "--test", "-d", str(data_dir)],
+        check=True,
+        capture_output=True,
+        text=True
+    )
     print(f"Test data downloaded to {data_dir}")
+    return data_dir
🤖 Prompt for AI Agents
In src/synference/utils.py around lines 2788 to 2800, remove the duplicate
docstring so the function has a single docstring, add error handling around the
subprocess.run call by using subprocess.run(..., check=False) or capturing the
CompletedProcess and verifying returncode, and raise a descriptive exception
(including stderr if captured) when the download fails; also return the data_dir
Path at the end for convenience. Ensure to capture stdout/stderr (e.g.,
text=True) when running the command so the error message can be included in the
raised exception.

coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/synference/utils.py (1)

2788-2803: Improve error handling and return the data directory.

The implementation has improved since the previous review but still has some gaps:

  1. While check=True will raise CalledProcessError on failure, there's no try/except block to catch it and provide a helpful error message including the captured stderr
  2. The function doesn't return data_dir for convenience (as suggested in the previous review)

Apply this diff to address the remaining concerns:

 def download_test_data():
-    """Downloads test data for Synference using the synference-download CLI tool."""
+    """Downloads test data for Synference using the synference-download CLI tool.
+    
+    Returns:
+        Path: The directory where test data was downloaded.
+        
+    Raises:
+        RuntimeError: If the download fails.
+    """
     import subprocess
 
     data_dir = get_data_dir() / "synference"
     data_dir.mkdir(parents=True, exist_ok=True)
 
     # Download the test data
-    subprocess.run(
-        ["synference-download", "--test", "-d", str(data_dir)],
-        check=True,
-        capture_output=True,
-        text=True,
-    )
+    try:
+        subprocess.run(
+            ["synference-download", "--test", "-d", str(data_dir)],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
+    except subprocess.CalledProcessError as e:
+        raise RuntimeError(
+            f"Failed to download test data: {e}\n"
+            f"stdout: {e.stdout}\n"
+            f"stderr: {e.stderr}"
+        ) from e
     print(f"Test data downloaded to {data_dir}")
+    return data_dir

Based on learnings
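Usage under the variant above, assuming the RuntimeError-raising version is adopted:

from synference.utils import download_test_data

# Fails loudly, with captured stderr in the message, if the download breaks.
data_dir = download_test_data()
print(f"Test data available in {data_dir}")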

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b148232 and 13a04df.

📒 Files selected for processing (4)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/python-app.yml (1 hunks)
  • paper/model_testing.ipynb (4 hunks)
  • src/synference/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/python-app.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.11)
🔇 Additional comments (7)
paper/model_testing.ipynb (4)

67-70: LGTM! Consistent parameter naming.

Both load_saved_model calls now consistently use library_path (lines 69 and 1548), improving API clarity.


311-311: LGTM! More accurate plot label.

The label change from "SBC" to "Coverage" better describes the rank-based coverage diagnostic being plotted.


325-325: Verify the units change is correct.

The axis label units changed from years to megayears (Myr), which represents a factor of 10^6 difference. Please confirm that log10_mass_weighted_age is indeed in units of log10(Myr) rather than log10(yr) to ensure the label accurately represents the data.


1546-1549: Parameter migration complete and verified.

All grid_path references have been successfully removed and replaced with library_path throughout the notebook. The second load_saved_model call at lines 1546-1549 properly uses the library_path parameter, maintaining consistency with the rest of the codebase.

.github/workflows/docs.yml (3)

46-46: LGTM!

The dense_basis Git URL update looks correct.


48-48: LGTM!

Adding the synthesizer dependency is appropriate given the new import in src/synference/utils.py.


54-55: Both download commands appear intentional but require developer confirmation of actual usage.

The workflow structure indicates both datasets are deliberately configured:

  • synthesizer-download --test-grids --dust-grid downloads synthesizer project grids (unique to docs workflow)
  • download_test_data() downloads synference data (shared across workflows)
  • The step is named "Download test data" (plural) with an explicit mkdir -p data/libraries/ setup
  • Both execute sequentially before the Sphinx build

However, to fully confirm both datasets are necessary for documentation generation, verify that the notebooks or documentation source code actually reference and use both the synthesizer grids and synference test data. If either dataset is unused, remove the corresponding download command.

Comment on lines +26 to +29
try:
from synthesizer import get_data_dir
except ImportError:
get_data_dir = lambda: os.path.dirname(__file__) # noqa: E731

⚠️ Potential issue | 🟠 Major

Fallback data directory may not be appropriate.

The fallback lambda: os.path.dirname(__file__) returns the package source directory (src/synference/), which is typically not writable in installed packages and not an appropriate location for downloaded data files.

Consider using a proper user data directory as fallback:

 try:
     from synthesizer import get_data_dir
 except ImportError:
-    get_data_dir = lambda: os.path.dirname(__file__)  # noqa: E731
+    from pathlib import Path
+    import tempfile
+    def get_data_dir():
+        """Fallback data directory when synthesizer is not available."""
+        # Use user's cache directory or temp directory
+        cache_dir = Path.home() / ".cache" / "synference"
+        cache_dir.mkdir(parents=True, exist_ok=True)
+        return cache_dir
🤖 Prompt for AI Agents
In src/synference/utils.py around lines 26 to 29, the current fallback
get_data_dir returns the package source directory (os.path.dirname(__file__)),
which is not suitable for storing downloaded data; replace the fallback to
return a proper per-user data directory (e.g., use
platformdirs.user_data_dir('synference') or appdirs equivalent), ensure the
directory is created if missing and writable, and keep using the
synthesizer.get_data_dir when available.
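A sketch of the platformdirs-based fallback the prompt suggests; treating platformdirs as an acceptable new dependency is an assumption here:

try:
    from synthesizer import get_data_dir
except ImportError:
    from pathlib import Path

    from platformdirs import user_data_dir  # assumed new dependency

    def get_data_dir() -> Path:
        """Fallback per-user data directory when synthesizer is unavailable."""
        path = Path(user_data_dir("synference"))
        path.mkdir(parents=True, exist_ok=True)
        return path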

print(f"Test data downloaded to {data_dir}")


test_data_dir = get_data_dir() / "synference"

⚠️ Potential issue | 🟠 Major

Module-level call may fail when synthesizer is not available.

The module-level assignment test_data_dir = get_data_dir() / "synference" has two issues:

  1. Type incompatibility: If the fallback lambda is used, get_data_dir() returns a string (os.path.dirname(__file__)), but the / operator requires a Path object, causing a TypeError
  2. Eager execution: The directory resolution happens at import time even if test_data_dir is never used, which could cause issues in environments where data directories aren't accessible

Consider making this a lazy property or function:

-test_data_dir = get_data_dir() / "synference"
+def get_test_data_dir():
+    """Get the test data directory path.
+    
+    Returns:
+        Path: The test data directory.
+    """
+    from pathlib import Path
+    return Path(get_data_dir()) / "synference"

Then update references to use get_test_data_dir() instead of test_data_dir.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/synference/utils.py around line 2805, the module-level assignment
test_data_dir = get_data_dir() / "synference" is unsafe because get_data_dir()
can return a str (fallback) causing a TypeError with the / operator and it
executes at import time; replace this with a lazy accessor function
get_test_data_dir() that calls get_data_dir(), converts its result to a
pathlib.Path (Path(get_data_dir())) and returns Path(...) / "synference", and
update all module references to call get_test_data_dir() instead of using the
module-level variable.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/python-app.yml (1)

38-38: Extract test data download into a separate workflow step with error handling.

Embedding the download command as an inline Python one-liner makes it difficult to troubleshoot failures and provides no visibility into download progress or errors. If the download fails, the error may be masked or unclear in the log output.

Refactor to use a dedicated workflow step for clarity, observability, and proper error handling:

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install .
        pip install ruff pytest
        pip install "ltu_ili@git+https://github.com/maho3/ltu-ili.git"
        pip install "cosmos-synthesizer@git+https://github.com/synthesizer-project/synthesizer.git#egg=synthesizer"
-        python -c "from synference.utils import download_test_data; download_test_data()"
+
+    - name: Download test data
+      run: |
+        python -c "from synference.utils import download_test_data; download_test_data()"

Alternatively, if a console script synference-download-data is available in pyproject.toml (as mentioned in the enriched summary), consider using that directly:

+    - name: Download test data
+      run: |
+        synference-download-data

Additionally, verify that download_test_data() is idempotent and handles network errors gracefully.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13a04df and 6164c76.

📒 Files selected for processing (2)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/python-app.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/docs.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
🔇 Additional comments (2)
.github/workflows/python-app.yml (2)

34-34: Verify that removing WITH_OPENMP=1 doesn't break builds or tests.

The WITH_OPENMP=1 environment variable was removed from the pip install step. Confirm this doesn't impact builds on systems where OpenMP is needed or expected, and verify that no performance-critical code paths rely on this flag.


37-37: No issues found—dependency configuration is correct.

The GitHub URL is valid (the 301 redirect is standard GitHub behavior), the egg name synthesizer correctly maps to the actual Python package directory inside the repo, and there are no circular or conflicting dependencies. The synference project lists cosmos-synthesizer as a dependency without version constraints, and no reverse dependency exists. The configuration is sound.

tHarvey303 linked an issue Nov 10, 2025 that may be closed by this pull request
tHarvey303 merged commit 799d6d4 into main Nov 10, 2025
4 of 5 checks passed
tHarvey303 deleted the qdev branch November 10, 2025 23:08
tHarvey303 added a commit that referenced this pull request Nov 13, 2025
switching over to synference-download, cleaning up files

Development

Successfully merging this pull request may close these issues.

Test data download not available
