Fix installer with uv and swift f0 pitcher by rakuri255 · Pull Request #299 · rakuri255/UltraSinger

rakuri255 · 2026-02-14T23:39:14Z

Summary by CodeRabbit

New Features
- Pitch detection now uses SwiftF0 for improved accuracy and simpler workflow.
- Output files preserve original audio/video extensions and codec-aware extraction.
Bug Fixes
- Safer scoring logic to avoid divide-by-zero; more consistent tag parsing/writing.
Chores
- Upgraded to Python 3.12 and unified cross-platform dependency/install flow around a single installer tool.
- Container updated for CUDA-enabled PyTorch (cu128); installer scripts refreshed per platform.
- Added Windows helpers to ensure CUDA/cuDNN runtime DLLs are available.
Documentation
- Installation and usage docs revised for new Python, GPU guidance, and script names.
Tests
- Adjusted tests to align with updated pitch API and tag string handling.

coderabbitai · 2026-02-14T23:39:34Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Dependency management moved to uv/pyproject.toml and Python bumped to 3.12; CREPE pitch detection replaced with SwiftF0; TensorFlow GPU checks removed in favor of PyTorch/CUDA (cu128); media audio/video extensions are now propagated through processing and output generation.

Changes

Cohort / File(s)	Summary
CI/CD Workflows `.github/workflows/main.yml`, `.github/workflows/pytest.yml`, `.github/workflows/build-container-image.yml`, `.github/workflows/pylint.yml`	Bumped Python to 3.12, upgraded actions versions, replaced per-OS pip installs with uv installer + `uv sync`, unified test execution via `uv run pytest`.
Build System & Packaging `pyproject.toml`	Switched build backend to hatchling, added explicit version/description, raised `requires-python` to >=3.12, moved dependencies into `pyproject.toml` with optional/dependency groups.
Removed legacy pin files `requirements.in`, `requirements-linux.txt`, `requirements-macos.txt`, `requirements-windows.txt`	Deleted pip-compile input and generated pinned requirement manifests; dependency management consolidated into `pyproject.toml`/uv.
Container / Dockerfile `Dockerfile`	Base CUDA image bumped; installs system Python 3.12, installs uv, uses pyproject/uv to install deps, and installs CUDA-enabled PyTorch (cu128); source copy reordered for caching.
Installer scripts (CPU & CUDA) `install/CPU/...`, `install/CUDA/...` (`linux_cpu.sh`, `macos_cpu.sh`, `windows_cpu.bat`, `linux_cuda_gpu.sh`, `windows_cuda_gpu.bat`)	Replaced venv+pip flows with uv-driven sync, added uv install/verification and PATH updates, Windows scripts detect/use Python 3.12, PyTorch installs set to cu128 where applicable.
CUDA helper scripts `install/fix_cuda11_libs.bat`, `install/fix_cudnn8.bat`	Added Windows scripts to fetch/install cuBLAS/cuDNN DLLs into a venv's torch lib with validation and cleanup.
Runtime launch scripts `run_on_linux.sh`, `run_on_mac.command`, `run_on_windows.bat`	Made startup scripts robust: resolve script dir, check/activate `.venv`, and run `UltraSinger.py` with clearer error messaging.
Pitch detection (feature) `src/modules/Pitcher/pitcher.py`, `pytest/modules/Pitcher/test_pitcher.py`	Replaced CREPE with SwiftF0, added lazy detector and `get_pitch_with_swift_f0`, consolidated file-based API `get_pitch_with_file`, removed CREPE APIs and updated tests to use simplified call.
Core app & process data `src/UltraSinger.py`, `src/Settings.py`, `src/modules/ProcessData.py`	Bumped APP_VERSION, removed CREPE/TensorFlow-specific settings, updated pitch call sites, added `audio_extension`/`video_extension` to MediaInfo, and fixed mutable default via `default_factory`.
Device detection `src/modules/DeviceDetection/device_detection.py`	Removed TensorFlow GPU checks; `check_gpu_support()` now returns `'cuda'` or `'cpu'` based solely on PyTorch.
Speech recognition `src/modules/Speech_Recognition/Whisper.py`	Added scoped monkey-patch of `torch.load` to force `weights_only=False` during model loads and restored original loader in error/finally paths; removed previous detect_language monkey-patch.
Ultrastar parsing/writing/score & tests `src/modules/Ultrastar/...`, `pytest/modules/UltraStar/test_ultrastar_writer.py`	Switched enum usages to `.value` for tags and note types across parser, writer, score calc and tests; added guards to avoid zero-division in scoring.
Audio/FFmpeg helper & conversion `src/modules/ffmpeg_helper.py`, `src/modules/Audio/convert_audio.py`, `src/modules/Audio/youtube.py`	Added `get_audio_codec_and_extension()` using ffprobe; `separate_audio_video()` returns audio/video extensions; callers propagate extensions; replaced pydub WAV→MP3 with ffmpeg-based `convert_audio_format`.
Misc small changes `src/modules/Audio/key_detector.py`, `src/modules/Midi/midi_creator.py`, `src/modules/Ultrastar/coverter/ultrastar_txt_converter.py`	Removed unused imports, moved an import to module scope, and updated Ultrastar TXT generation to use dynamic media extensions from MediaInfo.
Docs / Notebook / Release `README.md`, `ReleaseNotes.md`, `colab/UltraSinger.ipynb`	Updated docs and notebook for Python 3.12, uv-based installs, SwiftF0 default, CUDA guidance, and release notes added entries for uv, SwiftF0, and Python upgrade.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant US as UltraSinger
    participant FS as Filesystem
    participant PK as Pitcher.get_pitch_with_file
    participant SF as SwiftF0 Detector

    US->>FS: ensure processing_audio_path exists
    FS-->>US: file path
    US->>PK: get_pitch_with_file(processing_audio_path)
    PK->>FS: read audio file bytes
    FS-->>PK: audio bytes
    PK->>SF: detect(audio, sample_rate)
    SF-->>PK: pitch frames + confidences
    PK-->>US: return PitchedData

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hops through code with glee,
From pip to uv and Python 3.12 free.
CREPE was traded for SwiftF0's song,
CUDA boots up and the scripts run strong. 🐇✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly matches the main changes: migrating the installer from pip to uv and replacing CREPE pitch detection with SwiftF0.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-installer-with-uv-and-swiftF0-pitcher

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ReleaseNotes.md (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Version mismatch: ReleaseNotes.md declares 0.0.13dev13 but pyproject.toml declares 0.0.13-dev14.

Update the version header in ReleaseNotes.md to match pyproject.toml, or align pyproject.toml to the release notes version.

🤖 Fix all issues with AI agents

In @.github/workflows/main.yml:
- Around line 30-33: The workflow uses deprecated action versions; update the
GitHub Actions steps so the checkout and Python setup actions use newer major
releases by changing uses: actions/checkout@v3 to actions/checkout@v4 and uses:
actions/setup-python@v4 to actions/setup-python@v5 (ensure any inputs/outputs
remain compatible with the new action versions and run a workflow test).

In `@install/fix_cuda11_libs.bat`:
- Around line 1-100: The batch script install/fix_cuda11_libs.bat currently has
LF-only endings which can break Windows cmd parsing; convert this file to CRLF
line endings (save/commit with CRLF) and add a repository .gitattributes entry
to enforce it (e.g. add a line "*.bat text eol=crlf" to .gitattributes) so
future commits keep CRLF; ensure the committed install/fix_cuda11_libs.bat is
verified in git as CRLF before merging.

In `@install/fix_cudnn8.bat`:
- Around line 41-84: The fallback to search for cudnn64_8.dll in the pip cache
(the PowerShell block that sets $found and prints paths) never wires the found
paths into the later copy logic, so when the download fails the script still
expects cudnn.zip and the extracted archive; fix by capturing the actual full
path(s) returned by that PowerShell search and then copy those DLLs into
"%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" (the same target used by the
existing copy command) instead of relying on cudnn.zip; implement this by
changing the PowerShell call to output the found file paths to an environment
variable or to a temp file and add a conditional branch (or jump label) that,
when the search succeeds (%errorlevel% == 0), performs copy /Y
"<found_path_or_paths>" "%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" and lists
them, otherwise keep the existing manual-download error path.

In `@pyproject.toml`:
- Around line 32-35: The listed development tools ("isort", "black", "pylint",
"pytest") are incorrectly placed in main dependencies; remove those entries from
the project dependencies block and add them to the dev group (either under
[project.optional-dependencies] with a "dev" key or into
[project.dependency-groups] -> dev) so they are installed only for development.
Ensure the names "isort", "black", "pylint", and "pytest" appear under the dev
group and update any existing dev group entries to include them rather than
keeping them in the top-level dependencies.

In `@src/modules/ffmpeg_helper.py`:
- Line 130: The default extension used when looking up codec_to_extension
contains a leading dot, causing filenames like "file..wav"; change the lookup so
the default has no leading dot (e.g., extension =
codec_to_extension.get(codec_name, "wav")) so that the caller building
f"{basename_without_ext}.{audio_ext}" produces a single dot; ensure
codec_to_extension values remain dot-less and that the extension variable (and
any audio_ext usage) expects a bare extension string.

In `@src/modules/Pitcher/pitcher.py`:
- Around line 37-39: The current normalization assumes 16-bit PCM; update the
normalization logic around the variable "audio" in
src/modules/Pitcher/pitcher.py so it handles common bit depths and unsigned
formats: detect dtype (np.uint8, signed integers like np.int16/np.int32, and
floats) and convert to float32 with correct scaling—for uint8 subtract 128 and
divide by 128, for signed integers use np.iinfo(dtype) to compute the max
absolute value and divide by that (e.g., int16 -> 32768, int32 -> 2**31 or use
max(abs(min), max)), and for float32/float64 leave as-is (or cast to float32);
implement this in place of the single hardcoded audio.astype(...)/(2**15) line
so all common WAV bit depths normalize correctly.
- Around line 16-19: The SwiftF0 instantiation currently uses speech-only bounds
(SwiftF0(fmin=65, fmax=400)) which will misclassify high singing pitches; update
the _swift_f0_detector initialization to use a music-appropriate pitch range
(either the documented full range SwiftF0(fmin=46.875, fmax=2093.75) or a
narrower singing-focused range like fmin=46.875, fmax=1200), remove the FIXME
comment, and ensure the change is made on the SwiftF0(...) call that assigns
_swift_f0_detector so singing voices are correctly detected.

In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 120-122: The restore of torch.load is happening immediately after
whisperx.load_model(), but whisperx.load_align_model() (and any alignment
models) run later and may require the patched behavior (weights_only=False) to
load pickled models; move the assignment torch.load = _original_torch_load so it
occurs only after both model loads complete (i.e., after
whisperx.load_align_model()), or expand the monkey-patched scope to wrap both
whisperx.load_model() and whisperx.load_align_model() calls together, ensuring
the temporary torch.load override remains active until both loads finish.
- Around line 12-22: The monkey-patch for torch.load (_original_torch_load,
_patched_torch_load, torch.load) is being restored too early—after
whisperx.load_model but before whisperx.load_align_model—so keep the patch
active through the entire model loading sequence by moving the restoration
(torch.load = _original_torch_load) to after whisperx.load_align_model completes
(or re-apply the patch immediately before calling whisperx.load_align_model);
ensure both whisperx.load_model and whisperx.load_align_model execute while
torch.load still points to _patched_torch_load, then restore the original
afterwards.

In `@src/modules/Ultrastar/ultrastar_parser.py`:
- Around line 36-57: The parsing chain in ultrastar_parser.py contains a
duplicate VIDEO check and an ordering bug: remove the duplicate branch that
checks for UltrastarTxtTag.VIDEO (the second VIDEO elif) and move the
UltrastarTxtTag.VIDEOGAP check so it appears before the UltrastarTxtTag.VIDEO
check; update the code that assigns ultrastar_class.videoGap (and
ultrastar_class.video) accordingly so `#VIDEOGAP` lines match the VIDEOGAP branch
and `#VIDEO` lines match the VIDEO branch. Ensure you only keep one elif for
UltrastarTxtTag.VIDEO and that UltrastarTxtTag.VIDEOGAP is evaluated earlier in
the same if/elif chain where ultrastar_class is being populated.

🟡 Minor comments (12)

src/modules/ffmpeg_helper.py-94-97 (1)
94-97: ⚠️ Potential issue | 🟡 Minor

Docstring is inaccurate — function returns only the extension, not the codec name.

The docstring says "return codec name and appropriate file extension," and the function name includes codec, but the return value is solely the file extension string.
📝 Suggested docstring fix
 def get_audio_codec_and_extension(video_file_path: str) -> str:
     """
-    Detect audio codec from video file and return codec name and appropriate file extension.
+    Detect audio codec from video file and return the appropriate file extension.
     """
src/modules/Ultrastar/ultrastar_writer.py-56-56 (1)
56-56: ⚠️ Potential issue | 🟡 Minor

Trailing comma creates a discarded tuple expression.

Line 56 ends with , after the file.write(...) call. This makes the statement a tuple expression — the write still executes, but the trailing comma is almost certainly unintended.
-            file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n"),
+            file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n")
install/CPU/macos_cpu.sh-32-35 (1)
32-35: ⚠️ Potential issue | 🟡 Minor

py is not available on macOS — use python instead.

The py launcher is a Windows-only utility. On macOS, after activating the venv, the correct command is python:
Proposed fix
 echo "To run UltraSinger:"
 echo "  source .venv/bin/activate"
 echo "  cd src"
-echo "  py UltraSinger.py"
+echo "  python UltraSinger.py"
.github/workflows/pytest.yml-14-14 (1)
14-14: ⚠️ Potential issue | 🟡 Minor

Update GitHub Actions to current versions.

actions/checkout@v3 and actions/setup-python@v4 use deprecated Node.js 16 runners. Update to actions/checkout@v4 and actions/setup-python@v5 to avoid deprecation warnings and potential workflow failures.
Proposed fix
-      uses: actions/checkout@v3
+      uses: actions/checkout@v4
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
Also applies to: 17-17
run_on_linux.sh-5-5 (1)
5-5: ⚠️ Potential issue | 🟡 Minor

cd commands should handle failure.

If cd fails (e.g., directory doesn't exist), the script silently continues in the wrong directory, potentially running the wrong files or failing confusingly. Shellcheck SC2164.
Proposed fix
-cd "$SCRIPT_DIR"
+cd "$SCRIPT_DIR" || exit 1
-cd src
+cd src || { echo "Error: src directory not found"; exit 1; }
Also applies to: 20-20
README.md-284-287 (1)

284-287: ⚠️ Potential issue | 🟡 Minor

CUDA wheel index inconsistency: cu121 here vs cu128 in Colab notebook.

The manual installation example references cu121, but the Colab notebook (colab/UltraSinger.ipynb, line 49) uses cu128. These should be consistent, or at least the README should note which CUDA version to target. Consider aligning with the intended CUDA version for this release.
README.md-216-218 (1)
216-218: ⚠️ Potential issue | 🟡 Minor

Incorrect frequency value: 60 Hz vs 65 Hz in code.

Line 218 states "UltraSinger uses 60hz and 400hz", but pitcher.py (line 19) initializes SwiftF0 with fmin=65. This should be 65hz to match the implementation. Also, this line is missing punctuation and reads as a sentence fragment—consider rephrasing.
Proposed fix
-UltraSinger uses 60hz and 400hz
+UltraSinger uses 65 Hz and 400 Hz as the frequency range.
install/CPU/linux_cpu.sh-35-35 (1)
35-35: ⚠️ Potential issue | 🟡 Minor

py is not available on Linux — should be python.

py is the Windows Python Launcher and is not available on Linux. The CUDA install script correctly uses python. This instruction will confuse Linux users.
Proposed fix
-echo "  py UltraSinger.py"
+echo "  python UltraSinger.py"
Dockerfile-1-1 (1)

1-1: ⚠️ Potential issue | 🟡 Minor

CUDA version mismatch between base image and PyTorch wheels.

The base image uses CUDA 12.6.3, but PyTorch wheels are from the cu128 index (CUDA 12.8). The critical constraint is that the GPU driver (not the base image) must support CUDA 12.8 APIs (minimum ≥525.60.13 on Linux). While PyTorch bundles its own CUDA runtime, operations depending on the GPU driver could fail if the driver version doesn't meet the requirement. Either upgrade the base image to nvidia/cuda:12.8-runtime-ubuntu22.04 and ensure GPU driver compatibility, or switch to cu126 wheels to match the base image.
pyproject.toml-7-7 (1)
7-7: ⚠️ Potential issue | 🟡 Minor

Non-canonical PEP 440 version string.

"0.0.13-dev14" uses a hyphen; the canonical PEP 440 form is "0.0.13.dev14" (with a dot). Build tools will normalize it, but using the canonical form avoids ambiguity.
Suggested fix
-version = "0.0.13-dev14"
+version = "0.0.13.dev14"
pyproject.toml-37-37 (1)
37-37: ⚠️ Potential issue | 🟡 Minor

dataclasses is a stdlib module since Python 3.7; redundant with requires-python = ">=3.12".

The dataclasses PyPI package is a backport for Python 3.6. It's a no-op on 3.7+ and can be removed.
Suggested fix
-    "dataclasses",
pyproject.toml-11-11 (1)

11-11: ⚠️ Potential issue | 🟡 Minor

Remove setuptools from runtime dependencies.

The codebase does not use setuptools or pkg_resources at runtime. It's already correctly listed in [build-system] requires, so having it in [project] dependencies is unnecessary bloat.

🧹 Nitpick comments (20)

src/modules/Ultrastar/ultrastar_parser.py (1)

36-57: .value is redundant on str Enums (but harmless).

Since UltrastarTxtTag inherits from str, the enum member is the string — UltrastarTxtTag.ARTIST == "ARTIST" is already True. Adding .value is not wrong, but it's unnecessary boilerplate. Consistency across the codebase is the main benefit here, so this is fine to keep if that's the project convention.
src/modules/Ultrastar/ultrastar_writer.py (1)
143-144: Use f-string conversion flag instead of wrapping in str().

Per the Ruff RUF010 hint, prefer the !s conversion flag:
-                linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \
-                            f"{str(round(show_next))}\n"
+                linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \
+                            f"{round(show_next)!s}\n"
src/modules/Audio/key_detector.py (1)
97-97: Noisy per-note log in a hot path — consider removing or gating behind a verbose/debug flag.

quantize_note_to_key is called for every note in a song, so this print will flood the console. The commented-out print on line 147 suggests per-note logging was previously deemed too verbose. Additionally, the message says "with allowed notes" but doesn't actually print the allowed_notes set, limiting its diagnostic value.
Suggested fix

Either remove it or gate it behind a debug/verbose flag, and include allowed_notes if it's meant for diagnostics:
-    print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes")
+    # Uncomment for debugging:
+    # print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes: {allowed_notes}")
install/CUDA/windows_cuda_gpu.bat (1)

15-19: Remove commented-out dead code.

This commented-out venv removal block adds clutter. If it's not needed, remove it; if it may be needed later, track it in an issue instead.
install/CPU/windows_cpu.bat (2)
85-94: CPU PyTorch install doesn't pin versions, unlike the CUDA script.

The CUDA variant (windows_cuda_gpu.bat line 90) pins torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0, while this CPU script installs unpinned versions. This inconsistency can lead to version mismatches or unexpected breakage if a new PyTorch release is incompatible with the project's dependencies.
Proposed fix — pin CPU PyTorch versions to match
 :: Install PyTorch CPU version
-uv pip install --index-url https://download.pytorch.org/whl/cpu torch torchvision torchaudio
+uv pip install --index-url https://download.pytorch.org/whl/cpu torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
1-97: Significant duplication with windows_cuda_gpu.bat.

Lines 1–83 are nearly identical to install/CUDA/windows_cuda_gpu.bat. Consider extracting the shared Python-detection and uv-bootstrap logic into a common helper script (e.g., install/common_setup.bat) and calling it from both scripts to reduce maintenance burden.
src/modules/Pitcher/pitcher.py (1)

10-20: Global mutable state — not thread-safe.

The lazy-initialized _swift_f0_detector global with no synchronization is fine for single-threaded use, but worth noting. If this module is ever used concurrently, a race condition could create multiple detector instances. For now this is acceptable given the application's single-threaded nature.

ReleaseNotes.md (1)

17-19: Minor capitalization inconsistency.

Line 19 starts with lowercase "upgrade" while all other entries in this section use a capitalized first word (e.g., "Changed", "Drop"). Consider: Upgrade to Python 3.12.
run_on_linux.sh (1)
22-22: Command-line arguments are not forwarded to UltraSinger.

Users would expect to pass arguments like -i "song.mp3" via this script. Currently no arguments are forwarded.
Proposed fix
-python UltraSinger.py
+python UltraSinger.py "$@"
pytest/modules/Pitcher/test_pitcher.py (2)

12-12: Stale test method name.

The method is still named test_get_pitch_with_crepe_file but now tests SwiftF0 via get_pitch_with_file. Consider renaming to test_get_pitch_with_file for clarity.

22-22: Remove commented-out CREPE call.

This dead code is no longer relevant after the migration to SwiftF0.
run_on_windows.bat (1)
1-5: Behavioral inconsistency with Linux script and missing .venv check.

Two concerns:

Unlike run_on_linux.sh, there's no check for .venv existence. If it's missing, activate.bat will silently fail.

The Linux script auto-runs python UltraSinger.py, but this script opens an interactive shell (cmd /k) requiring the user to type the command manually. If this is intentional, consider adding a note. Otherwise, consider aligning behavior:
Proposed fix
 `@echo` off
-:: Activate the virtual environment and open cmd in the src directory
-call .venv\Scripts\activate.bat
-cd src
-cmd /k
+:: Activate the virtual environment and run UltraSinger
+if not exist ".venv\Scripts\activate.bat" (
+    echo Error: Virtual environment not found at .venv
+    echo Please run one of the installation scripts first.
+    pause
+    exit /b 1
+)
+call .venv\Scripts\activate.bat
+cd src
+echo Starting UltraSinger...
+python UltraSinger.py %*
+pause
colab/UltraSinger.ipynb (1)

33-33: Colab clones the default branch — may not reflect latest changes.

The notebook always clones from main. Users testing unreleased changes (e.g., this PR's branch) would need to manually edit the clone command. Consider adding a comment noting how to switch branches.
run_on_mac.command (1)
2-21: Consider adding set -e or guarding the cd src call.

If the src directory doesn't exist, cd src will fail silently and python UltraSinger.py will execute from the wrong directory (project root). The other install scripts in this PR use set -e. Also, consider forwarding arguments with "$@" to python UltraSinger.py for consistency with typical launchers.
Proposed improvement
 #!/bin/bash
+set -e
 # Activate virtual environment and run UltraSinger on macOS

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 cd "$SCRIPT_DIR"
 ...
 cd src
 echo "Starting UltraSinger..."
-python UltraSinger.py
+python UltraSinger.py "$@"
Dockerfile (1)
8-14: Add --no-install-recommends to reduce image size.

As flagged by static analysis, the apt-get install calls are missing --no-install-recommends, which pulls in unnecessary recommended packages and bloats the image.
Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
     && apt-get clean  \
     && rm -rf /var/lib/apt/lists/*
src/modules/DeviceDetection/device_detection.py (1)
7-7: Module-level pytorch_gpu_supported is unused dead code.

This global is shadowed by local variables in both check_gpu_support and __check_pytorch_support — neither function uses the global keyword to read or write it.
Proposed fix
-pytorch_gpu_supported = False
-
 def check_gpu_support() -> str:
src/UltraSinger.py (1)

790-793: Dead CLI options: --crepe, --crepe_step_size, --force_crepe_cpu are still parsed but no longer used.

Since CREPE has been replaced by SwiftF0, these options and their corresponding settings fields are dead code. They can be cleaned up to avoid confusing users.

Also applies to: 813-814
pyproject.toml (3)
2-3: Unnecessary build dependencies alongside hatchling.

Since you've switched to hatchling.build as the backend and the version is now statically defined, setuptools, wheel, and setuptools_scm in requires are unnecessary. They add build-time overhead without being used.
Suggested fix
-requires = ["hatchling", "setuptools", "wheel", "setuptools_scm"]
+requires = ["hatchling"]
44-48: Empty platform optional-dependency groups.

The windows, linux, and macos groups are empty placeholders. If there are no platform-specific dependencies planned, consider removing them to avoid confusion. If platform-specific PyTorch index URLs are needed (e.g., CUDA on Linux/Windows vs. CPU on macOS), this is where they'd go.

48-51: Duplicate dev group in both [project.optional-dependencies] and [dependency-groups].

pytest appears in both [project.optional-dependencies] dev and [dependency-groups] dev. Pick one mechanism. If using uv, [dependency-groups] (PEP 735) is the idiomatic choice.

.github/workflows/main.yml

install/fix_cuda11_libs.bat

install/fix_cudnn8.bat

pyproject.toml

src/modules/ffmpeg_helper.py

src/modules/Pitcher/pitcher.py

src/modules/Speech_Recognition/Whisper.py

src/modules/Ultrastar/ultrastar_parser.py

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@Dockerfile`:
- Line 1: The Dockerfile's base image FROM
nvidia/cuda:12.6.3-runtime-ubuntu22.04 conflicts with the PyTorch wheels built
for cu128 (CUDA 12.8); update the FROM line to a CUDA 12.8 runtime image (for
example nvidia/cuda:12.8.1-runtime-ubuntu22.04) to align runtimes, or
alternatively install PyTorch wheels built for cu126 to match the existing FROM
image—modify the Dockerfile's FROM directive accordingly so the container CUDA
runtime and chosen PyTorch wheel target match.
- Around line 21-29: The Dockerfile performs an editable PEP 660 install with
"RUN uv pip install --system --python 3.12 -e ." while only copying
pyproject.toml and creating an empty src/ directory; because pyproject.toml uses
hatchling with packages = ["src"], hatchling will fail to discover packages
without a package skeleton. Fix by either copying a minimal package skeleton
(e.g., add a src/__init__.py) into the image before the editable install or
change the build so you first install deps (pip install from pyproject) and then
run the editable install (uv pip install -e .) after copying the full source
tree; adjust the steps around COPY ./pyproject.toml, RUN mkdir -p
/app/UltraSinger/src, and the RUN uv pip install... line accordingly.

🧹 Nitpick comments (3)

Dockerfile (3)
8-14: Add --no-install-recommends to reduce image size.

Both apt-get install invocations pull in recommended packages that are unnecessary in a container. This can significantly bloat the image.
♻️ Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
     && apt-get clean  \
     && rm -rf /var/lib/apt/lists/*
16-18: Consider pinning the uv version for reproducible builds.

The install script fetches the latest uv version. A future breaking change could silently break the Docker build. You can pin a version via the UV_INSTALLER_VERSION env var or by appending a version to the URL (e.g., https://astral.sh/uv/0.6.x/install.sh).

34-40: Combine COPY and chown to save a layer and reduce build time.

chown -R on line 39 creates a duplicate layer of the entire source tree. Using COPY --chown combines both operations into a single layer.
♻️ Proposed fix
 # copy sources late to allow for caching of layers which contain all the dependencies
-COPY . /app/UltraSinger
-
-
-# no need to run as root
-RUN chown -R 1000:1000 /app/UltraSinger
+COPY --chown=1000:1000 . /app/UltraSinger
+
 USER 1000:1000

Dockerfile

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/modules/Speech_Recognition/Whisper.py (1)

163-196: 🛠️ Refactor suggestion | 🟠 Major

Use a finally block instead of duplicating torch.load restoration in every handler.

The restoration is repeated four times (lines 165, 170, 184, 191). More importantly, BaseException subclasses like KeyboardInterrupt or SystemExit will bypass all except blocks and leave the patch active.

Suggested refactor with finally

     try:
         torch.cuda.empty_cache()
         loaded_whisper_model = whisperx.load_model(
             ...
         )
         ...
         result_aligned = whisperx.align(...)
         transcribed_data = convert_to_transcribed_data(result_aligned)
-        # Restore original torch.load after models are loaded
-        torch.load = _original_torch_load
         return TranscriptionResult(transcribed_data, detected_language)
     except ValueError as value_error:
-        torch.load = _original_torch_load
         ...
         raise value_error
     except OutOfMemoryError as oom_exception:
-        torch.load = _original_torch_load
         ...
         raise oom_exception
     except Exception as exception:
-        torch.load = _original_torch_load
         ...
         raise exception
+    finally:
+        torch.load = _original_torch_load

🤖 Fix all issues with AI agents

In `@pyproject.toml`:
- Line 7: Update the version string in pyproject.toml to use canonical PEP 440
formatting: replace version = "0.0.13-dev14" with version = "0.0.13.dev14" so
the project version uses the dot-separated dev spec per PEP 440 normalization.

In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 12-22: The current module-level monkey-patch
(_original_torch_load, _patched_torch_load, torch.load = _patched_torch_load) is
correctly applied before importing whisperx but the restoration of torch.load is
duplicated inside multiple except handlers; change this to restore the original
loader in a single finally block that always executes—remove the torch.load =
_original_torch_load lines from the except handlers and add a finally:
torch.load = _original_torch_load after the try/except chain that surrounds
whisperx.load_model and the transcription logic (the block that returns
TranscriptionResult or re-raises ValueError / OutOfMemoryError / Exception).

In `@src/modules/Ultrastar/coverter/ultrastar_txt_converter.py`:
- Line 73: The ultrastar_txt.video assignment blindly concatenates basename and
media_info.video_extension which can be None; update the logic in
ultrastar_txt_converter.py to only set ultrastar_txt.video when
media_info.video_extension is truthy (non-None/non-empty) — otherwise leave
ultrastar_txt.video unset or None so the serializer omits the `#VIDEO` tag;
reference ultrastar_txt.video and media_info.video_extension to locate and
change the assignment.
- Around line 69-72: from_ultrastar_txt creates a MediaInfo without
audio_extension which results in filenames like "basename.None"; when
constructing the MediaInfo in from_ultrastar_txt set a sensible default (e.g.,
"mp3") for audio_extension (if audio_extension is None or falsy) so that
ultrastar_txt.mp3 / ultrastar_txt.audio / ultrastar_txt.vocals /
ultrastar_txt.instrumental produce valid filenames; update the MediaInfo
creation in from_ultrastar_txt (and any other code paths that create MediaInfo
from Ultrastar text, e.g., the export paths used by UltraSinger) to assign
audio_extension = "mp3" when none is provided.

In `@src/modules/Ultrastar/ultrastar_writer.py`:
- Line 56: The line calling file.write currently ends with a trailing comma
which turns the call into a discarded tuple (e.g., file.write(...),); remove the
trailing comma so the write call is executed normally. Update the occurrence
that uses UltrastarTxtTag.VERSION.value and ultrastar_class.version in
ultrastar_writer.py (the file.write(...) invocation) to drop the comma and
ensure the string is written to the file.

In `@src/UltraSinger.py`:
- Around line 670-673: The audio extension variable currently includes a leading
dot causing later code that does basename + "." + audio_extension to produce a
double-dot filename; update the block that sets
basename_with_ext/audio_ext/video_ext so audio_ext is stored without a leading
dot (use extension.lstrip('.')) and build basename_with_ext consistently (e.g.,
f"{basename_without_ext}.{audio_ext}") so downstream code (see
separate_audio_video usage and ultrastar_txt_converter.py filename construction)
produces correct single-dot filenames.
- Around line 520-521: When building output paths and calling
convert_audio_format, guard against process_data.media_info.audio_extension
being None: in the code that computes karaoke_output_path and the other output
path uses (symbols: karaoke_output_path, process_data.process_data_paths,
convert_audio_format, and media_info.audio_extension), check if
process_data.media_info.audio_extension is None and either (a) use a sensible
fallback extension (e.g., "mp3" or a constant DEFAULT_AUDIO_EXTENSION) before
concatenating, or (b) skip creating/ converting those files and log/raise a
clear error; apply the same guard wherever media_info.audio_extension is
concatenated for output paths so no TypeError occurs.

🧹 Nitpick comments (10)

.github/workflows/pylint.yml (1)

14-14: Python version inconsistency with other workflows.

The rest of the PR standardizes on Python 3.12 (pytest.yml, main.yml, Dockerfile, pyproject.toml), but this workflow still uses 3.10. While the workflow is currently disabled (triggers on none.txt), it would be good to align it now to avoid confusion if it's ever re-enabled.
.github/workflows/pytest.yml (1)
21-27: Consider pinning the uv version for CI reproducibility.

curl -LsSf https://astral.sh/uv/install.sh | sh installs the latest uv. A breaking change in a future uv release could silently break CI. You can pin a version, e.g.:
curl -LsSf https://astral.sh/uv/0.6.x/install.sh | sh
(Same concern applies to the identical pattern in main.yml.)
Dockerfile (1)
8-15: Add --no-install-recommends to reduce image size.

The apt-get install calls pull in recommended packages that aren't needed in a container, bloating the image unnecessarily.
Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
-       build-essential gcc g++ make \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+       build-essential gcc g++ make ca-certificates \
src/modules/ffmpeg_helper.py (1)
94-97: Docstring is misleading — function returns only the extension, not the codec name.

The docstring says "return codec name and appropriate file extension" but the function only returns the extension string.
Proposed fix
 def get_audio_codec_and_extension(video_file_path: str) -> str:
     """
-    Detect audio codec from video file and return codec name and appropriate file extension.
+    Detect audio codec from video file and return the appropriate file extension.
     """
src/modules/Audio/youtube.py (1)
107-109: final_video_path is unpacked but never used — prefix with underscore.

Per Ruff RUF059, final_video_path is assigned but unused. Prefix it to signal intentional discard.
Proposed fix
-    audio_file_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+    audio_file_path, _final_video_path, audio_ext, video_ext = separate_audio_video(
pyproject.toml (2)
34-34: dataclasses dependency is unnecessary for Python ≥ 3.12.

The dataclasses module has been part of the standard library since Python 3.7. The PyPI dataclasses package is a backport for 3.6 only. Since requires-python = ">=3.12", this dependency is dead weight.
     "tqdm",
     "yt-dlp",
     "music21",
-    "dataclasses",
     "dataclasses-json",
41-48: Duplicate dev dependency definitions.

dev is declared in both [project.optional-dependencies] (line 45) and [dependency-groups] (line 48) with identical content. Having both is confusing—pick one. [dependency-groups] is the newer PEP 735 mechanism; [project.optional-dependencies] is more widely supported.
src/UltraSinger.py (2)
666-666: Unused variable final_video_path.

Static analysis flags final_video_path as unused. Prefix with underscore to signal intent.
-        ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+        ultrastar_audio_input_path, _final_video_path, audio_ext, video_ext = separate_audio_video(
793-797: Dead code: --crepe and --crepe_step_size CLI options are still parsed but CREPE has been removed.

These options set settings.crepe_model_capacity and settings.crepe_step_size, but the pitch path now uses SwiftF0 exclusively. The options in arg_options() (lines 868-869) and the corresponding --force_crepe_cpu handling (lines 817-818, 885) should also be cleaned up.
src/modules/Ultrastar/ultrastar_writer.py (1)

86-91: #TAGS is written after #GAP — verify this matches the UltraStar format spec ordering expectations.

The TAGS field is written between GAP and CREATOR. While the UltraStar format may not enforce strict tag ordering, it's worth confirming this placement is intentional and consistent with other tools' expectations.

pyproject.toml

src/modules/Speech_Recognition/Whisper.py

src/modules/Ultrastar/coverter/ultrastar_txt_converter.py

src/modules/Ultrastar/ultrastar_writer.py

src/UltraSinger.py

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/UltraSinger.py (1)
793-798: ⚠️ Potential issue | 🟡 Minor

Dead CLI options: --crepe, --crepe_step_size, --force_crepe_cpu still accepted.

These flags set attributes that were removed from Settings. They silently do nothing now. Remove the option handling and the arg_options entries to avoid confusing users.
Suggested fix for arg_options
     long = [
         "ifile=",
         "ofile=",
-        "crepe=",
-        "crepe_step_size=",
         "demucs=",
         ...
-        "force_crepe_cpu",
         "format_version=",
Also applies to: 817-818, 868-869, 885-885
src/Settings.py (1)

50-53: ⚠️ Potential issue | 🟡 Minor

Remove dead CREPE CLI option handling and clean up related code.

The crepe_model_capacity, crepe_step_size, and force_crepe_cpu attributes were removed from Settings, but dead references remain across multiple files:

UltraSinger.py (lines 794–798, 817–818): CLI flags that set removed attributes

UltraSinger.py (lines 868–869, 885): Dead entries in arg_options()

init_interactive_mode.py (line 127): Sets force_crepe_cpu attribute dynamically

common_print.py (lines 46–48, 69): Outdated help text documenting CREPE flags

Since Settings is non-frozen, these assignments silently create dynamic attributes that go unused, misleading users. Remove all CREPE flag handling and update documentation.

🤖 Fix all issues with AI agents

In `@pyproject.toml`:
- Around line 2-3: Remove the unnecessary build-time dependencies from
pyproject.toml by editing the requires array used with the build-backend
"hatchling.build": remove "setuptools" and "setuptools_scm" from the requires
list since hatchling is the specified build-backend and the project uses a
static version (no tool.setuptools_scm config), while "setuptools" remains
declared as a runtime dependency under [project] dependencies.

In `@src/UltraSinger.py`:
- Around line 707-714: The pitch cache branch is using the wrong flag: replace
the check of settings.skip_cache_transcription with the dedicated
settings.skip_cache_pitch_detection so the code that decides to call
get_pitch_with_file(process_data_paths.processing_audio_path) uses "if
settings.skip_cache_pitch_detection or not cache_available" (leave
pitching_config, pitched_data_path and check_file_exists as-is); this aligns the
skip behavior with the Settings flag that controls pitch detection caching.

🧹 Nitpick comments (5)

src/modules/Speech_Recognition/Whisper.py (1)
164-196: Redundant torch.load restoration in each except block — the finally block handles it.

The finally block at line 193 guarantees restoration regardless of how the try exits, so the three identical torch.load = _original_torch_load assignments in the except handlers (lines 166, 180, 187) are dead code.
Proposed cleanup
     except ValueError as value_error:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         if (
             "Requested float16 compute type, but the target device or backend do not support efficient float16 computation."
             in str(value_error.args[0])
         ):
             print(value_error)
             print(
                 f"{ULTRASINGER_HEAD} Your GPU does not support efficient float16 computation; run UltraSinger with '--whisper_compute_type int8'"
             )
-
         raise value_error
     except OutOfMemoryError as oom_exception:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         print(oom_exception)
         print(MEMORY_ERROR_MESSAGE)
         raise oom_exception
     except Exception as exception:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         if "CUDA failed with error out of memory" in str(exception.args[0]):
             print(exception)
             print(MEMORY_ERROR_MESSAGE)
         raise exception
     finally:
         # Restore original torch.load after models are loaded
         torch.load = _original_torch_load
src/modules/Ultrastar/coverter/ultrastar_txt_converter.py (1)

95-96: Clarify or remove the TODO.

The # todo: as list add here? is vague. If multi-key support is planned, consider opening an issue to track it rather than leaving an ambiguous TODO.
pyproject.toml (2)
34-34: dataclasses is unnecessary for Python ≥ 3.12.

The dataclasses module has been part of the standard library since Python 3.7. The PyPI package is a backport for 3.6 only. Since requires-python = ">=3.12", this dependency is dead weight and can be removed.
Suggested fix
     "tqdm",
     "yt-dlp",
     "music21",
-    "dataclasses",
     "dataclasses-json",
42-44: Empty platform optional-dependency groups serve no purpose.

windows = [], linux = [], macos = [] are placeholders with no dependencies. If platform-specific dependencies are planned, consider adding a TODO or removing these until needed.
src/UltraSinger.py (1)
666-666: Unused variable final_video_path.

final_video_path from separate_audio_video is unpacked but never used. Use _ to signal intent.
Suggested fix
-        ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+        ultrastar_audio_input_path, _, audio_ext, video_ext = separate_audio_video(

pyproject.toml

src/UltraSinger.py

rakuri255 added 12 commits February 14, 2026 14:32

fix installer with uv and swiftF0

300a30a

colab fix

5b2287b

colab fix

13a5b83

try fix docker

bf09406

use uv in ci

e00a76f

try fix docker build

d03c49b

add missing uv sync in linux cuda installer

281620a

try fix docker

e8098aa

try fix docker

a93df07

try fix docker

d450dee

try fix docker

9097915

try fix docker

45a62af

coderabbitai bot reviewed Feb 14, 2026

View reviewed changes

try fix docker

19122c1

This was linked to issues Feb 14, 2026

Running script throws a 'bash: line 1: .venvScriptsactivate: command not found' error #296

Closed

Torch error in fbgemm.dll - version 0.0.12 / Windows / CPU #168

Closed

coderabbitai bot reviewed Feb 14, 2026

View reviewed changes

Dockerfile Outdated Show resolved Hide resolved

Dockerfile Show resolved Hide resolved

rakuri255 added 2 commits February 15, 2026 00:57

try fix docker

0c885f0

try fix docker

1fd9bc6

rakuri255 self-assigned this Feb 15, 2026

fix output

b900022

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

rakuri255 added 2 commits February 15, 2026 02:21

fix test

aa53b15

fix video extension is none

c260a8c

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

pyproject.toml Show resolved Hide resolved

src/UltraSinger.py Show resolved Hide resolved

rakuri255 added 3 commits February 15, 2026 11:58

add audio extention when using txt

0240166

fix: update cache skip setting and add audio extension validation

51be139

fix test

e46ffd8

rakuri255 merged commit efba4eb into main Feb 15, 2026
9 checks passed

This was referenced Feb 15, 2026

The docker image does not work with modern CUDA builds (which prevents new cards from working altogether) #273

Closed

ctranslate import error on modern Linux #253

Closed

Installing error on M1 Mac #79

Closed

Uh oh!

Conversation

rakuri255 commented Feb 14, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rakuri255 commented Feb 14, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 14, 2026 •

edited

Loading