Skip to content

Fix installer with uv and swift f0 pitcher#299

Merged
rakuri255 merged 21 commits intomainfrom
fix-installer-with-uv-and-swiftF0-pitcher
Feb 15, 2026
Merged

Fix installer with uv and swift f0 pitcher#299
rakuri255 merged 21 commits intomainfrom
fix-installer-with-uv-and-swiftF0-pitcher

Conversation

@rakuri255
Copy link
Owner

@rakuri255 rakuri255 commented Feb 14, 2026

Summary by CodeRabbit

  • New Features

    • Pitch detection now uses SwiftF0 for improved accuracy and simpler workflow.
    • Output files preserve original audio/video extensions and codec-aware extraction.
  • Bug Fixes

    • Safer scoring logic to avoid divide-by-zero; more consistent tag parsing/writing.
  • Chores

    • Upgraded to Python 3.12 and unified cross-platform dependency/install flow around a single installer tool.
    • Container updated for CUDA-enabled PyTorch (cu128); installer scripts refreshed per platform.
    • Added Windows helpers to ensure CUDA/cuDNN runtime DLLs are available.
  • Documentation

    • Installation and usage docs revised for new Python, GPU guidance, and script names.
  • Tests

    • Adjusted tests to align with updated pitch API and tag string handling.

@coderabbitai
Copy link

coderabbitai bot commented Feb 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Dependency management moved to uv/pyproject.toml and Python bumped to 3.12; CREPE pitch detection replaced with SwiftF0; TensorFlow GPU checks removed in favor of PyTorch/CUDA (cu128); media audio/video extensions are now propagated through processing and output generation.

Changes

Cohort / File(s) Summary
CI/CD Workflows
.github/workflows/main.yml, .github/workflows/pytest.yml, .github/workflows/build-container-image.yml, .github/workflows/pylint.yml
Bumped Python to 3.12, upgraded actions versions, replaced per-OS pip installs with uv installer + uv sync, unified test execution via uv run pytest.
Build System & Packaging
pyproject.toml
Switched build backend to hatchling, added explicit version/description, raised requires-python to >=3.12, moved dependencies into pyproject.toml with optional/dependency groups.
Removed legacy pin files
requirements.in, requirements-linux.txt, requirements-macos.txt, requirements-windows.txt
Deleted pip-compile input and generated pinned requirement manifests; dependency management consolidated into pyproject.toml/uv.
Container / Dockerfile
Dockerfile
Base CUDA image bumped; installs system Python 3.12, installs uv, uses pyproject/uv to install deps, and installs CUDA-enabled PyTorch (cu128); source copy reordered for caching.
Installer scripts (CPU & CUDA)
install/CPU/..., install/CUDA/... (linux_cpu.sh, macos_cpu.sh, windows_cpu.bat, linux_cuda_gpu.sh, windows_cuda_gpu.bat)
Replaced venv+pip flows with uv-driven sync, added uv install/verification and PATH updates, Windows scripts detect/use Python 3.12, PyTorch installs set to cu128 where applicable.
CUDA helper scripts
install/fix_cuda11_libs.bat, install/fix_cudnn8.bat
Added Windows scripts to fetch/install cuBLAS/cuDNN DLLs into a venv's torch lib with validation and cleanup.
Runtime launch scripts
run_on_linux.sh, run_on_mac.command, run_on_windows.bat
Made startup scripts robust: resolve script dir, check/activate .venv, and run UltraSinger.py with clearer error messaging.
Pitch detection (feature)
src/modules/Pitcher/pitcher.py, pytest/modules/Pitcher/test_pitcher.py
Replaced CREPE with SwiftF0, added lazy detector and get_pitch_with_swift_f0, consolidated file-based API get_pitch_with_file, removed CREPE APIs and updated tests to use simplified call.
Core app & process data
src/UltraSinger.py, src/Settings.py, src/modules/ProcessData.py
Bumped APP_VERSION, removed CREPE/TensorFlow-specific settings, updated pitch call sites, added audio_extension/video_extension to MediaInfo, and fixed mutable default via default_factory.
Device detection
src/modules/DeviceDetection/device_detection.py
Removed TensorFlow GPU checks; check_gpu_support() now returns 'cuda' or 'cpu' based solely on PyTorch.
Speech recognition
src/modules/Speech_Recognition/Whisper.py
Added scoped monkey-patch of torch.load to force weights_only=False during model loads and restored original loader in error/finally paths; removed previous detect_language monkey-patch.
Ultrastar parsing/writing/score & tests
src/modules/Ultrastar/..., pytest/modules/UltraStar/test_ultrastar_writer.py
Switched enum usages to .value for tags and note types across parser, writer, score calc and tests; added guards to avoid zero-division in scoring.
Audio/FFmpeg helper & conversion
src/modules/ffmpeg_helper.py, src/modules/Audio/convert_audio.py, src/modules/Audio/youtube.py
Added get_audio_codec_and_extension() using ffprobe; separate_audio_video() returns audio/video extensions; callers propagate extensions; replaced pydub WAV→MP3 with ffmpeg-based convert_audio_format.
Misc small changes
src/modules/Audio/key_detector.py, src/modules/Midi/midi_creator.py, src/modules/Ultrastar/coverter/ultrastar_txt_converter.py
Removed unused imports, moved an import to module scope, and updated Ultrastar TXT generation to use dynamic media extensions from MediaInfo.
Docs / Notebook / Release
README.md, ReleaseNotes.md, colab/UltraSinger.ipynb
Updated docs and notebook for Python 3.12, uv-based installs, SwiftF0 default, CUDA guidance, and release notes added entries for uv, SwiftF0, and Python upgrade.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant US as UltraSinger
    participant FS as Filesystem
    participant PK as Pitcher.get_pitch_with_file
    participant SF as SwiftF0 Detector

    US->>FS: ensure processing_audio_path exists
    FS-->>US: file path
    US->>PK: get_pitch_with_file(processing_audio_path)
    PK->>FS: read audio file bytes
    FS-->>PK: audio bytes
    PK->>SF: detect(audio, sample_rate)
    SF-->>PK: pitch frames + confidences
    PK-->>US: return PitchedData
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hops through code with glee,
From pip to uv and Python 3.12 free.
CREPE was traded for SwiftF0's song,
CUDA boots up and the scripts run strong. 🐇✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly matches the main changes: migrating the installer from pip to uv and replacing CREPE pitch detection with SwiftF0.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-installer-with-uv-and-swiftF0-pitcher

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ReleaseNotes.md (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Version mismatch: ReleaseNotes.md declares 0.0.13dev13 but pyproject.toml declares 0.0.13-dev14.

Update the version header in ReleaseNotes.md to match pyproject.toml, or align pyproject.toml to the release notes version.

🤖 Fix all issues with AI agents
In @.github/workflows/main.yml:
- Around line 30-33: The workflow uses deprecated action versions; update the
GitHub Actions steps so the checkout and Python setup actions use newer major
releases by changing uses: actions/checkout@v3 to actions/checkout@v4 and uses:
actions/setup-python@v4 to actions/setup-python@v5 (ensure any inputs/outputs
remain compatible with the new action versions and run a workflow test).

In `@install/fix_cuda11_libs.bat`:
- Around line 1-100: The batch script install/fix_cuda11_libs.bat currently has
LF-only endings which can break Windows cmd parsing; convert this file to CRLF
line endings (save/commit with CRLF) and add a repository .gitattributes entry
to enforce it (e.g. add a line "*.bat text eol=crlf" to .gitattributes) so
future commits keep CRLF; ensure the committed install/fix_cuda11_libs.bat is
verified in git as CRLF before merging.

In `@install/fix_cudnn8.bat`:
- Around line 41-84: The fallback to search for cudnn64_8.dll in the pip cache
(the PowerShell block that sets $found and prints paths) never wires the found
paths into the later copy logic, so when the download fails the script still
expects cudnn.zip and the extracted archive; fix by capturing the actual full
path(s) returned by that PowerShell search and then copy those DLLs into
"%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" (the same target used by the
existing copy command) instead of relying on cudnn.zip; implement this by
changing the PowerShell call to output the found file paths to an environment
variable or to a temp file and add a conditional branch (or jump label) that,
when the search succeeds (%errorlevel% == 0), performs copy /Y
"<found_path_or_paths>" "%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" and lists
them, otherwise keep the existing manual-download error path.

In `@pyproject.toml`:
- Around line 32-35: The listed development tools ("isort", "black", "pylint",
"pytest") are incorrectly placed in main dependencies; remove those entries from
the project dependencies block and add them to the dev group (either under
[project.optional-dependencies] with a "dev" key or into
[project.dependency-groups] -> dev) so they are installed only for development.
Ensure the names "isort", "black", "pylint", and "pytest" appear under the dev
group and update any existing dev group entries to include them rather than
keeping them in the top-level dependencies.

In `@src/modules/ffmpeg_helper.py`:
- Line 130: The default extension used when looking up codec_to_extension
contains a leading dot, causing filenames like "file..wav"; change the lookup so
the default has no leading dot (e.g., extension =
codec_to_extension.get(codec_name, "wav")) so that the caller building
f"{basename_without_ext}.{audio_ext}" produces a single dot; ensure
codec_to_extension values remain dot-less and that the extension variable (and
any audio_ext usage) expects a bare extension string.

In `@src/modules/Pitcher/pitcher.py`:
- Around line 37-39: The current normalization assumes 16-bit PCM; update the
normalization logic around the variable "audio" in
src/modules/Pitcher/pitcher.py so it handles common bit depths and unsigned
formats: detect dtype (np.uint8, signed integers like np.int16/np.int32, and
floats) and convert to float32 with correct scaling—for uint8 subtract 128 and
divide by 128, for signed integers use np.iinfo(dtype) to compute the max
absolute value and divide by that (e.g., int16 -> 32768, int32 -> 2**31 or use
max(abs(min), max)), and for float32/float64 leave as-is (or cast to float32);
implement this in place of the single hardcoded audio.astype(...)/(2**15) line
so all common WAV bit depths normalize correctly.
- Around line 16-19: The SwiftF0 instantiation currently uses speech-only bounds
(SwiftF0(fmin=65, fmax=400)) which will misclassify high singing pitches; update
the _swift_f0_detector initialization to use a music-appropriate pitch range
(either the documented full range SwiftF0(fmin=46.875, fmax=2093.75) or a
narrower singing-focused range like fmin=46.875, fmax=1200), remove the FIXME
comment, and ensure the change is made on the SwiftF0(...) call that assigns
_swift_f0_detector so singing voices are correctly detected.

In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 120-122: The restore of torch.load is happening immediately after
whisperx.load_model(), but whisperx.load_align_model() (and any alignment
models) run later and may require the patched behavior (weights_only=False) to
load pickled models; move the assignment torch.load = _original_torch_load so it
occurs only after both model loads complete (i.e., after
whisperx.load_align_model()), or expand the monkey-patched scope to wrap both
whisperx.load_model() and whisperx.load_align_model() calls together, ensuring
the temporary torch.load override remains active until both loads finish.
- Around line 12-22: The monkey-patch for torch.load (_original_torch_load,
_patched_torch_load, torch.load) is being restored too early—after
whisperx.load_model but before whisperx.load_align_model—so keep the patch
active through the entire model loading sequence by moving the restoration
(torch.load = _original_torch_load) to after whisperx.load_align_model completes
(or re-apply the patch immediately before calling whisperx.load_align_model);
ensure both whisperx.load_model and whisperx.load_align_model execute while
torch.load still points to _patched_torch_load, then restore the original
afterwards.

In `@src/modules/Ultrastar/ultrastar_parser.py`:
- Around line 36-57: The parsing chain in ultrastar_parser.py contains a
duplicate VIDEO check and an ordering bug: remove the duplicate branch that
checks for UltrastarTxtTag.VIDEO (the second VIDEO elif) and move the
UltrastarTxtTag.VIDEOGAP check so it appears before the UltrastarTxtTag.VIDEO
check; update the code that assigns ultrastar_class.videoGap (and
ultrastar_class.video) accordingly so `#VIDEOGAP` lines match the VIDEOGAP branch
and `#VIDEO` lines match the VIDEO branch. Ensure you only keep one elif for
UltrastarTxtTag.VIDEO and that UltrastarTxtTag.VIDEOGAP is evaluated earlier in
the same if/elif chain where ultrastar_class is being populated.
🟡 Minor comments (12)
src/modules/ffmpeg_helper.py-94-97 (1)

94-97: ⚠️ Potential issue | 🟡 Minor

Docstring is inaccurate — function returns only the extension, not the codec name.

The docstring says "return codec name and appropriate file extension," and the function name includes codec, but the return value is solely the file extension string.

📝 Suggested docstring fix
 def get_audio_codec_and_extension(video_file_path: str) -> str:
     """
-    Detect audio codec from video file and return codec name and appropriate file extension.
+    Detect audio codec from video file and return the appropriate file extension.
     """
src/modules/Ultrastar/ultrastar_writer.py-56-56 (1)

56-56: ⚠️ Potential issue | 🟡 Minor

Trailing comma creates a discarded tuple expression.

Line 56 ends with , after the file.write(...) call. This makes the statement a tuple expression — the write still executes, but the trailing comma is almost certainly unintended.

-            file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n"),
+            file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n")
install/CPU/macos_cpu.sh-32-35 (1)

32-35: ⚠️ Potential issue | 🟡 Minor

py is not available on macOS — use python instead.

The py launcher is a Windows-only utility. On macOS, after activating the venv, the correct command is python:

Proposed fix
 echo "To run UltraSinger:"
 echo "  source .venv/bin/activate"
 echo "  cd src"
-echo "  py UltraSinger.py"
+echo "  python UltraSinger.py"
.github/workflows/pytest.yml-14-14 (1)

14-14: ⚠️ Potential issue | 🟡 Minor

Update GitHub Actions to current versions.

actions/checkout@v3 and actions/setup-python@v4 use deprecated Node.js 16 runners. Update to actions/checkout@v4 and actions/setup-python@v5 to avoid deprecation warnings and potential workflow failures.

Proposed fix
-      uses: actions/checkout@v3
+      uses: actions/checkout@v4
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5

Also applies to: 17-17

run_on_linux.sh-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor

cd commands should handle failure.

If cd fails (e.g., directory doesn't exist), the script silently continues in the wrong directory, potentially running the wrong files or failing confusingly. Shellcheck SC2164.

Proposed fix
-cd "$SCRIPT_DIR"
+cd "$SCRIPT_DIR" || exit 1
-cd src
+cd src || { echo "Error: src directory not found"; exit 1; }

Also applies to: 20-20

README.md-284-287 (1)

284-287: ⚠️ Potential issue | 🟡 Minor

CUDA wheel index inconsistency: cu121 here vs cu128 in Colab notebook.

The manual installation example references cu121, but the Colab notebook (colab/UltraSinger.ipynb, line 49) uses cu128. These should be consistent, or at least the README should note which CUDA version to target. Consider aligning with the intended CUDA version for this release.

README.md-216-218 (1)

216-218: ⚠️ Potential issue | 🟡 Minor

Incorrect frequency value: 60 Hz vs 65 Hz in code.

Line 218 states "UltraSinger uses 60hz and 400hz", but pitcher.py (line 19) initializes SwiftF0 with fmin=65. This should be 65hz to match the implementation. Also, this line is missing punctuation and reads as a sentence fragment—consider rephrasing.

Proposed fix
-UltraSinger uses 60hz and 400hz
+UltraSinger uses 65 Hz and 400 Hz as the frequency range.
install/CPU/linux_cpu.sh-35-35 (1)

35-35: ⚠️ Potential issue | 🟡 Minor

py is not available on Linux — should be python.

py is the Windows Python Launcher and is not available on Linux. The CUDA install script correctly uses python. This instruction will confuse Linux users.

Proposed fix
-echo "  py UltraSinger.py"
+echo "  python UltraSinger.py"
Dockerfile-1-1 (1)

1-1: ⚠️ Potential issue | 🟡 Minor

CUDA version mismatch between base image and PyTorch wheels.

The base image uses CUDA 12.6.3, but PyTorch wheels are from the cu128 index (CUDA 12.8). The critical constraint is that the GPU driver (not the base image) must support CUDA 12.8 APIs (minimum ≥525.60.13 on Linux). While PyTorch bundles its own CUDA runtime, operations depending on the GPU driver could fail if the driver version doesn't meet the requirement. Either upgrade the base image to nvidia/cuda:12.8-runtime-ubuntu22.04 and ensure GPU driver compatibility, or switch to cu126 wheels to match the base image.

pyproject.toml-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Non-canonical PEP 440 version string.

"0.0.13-dev14" uses a hyphen; the canonical PEP 440 form is "0.0.13.dev14" (with a dot). Build tools will normalize it, but using the canonical form avoids ambiguity.

Suggested fix
-version = "0.0.13-dev14"
+version = "0.0.13.dev14"
pyproject.toml-37-37 (1)

37-37: ⚠️ Potential issue | 🟡 Minor

dataclasses is a stdlib module since Python 3.7; redundant with requires-python = ">=3.12".

The dataclasses PyPI package is a backport for Python 3.6. It's a no-op on 3.7+ and can be removed.

Suggested fix
-    "dataclasses",
pyproject.toml-11-11 (1)

11-11: ⚠️ Potential issue | 🟡 Minor

Remove setuptools from runtime dependencies.

The codebase does not use setuptools or pkg_resources at runtime. It's already correctly listed in [build-system] requires, so having it in [project] dependencies is unnecessary bloat.

🧹 Nitpick comments (20)
src/modules/Ultrastar/ultrastar_parser.py (1)

36-57: .value is redundant on str Enums (but harmless).

Since UltrastarTxtTag inherits from str, the enum member is the string — UltrastarTxtTag.ARTIST == "ARTIST" is already True. Adding .value is not wrong, but it's unnecessary boilerplate. Consistency across the codebase is the main benefit here, so this is fine to keep if that's the project convention.

src/modules/Ultrastar/ultrastar_writer.py (1)

143-144: Use f-string conversion flag instead of wrapping in str().

Per the Ruff RUF010 hint, prefer the !s conversion flag:

-                linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \
-                            f"{str(round(show_next))}\n"
+                linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \
+                            f"{round(show_next)!s}\n"
src/modules/Audio/key_detector.py (1)

97-97: Noisy per-note log in a hot path — consider removing or gating behind a verbose/debug flag.

quantize_note_to_key is called for every note in a song, so this print will flood the console. The commented-out print on line 147 suggests per-note logging was previously deemed too verbose. Additionally, the message says "with allowed notes" but doesn't actually print the allowed_notes set, limiting its diagnostic value.

Suggested fix

Either remove it or gate it behind a debug/verbose flag, and include allowed_notes if it's meant for diagnostics:

-    print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes")
+    # Uncomment for debugging:
+    # print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes: {allowed_notes}")
install/CUDA/windows_cuda_gpu.bat (1)

15-19: Remove commented-out dead code.

This commented-out venv removal block adds clutter. If it's not needed, remove it; if it may be needed later, track it in an issue instead.

install/CPU/windows_cpu.bat (2)

85-94: CPU PyTorch install doesn't pin versions, unlike the CUDA script.

The CUDA variant (windows_cuda_gpu.bat line 90) pins torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0, while this CPU script installs unpinned versions. This inconsistency can lead to version mismatches or unexpected breakage if a new PyTorch release is incompatible with the project's dependencies.

Proposed fix — pin CPU PyTorch versions to match
 :: Install PyTorch CPU version
-uv pip install --index-url https://download.pytorch.org/whl/cpu torch torchvision torchaudio
+uv pip install --index-url https://download.pytorch.org/whl/cpu torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0

1-97: Significant duplication with windows_cuda_gpu.bat.

Lines 1–83 are nearly identical to install/CUDA/windows_cuda_gpu.bat. Consider extracting the shared Python-detection and uv-bootstrap logic into a common helper script (e.g., install/common_setup.bat) and calling it from both scripts to reduce maintenance burden.

src/modules/Pitcher/pitcher.py (1)

10-20: Global mutable state — not thread-safe.

The lazy-initialized _swift_f0_detector global with no synchronization is fine for single-threaded use, but worth noting. If this module is ever used concurrently, a race condition could create multiple detector instances. For now this is acceptable given the application's single-threaded nature.

ReleaseNotes.md (1)

17-19: Minor capitalization inconsistency.

Line 19 starts with lowercase "upgrade" while all other entries in this section use a capitalized first word (e.g., "Changed", "Drop"). Consider: Upgrade to Python 3.12.

run_on_linux.sh (1)

22-22: Command-line arguments are not forwarded to UltraSinger.

Users would expect to pass arguments like -i "song.mp3" via this script. Currently no arguments are forwarded.

Proposed fix
-python UltraSinger.py
+python UltraSinger.py "$@"
pytest/modules/Pitcher/test_pitcher.py (2)

12-12: Stale test method name.

The method is still named test_get_pitch_with_crepe_file but now tests SwiftF0 via get_pitch_with_file. Consider renaming to test_get_pitch_with_file for clarity.


22-22: Remove commented-out CREPE call.

This dead code is no longer relevant after the migration to SwiftF0.

run_on_windows.bat (1)

1-5: Behavioral inconsistency with Linux script and missing .venv check.

Two concerns:

  1. Unlike run_on_linux.sh, there's no check for .venv existence. If it's missing, activate.bat will silently fail.
  2. The Linux script auto-runs python UltraSinger.py, but this script opens an interactive shell (cmd /k) requiring the user to type the command manually. If this is intentional, consider adding a note. Otherwise, consider aligning behavior:
Proposed fix
 `@echo` off
-:: Activate the virtual environment and open cmd in the src directory
-call .venv\Scripts\activate.bat
-cd src
-cmd /k
+:: Activate the virtual environment and run UltraSinger
+if not exist ".venv\Scripts\activate.bat" (
+    echo Error: Virtual environment not found at .venv
+    echo Please run one of the installation scripts first.
+    pause
+    exit /b 1
+)
+call .venv\Scripts\activate.bat
+cd src
+echo Starting UltraSinger...
+python UltraSinger.py %*
+pause
colab/UltraSinger.ipynb (1)

33-33: Colab clones the default branch — may not reflect latest changes.

The notebook always clones from main. Users testing unreleased changes (e.g., this PR's branch) would need to manually edit the clone command. Consider adding a comment noting how to switch branches.

run_on_mac.command (1)

2-21: Consider adding set -e or guarding the cd src call.

If the src directory doesn't exist, cd src will fail silently and python UltraSinger.py will execute from the wrong directory (project root). The other install scripts in this PR use set -e. Also, consider forwarding arguments with "$@" to python UltraSinger.py for consistency with typical launchers.

Proposed improvement
 #!/bin/bash
+set -e
 # Activate virtual environment and run UltraSinger on macOS

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 cd "$SCRIPT_DIR"
 ...
 cd src
 echo "Starting UltraSinger..."
-python UltraSinger.py
+python UltraSinger.py "$@"
Dockerfile (1)

8-14: Add --no-install-recommends to reduce image size.

As flagged by static analysis, the apt-get install calls are missing --no-install-recommends, which pulls in unnecessary recommended packages and bloats the image.

Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
     && apt-get clean  \
     && rm -rf /var/lib/apt/lists/*
src/modules/DeviceDetection/device_detection.py (1)

7-7: Module-level pytorch_gpu_supported is unused dead code.

This global is shadowed by local variables in both check_gpu_support and __check_pytorch_support — neither function uses the global keyword to read or write it.

Proposed fix
-pytorch_gpu_supported = False
-
 def check_gpu_support() -> str:
src/UltraSinger.py (1)

790-793: Dead CLI options: --crepe, --crepe_step_size, --force_crepe_cpu are still parsed but no longer used.

Since CREPE has been replaced by SwiftF0, these options and their corresponding settings fields are dead code. They can be cleaned up to avoid confusing users.

Also applies to: 813-814

pyproject.toml (3)

2-3: Unnecessary build dependencies alongside hatchling.

Since you've switched to hatchling.build as the backend and the version is now statically defined, setuptools, wheel, and setuptools_scm in requires are unnecessary. They add build-time overhead without being used.

Suggested fix
-requires = ["hatchling", "setuptools", "wheel", "setuptools_scm"]
+requires = ["hatchling"]

44-48: Empty platform optional-dependency groups.

The windows, linux, and macos groups are empty placeholders. If there are no platform-specific dependencies planned, consider removing them to avoid confusion. If platform-specific PyTorch index URLs are needed (e.g., CUDA on Linux/Windows vs. CPU on macOS), this is where they'd go.


48-51: Duplicate dev group in both [project.optional-dependencies] and [dependency-groups].

pytest appears in both [project.optional-dependencies] dev and [dependency-groups] dev. Pick one mechanism. If using uv, [dependency-groups] (PEP 735) is the idiomatic choice.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Line 1: The Dockerfile's base image FROM
nvidia/cuda:12.6.3-runtime-ubuntu22.04 conflicts with the PyTorch wheels built
for cu128 (CUDA 12.8); update the FROM line to a CUDA 12.8 runtime image (for
example nvidia/cuda:12.8.1-runtime-ubuntu22.04) to align runtimes, or
alternatively install PyTorch wheels built for cu126 to match the existing FROM
image—modify the Dockerfile's FROM directive accordingly so the container CUDA
runtime and chosen PyTorch wheel target match.
- Around line 21-29: The Dockerfile performs an editable PEP 660 install with
"RUN uv pip install --system --python 3.12 -e ." while only copying
pyproject.toml and creating an empty src/ directory; because pyproject.toml uses
hatchling with packages = ["src"], hatchling will fail to discover packages
without a package skeleton. Fix by either copying a minimal package skeleton
(e.g., add a src/__init__.py) into the image before the editable install or
change the build so you first install deps (pip install from pyproject) and then
run the editable install (uv pip install -e .) after copying the full source
tree; adjust the steps around COPY ./pyproject.toml, RUN mkdir -p
/app/UltraSinger/src, and the RUN uv pip install... line accordingly.
🧹 Nitpick comments (3)
Dockerfile (3)

8-14: Add --no-install-recommends to reduce image size.

Both apt-get install invocations pull in recommended packages that are unnecessary in a container. This can significantly bloat the image.

♻️ Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
     && apt-get clean  \
     && rm -rf /var/lib/apt/lists/*

16-18: Consider pinning the uv version for reproducible builds.

The install script fetches the latest uv version. A future breaking change could silently break the Docker build. You can pin a version via the UV_INSTALLER_VERSION env var or by appending a version to the URL (e.g., https://astral.sh/uv/0.6.x/install.sh).


34-40: Combine COPY and chown to save a layer and reduce build time.

chown -R on line 39 creates a duplicate layer of the entire source tree. Using COPY --chown combines both operations into a single layer.

♻️ Proposed fix
 # copy sources late to allow for caching of layers which contain all the dependencies
-COPY . /app/UltraSinger
-
-
-# no need to run as root
-RUN chown -R 1000:1000 /app/UltraSinger
+COPY --chown=1000:1000 . /app/UltraSinger
+
 USER 1000:1000

@rakuri255 rakuri255 self-assigned this Feb 15, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/modules/Speech_Recognition/Whisper.py (1)

163-196: 🛠️ Refactor suggestion | 🟠 Major

Use a finally block instead of duplicating torch.load restoration in every handler.

The restoration is repeated four times (lines 165, 170, 184, 191). More importantly, BaseException subclasses like KeyboardInterrupt or SystemExit will bypass all except blocks and leave the patch active.

Suggested refactor with finally
     try:
         torch.cuda.empty_cache()
         loaded_whisper_model = whisperx.load_model(
             ...
         )
         ...
         result_aligned = whisperx.align(...)
         transcribed_data = convert_to_transcribed_data(result_aligned)
-        # Restore original torch.load after models are loaded
-        torch.load = _original_torch_load
         return TranscriptionResult(transcribed_data, detected_language)
     except ValueError as value_error:
-        torch.load = _original_torch_load
         ...
         raise value_error
     except OutOfMemoryError as oom_exception:
-        torch.load = _original_torch_load
         ...
         raise oom_exception
     except Exception as exception:
-        torch.load = _original_torch_load
         ...
         raise exception
+    finally:
+        torch.load = _original_torch_load
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 7: Update the version string in pyproject.toml to use canonical PEP 440
formatting: replace version = "0.0.13-dev14" with version = "0.0.13.dev14" so
the project version uses the dot-separated dev spec per PEP 440 normalization.

In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 12-22: The current module-level monkey-patch
(_original_torch_load, _patched_torch_load, torch.load = _patched_torch_load) is
correctly applied before importing whisperx but the restoration of torch.load is
duplicated inside multiple except handlers; change this to restore the original
loader in a single finally block that always executes—remove the torch.load =
_original_torch_load lines from the except handlers and add a finally:
torch.load = _original_torch_load after the try/except chain that surrounds
whisperx.load_model and the transcription logic (the block that returns
TranscriptionResult or re-raises ValueError / OutOfMemoryError / Exception).

In `@src/modules/Ultrastar/coverter/ultrastar_txt_converter.py`:
- Line 73: The ultrastar_txt.video assignment blindly concatenates basename and
media_info.video_extension which can be None; update the logic in
ultrastar_txt_converter.py to only set ultrastar_txt.video when
media_info.video_extension is truthy (non-None/non-empty) — otherwise leave
ultrastar_txt.video unset or None so the serializer omits the `#VIDEO` tag;
reference ultrastar_txt.video and media_info.video_extension to locate and
change the assignment.
- Around line 69-72: from_ultrastar_txt creates a MediaInfo without
audio_extension which results in filenames like "basename.None"; when
constructing the MediaInfo in from_ultrastar_txt set a sensible default (e.g.,
"mp3") for audio_extension (if audio_extension is None or falsy) so that
ultrastar_txt.mp3 / ultrastar_txt.audio / ultrastar_txt.vocals /
ultrastar_txt.instrumental produce valid filenames; update the MediaInfo
creation in from_ultrastar_txt (and any other code paths that create MediaInfo
from Ultrastar text, e.g., the export paths used by UltraSinger) to assign
audio_extension = "mp3" when none is provided.

In `@src/modules/Ultrastar/ultrastar_writer.py`:
- Line 56: The line calling file.write currently ends with a trailing comma
which turns the call into a discarded tuple (e.g., file.write(...),); remove the
trailing comma so the write call is executed normally. Update the occurrence
that uses UltrastarTxtTag.VERSION.value and ultrastar_class.version in
ultrastar_writer.py (the file.write(...) invocation) to drop the comma and
ensure the string is written to the file.

In `@src/UltraSinger.py`:
- Around line 670-673: The audio extension variable currently includes a leading
dot causing later code that does basename + "." + audio_extension to produce a
double-dot filename; update the block that sets
basename_with_ext/audio_ext/video_ext so audio_ext is stored without a leading
dot (use extension.lstrip('.')) and build basename_with_ext consistently (e.g.,
f"{basename_without_ext}.{audio_ext}") so downstream code (see
separate_audio_video usage and ultrastar_txt_converter.py filename construction)
produces correct single-dot filenames.
- Around line 520-521: When building output paths and calling
convert_audio_format, guard against process_data.media_info.audio_extension
being None: in the code that computes karaoke_output_path and the other output
path uses (symbols: karaoke_output_path, process_data.process_data_paths,
convert_audio_format, and media_info.audio_extension), check if
process_data.media_info.audio_extension is None and either (a) use a sensible
fallback extension (e.g., "mp3" or a constant DEFAULT_AUDIO_EXTENSION) before
concatenating, or (b) skip creating/ converting those files and log/raise a
clear error; apply the same guard wherever media_info.audio_extension is
concatenated for output paths so no TypeError occurs.
🧹 Nitpick comments (10)
.github/workflows/pylint.yml (1)

14-14: Python version inconsistency with other workflows.

The rest of the PR standardizes on Python 3.12 (pytest.yml, main.yml, Dockerfile, pyproject.toml), but this workflow still uses 3.10. While the workflow is currently disabled (triggers on none.txt), it would be good to align it now to avoid confusion if it's ever re-enabled.

.github/workflows/pytest.yml (1)

21-27: Consider pinning the uv version for CI reproducibility.

curl -LsSf https://astral.sh/uv/install.sh | sh installs the latest uv. A breaking change in a future uv release could silently break CI. You can pin a version, e.g.:

curl -LsSf https://astral.sh/uv/0.6.x/install.sh | sh

(Same concern applies to the identical pattern in main.yml.)

Dockerfile (1)

8-15: Add --no-install-recommends to reduce image size.

The apt-get install calls pull in recommended packages that aren't needed in a container, bloating the image unnecessarily.

Proposed fix
 RUN apt-get update \
-    && apt-get install -y software-properties-common \
+    && apt-get install -y --no-install-recommends software-properties-common \
     && add-apt-repository ppa:deadsnakes/ppa -y \
     && apt-get update \
-    && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
-       build-essential gcc g++ make \
+    && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \
+       build-essential gcc g++ make ca-certificates \
src/modules/ffmpeg_helper.py (1)

94-97: Docstring is misleading — function returns only the extension, not the codec name.

The docstring says "return codec name and appropriate file extension" but the function only returns the extension string.

Proposed fix
 def get_audio_codec_and_extension(video_file_path: str) -> str:
     """
-    Detect audio codec from video file and return codec name and appropriate file extension.
+    Detect audio codec from video file and return the appropriate file extension.
     """
src/modules/Audio/youtube.py (1)

107-109: final_video_path is unpacked but never used — prefix with underscore.

Per Ruff RUF059, final_video_path is assigned but unused. Prefix it to signal intentional discard.

Proposed fix
-    audio_file_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+    audio_file_path, _final_video_path, audio_ext, video_ext = separate_audio_video(
pyproject.toml (2)

34-34: dataclasses dependency is unnecessary for Python ≥ 3.12.

The dataclasses module has been part of the standard library since Python 3.7. The PyPI dataclasses package is a backport for 3.6 only. Since requires-python = ">=3.12", this dependency is dead weight.

     "tqdm",
     "yt-dlp",
     "music21",
-    "dataclasses",
     "dataclasses-json",

41-48: Duplicate dev dependency definitions.

dev is declared in both [project.optional-dependencies] (line 45) and [dependency-groups] (line 48) with identical content. Having both is confusing—pick one. [dependency-groups] is the newer PEP 735 mechanism; [project.optional-dependencies] is more widely supported.

src/UltraSinger.py (2)

666-666: Unused variable final_video_path.

Static analysis flags final_video_path as unused. Prefix with underscore to signal intent.

-        ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+        ultrastar_audio_input_path, _final_video_path, audio_ext, video_ext = separate_audio_video(

793-797: Dead code: --crepe and --crepe_step_size CLI options are still parsed but CREPE has been removed.

These options set settings.crepe_model_capacity and settings.crepe_step_size, but the pitch path now uses SwiftF0 exclusively. The options in arg_options() (lines 868-869) and the corresponding --force_crepe_cpu handling (lines 817-818, 885) should also be cleaned up.

src/modules/Ultrastar/ultrastar_writer.py (1)

86-91: #TAGS is written after #GAP — verify this matches the UltraStar format spec ordering expectations.

The TAGS field is written between GAP and CREATOR. While the UltraStar format may not enforce strict tag ordering, it's worth confirming this placement is intentional and consistent with other tools' expectations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/UltraSinger.py (1)

793-798: ⚠️ Potential issue | 🟡 Minor

Dead CLI options: --crepe, --crepe_step_size, --force_crepe_cpu still accepted.

These flags set attributes that were removed from Settings. They silently do nothing now. Remove the option handling and the arg_options entries to avoid confusing users.

Suggested fix for arg_options
     long = [
         "ifile=",
         "ofile=",
-        "crepe=",
-        "crepe_step_size=",
         "demucs=",
         ...
-        "force_crepe_cpu",
         "format_version=",

Also applies to: 817-818, 868-869, 885-885

src/Settings.py (1)

50-53: ⚠️ Potential issue | 🟡 Minor

Remove dead CREPE CLI option handling and clean up related code.

The crepe_model_capacity, crepe_step_size, and force_crepe_cpu attributes were removed from Settings, but dead references remain across multiple files:

  • UltraSinger.py (lines 794–798, 817–818): CLI flags that set removed attributes
  • UltraSinger.py (lines 868–869, 885): Dead entries in arg_options()
  • init_interactive_mode.py (line 127): Sets force_crepe_cpu attribute dynamically
  • common_print.py (lines 46–48, 69): Outdated help text documenting CREPE flags

Since Settings is non-frozen, these assignments silently create dynamic attributes that go unused, misleading users. Remove all CREPE flag handling and update documentation.

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 2-3: Remove the unnecessary build-time dependencies from
pyproject.toml by editing the requires array used with the build-backend
"hatchling.build": remove "setuptools" and "setuptools_scm" from the requires
list since hatchling is the specified build-backend and the project uses a
static version (no tool.setuptools_scm config), while "setuptools" remains
declared as a runtime dependency under [project] dependencies.

In `@src/UltraSinger.py`:
- Around line 707-714: The pitch cache branch is using the wrong flag: replace
the check of settings.skip_cache_transcription with the dedicated
settings.skip_cache_pitch_detection so the code that decides to call
get_pitch_with_file(process_data_paths.processing_audio_path) uses "if
settings.skip_cache_pitch_detection or not cache_available" (leave
pitching_config, pitched_data_path and check_file_exists as-is); this aligns the
skip behavior with the Settings flag that controls pitch detection caching.
🧹 Nitpick comments (5)
src/modules/Speech_Recognition/Whisper.py (1)

164-196: Redundant torch.load restoration in each except block — the finally block handles it.

The finally block at line 193 guarantees restoration regardless of how the try exits, so the three identical torch.load = _original_torch_load assignments in the except handlers (lines 166, 180, 187) are dead code.

Proposed cleanup
     except ValueError as value_error:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         if (
             "Requested float16 compute type, but the target device or backend do not support efficient float16 computation."
             in str(value_error.args[0])
         ):
             print(value_error)
             print(
                 f"{ULTRASINGER_HEAD} Your GPU does not support efficient float16 computation; run UltraSinger with '--whisper_compute_type int8'"
             )
-
         raise value_error
     except OutOfMemoryError as oom_exception:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         print(oom_exception)
         print(MEMORY_ERROR_MESSAGE)
         raise oom_exception
     except Exception as exception:
-        # Restore original torch.load in case of error
-        torch.load = _original_torch_load
-
         if "CUDA failed with error out of memory" in str(exception.args[0]):
             print(exception)
             print(MEMORY_ERROR_MESSAGE)
         raise exception
     finally:
         # Restore original torch.load after models are loaded
         torch.load = _original_torch_load
src/modules/Ultrastar/coverter/ultrastar_txt_converter.py (1)

95-96: Clarify or remove the TODO.

The # todo: as list add here? is vague. If multi-key support is planned, consider opening an issue to track it rather than leaving an ambiguous TODO.

pyproject.toml (2)

34-34: dataclasses is unnecessary for Python ≥ 3.12.

The dataclasses module has been part of the standard library since Python 3.7. The PyPI package is a backport for 3.6 only. Since requires-python = ">=3.12", this dependency is dead weight and can be removed.

Suggested fix
     "tqdm",
     "yt-dlp",
     "music21",
-    "dataclasses",
     "dataclasses-json",

42-44: Empty platform optional-dependency groups serve no purpose.

windows = [], linux = [], macos = [] are placeholders with no dependencies. If platform-specific dependencies are planned, consider adding a TODO or removing these until needed.

src/UltraSinger.py (1)

666-666: Unused variable final_video_path.

final_video_path from separate_audio_video is unpacked but never used. Use _ to signal intent.

Suggested fix
-        ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video(
+        ultrastar_audio_input_path, _, audio_ext, video_ext = separate_audio_video(

@rakuri255 rakuri255 merged commit efba4eb into main Feb 15, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment