Fix installer with uv and swift f0 pitcher#299
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDependency management moved to uv/pyproject.toml and Python bumped to 3.12; CREPE pitch detection replaced with SwiftF0; TensorFlow GPU checks removed in favor of PyTorch/CUDA (cu128); media audio/video extensions are now propagated through processing and output generation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant US as UltraSinger
participant FS as Filesystem
participant PK as Pitcher.get_pitch_with_file
participant SF as SwiftF0 Detector
US->>FS: ensure processing_audio_path exists
FS-->>US: file path
US->>PK: get_pitch_with_file(processing_audio_path)
PK->>FS: read audio file bytes
FS-->>PK: audio bytes
PK->>SF: detect(audio, sample_rate)
SF-->>PK: pitch frames + confidences
PK-->>US: return PitchedData
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ReleaseNotes.md (1)
1-1:⚠️ Potential issue | 🟡 MinorVersion mismatch: ReleaseNotes.md declares
0.0.13dev13butpyproject.tomldeclares0.0.13-dev14.Update the version header in ReleaseNotes.md to match pyproject.toml, or align pyproject.toml to the release notes version.
🤖 Fix all issues with AI agents
In @.github/workflows/main.yml:
- Around line 30-33: The workflow uses deprecated action versions; update the
GitHub Actions steps so the checkout and Python setup actions use newer major
releases by changing uses: actions/checkout@v3 to actions/checkout@v4 and uses:
actions/setup-python@v4 to actions/setup-python@v5 (ensure any inputs/outputs
remain compatible with the new action versions and run a workflow test).
In `@install/fix_cuda11_libs.bat`:
- Around line 1-100: The batch script install/fix_cuda11_libs.bat currently has
LF-only endings which can break Windows cmd parsing; convert this file to CRLF
line endings (save/commit with CRLF) and add a repository .gitattributes entry
to enforce it (e.g. add a line "*.bat text eol=crlf" to .gitattributes) so
future commits keep CRLF; ensure the committed install/fix_cuda11_libs.bat is
verified in git as CRLF before merging.
In `@install/fix_cudnn8.bat`:
- Around line 41-84: The fallback to search for cudnn64_8.dll in the pip cache
(the PowerShell block that sets $found and prints paths) never wires the found
paths into the later copy logic, so when the download fails the script still
expects cudnn.zip and the extracted archive; fix by capturing the actual full
path(s) returned by that PowerShell search and then copy those DLLs into
"%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" (the same target used by the
existing copy command) instead of relying on cudnn.zip; implement this by
changing the PowerShell call to output the found file paths to an environment
variable or to a temp file and add a conditional branch (or jump label) that,
when the search succeeds (%errorlevel% == 0), performs copy /Y
"<found_path_or_paths>" "%VIRTUAL_ENV%\Lib\site-packages\torch\lib\" and lists
them, otherwise keep the existing manual-download error path.
In `@pyproject.toml`:
- Around line 32-35: The listed development tools ("isort", "black", "pylint",
"pytest") are incorrectly placed in main dependencies; remove those entries from
the project dependencies block and add them to the dev group (either under
[project.optional-dependencies] with a "dev" key or into
[project.dependency-groups] -> dev) so they are installed only for development.
Ensure the names "isort", "black", "pylint", and "pytest" appear under the dev
group and update any existing dev group entries to include them rather than
keeping them in the top-level dependencies.
In `@src/modules/ffmpeg_helper.py`:
- Line 130: The default extension used when looking up codec_to_extension
contains a leading dot, causing filenames like "file..wav"; change the lookup so
the default has no leading dot (e.g., extension =
codec_to_extension.get(codec_name, "wav")) so that the caller building
f"{basename_without_ext}.{audio_ext}" produces a single dot; ensure
codec_to_extension values remain dot-less and that the extension variable (and
any audio_ext usage) expects a bare extension string.
In `@src/modules/Pitcher/pitcher.py`:
- Around line 37-39: The current normalization assumes 16-bit PCM; update the
normalization logic around the variable "audio" in
src/modules/Pitcher/pitcher.py so it handles common bit depths and unsigned
formats: detect dtype (np.uint8, signed integers like np.int16/np.int32, and
floats) and convert to float32 with correct scaling—for uint8 subtract 128 and
divide by 128, for signed integers use np.iinfo(dtype) to compute the max
absolute value and divide by that (e.g., int16 -> 32768, int32 -> 2**31 or use
max(abs(min), max)), and for float32/float64 leave as-is (or cast to float32);
implement this in place of the single hardcoded audio.astype(...)/(2**15) line
so all common WAV bit depths normalize correctly.
- Around line 16-19: The SwiftF0 instantiation currently uses speech-only bounds
(SwiftF0(fmin=65, fmax=400)) which will misclassify high singing pitches; update
the _swift_f0_detector initialization to use a music-appropriate pitch range
(either the documented full range SwiftF0(fmin=46.875, fmax=2093.75) or a
narrower singing-focused range like fmin=46.875, fmax=1200), remove the FIXME
comment, and ensure the change is made on the SwiftF0(...) call that assigns
_swift_f0_detector so singing voices are correctly detected.
In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 120-122: The restore of torch.load is happening immediately after
whisperx.load_model(), but whisperx.load_align_model() (and any alignment
models) run later and may require the patched behavior (weights_only=False) to
load pickled models; move the assignment torch.load = _original_torch_load so it
occurs only after both model loads complete (i.e., after
whisperx.load_align_model()), or expand the monkey-patched scope to wrap both
whisperx.load_model() and whisperx.load_align_model() calls together, ensuring
the temporary torch.load override remains active until both loads finish.
- Around line 12-22: The monkey-patch for torch.load (_original_torch_load,
_patched_torch_load, torch.load) is being restored too early—after
whisperx.load_model but before whisperx.load_align_model—so keep the patch
active through the entire model loading sequence by moving the restoration
(torch.load = _original_torch_load) to after whisperx.load_align_model completes
(or re-apply the patch immediately before calling whisperx.load_align_model);
ensure both whisperx.load_model and whisperx.load_align_model execute while
torch.load still points to _patched_torch_load, then restore the original
afterwards.
In `@src/modules/Ultrastar/ultrastar_parser.py`:
- Around line 36-57: The parsing chain in ultrastar_parser.py contains a
duplicate VIDEO check and an ordering bug: remove the duplicate branch that
checks for UltrastarTxtTag.VIDEO (the second VIDEO elif) and move the
UltrastarTxtTag.VIDEOGAP check so it appears before the UltrastarTxtTag.VIDEO
check; update the code that assigns ultrastar_class.videoGap (and
ultrastar_class.video) accordingly so `#VIDEOGAP` lines match the VIDEOGAP branch
and `#VIDEO` lines match the VIDEO branch. Ensure you only keep one elif for
UltrastarTxtTag.VIDEO and that UltrastarTxtTag.VIDEOGAP is evaluated earlier in
the same if/elif chain where ultrastar_class is being populated.
🟡 Minor comments (12)
src/modules/ffmpeg_helper.py-94-97 (1)
94-97:⚠️ Potential issue | 🟡 MinorDocstring is inaccurate — function returns only the extension, not the codec name.
The docstring says "return codec name and appropriate file extension," and the function name includes
codec, but the return value is solely the file extension string.📝 Suggested docstring fix
def get_audio_codec_and_extension(video_file_path: str) -> str: """ - Detect audio codec from video file and return codec name and appropriate file extension. + Detect audio codec from video file and return the appropriate file extension. """src/modules/Ultrastar/ultrastar_writer.py-56-56 (1)
56-56:⚠️ Potential issue | 🟡 MinorTrailing comma creates a discarded tuple expression.
Line 56 ends with
,after thefile.write(...)call. This makes the statement a tuple expression — the write still executes, but the trailing comma is almost certainly unintended.- file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n"), + file.write(f"#{UltrastarTxtTag.VERSION.value}:{ultrastar_class.version}\n")install/CPU/macos_cpu.sh-32-35 (1)
32-35:⚠️ Potential issue | 🟡 Minor
pyis not available on macOS — usepythoninstead.The
pylauncher is a Windows-only utility. On macOS, after activating the venv, the correct command ispython:Proposed fix
echo "To run UltraSinger:" echo " source .venv/bin/activate" echo " cd src" -echo " py UltraSinger.py" +echo " python UltraSinger.py".github/workflows/pytest.yml-14-14 (1)
14-14:⚠️ Potential issue | 🟡 MinorUpdate GitHub Actions to current versions.
actions/checkout@v3andactions/setup-python@v4use deprecated Node.js 16 runners. Update toactions/checkout@v4andactions/setup-python@v5to avoid deprecation warnings and potential workflow failures.Proposed fix
- uses: actions/checkout@v3 + uses: actions/checkout@v4- uses: actions/setup-python@v4 + uses: actions/setup-python@v5Also applies to: 17-17
run_on_linux.sh-5-5 (1)
5-5:⚠️ Potential issue | 🟡 Minor
cdcommands should handle failure.If
cdfails (e.g., directory doesn't exist), the script silently continues in the wrong directory, potentially running the wrong files or failing confusingly. Shellcheck SC2164.Proposed fix
-cd "$SCRIPT_DIR" +cd "$SCRIPT_DIR" || exit 1-cd src +cd src || { echo "Error: src directory not found"; exit 1; }Also applies to: 20-20
README.md-284-287 (1)
284-287:⚠️ Potential issue | 🟡 MinorCUDA wheel index inconsistency:
cu121here vscu128in Colab notebook.The manual installation example references
cu121, but the Colab notebook (colab/UltraSinger.ipynb, line 49) usescu128. These should be consistent, or at least the README should note which CUDA version to target. Consider aligning with the intended CUDA version for this release.README.md-216-218 (1)
216-218:⚠️ Potential issue | 🟡 MinorIncorrect frequency value: 60 Hz vs 65 Hz in code.
Line 218 states "UltraSinger uses 60hz and 400hz", but
pitcher.py(line 19) initializes SwiftF0 withfmin=65. This should be65hzto match the implementation. Also, this line is missing punctuation and reads as a sentence fragment—consider rephrasing.Proposed fix
-UltraSinger uses 60hz and 400hz +UltraSinger uses 65 Hz and 400 Hz as the frequency range.install/CPU/linux_cpu.sh-35-35 (1)
35-35:⚠️ Potential issue | 🟡 Minor
pyis not available on Linux — should bepython.
pyis the Windows Python Launcher and is not available on Linux. The CUDA install script correctly usespython. This instruction will confuse Linux users.Proposed fix
-echo " py UltraSinger.py" +echo " python UltraSinger.py"Dockerfile-1-1 (1)
1-1:⚠️ Potential issue | 🟡 MinorCUDA version mismatch between base image and PyTorch wheels.
The base image uses CUDA 12.6.3, but PyTorch wheels are from the
cu128index (CUDA 12.8). The critical constraint is that the GPU driver (not the base image) must support CUDA 12.8 APIs (minimum ≥525.60.13 on Linux). While PyTorch bundles its own CUDA runtime, operations depending on the GPU driver could fail if the driver version doesn't meet the requirement. Either upgrade the base image to nvidia/cuda:12.8-runtime-ubuntu22.04 and ensure GPU driver compatibility, or switch tocu126wheels to match the base image.pyproject.toml-7-7 (1)
7-7:⚠️ Potential issue | 🟡 MinorNon-canonical PEP 440 version string.
"0.0.13-dev14"uses a hyphen; the canonical PEP 440 form is"0.0.13.dev14"(with a dot). Build tools will normalize it, but using the canonical form avoids ambiguity.Suggested fix
-version = "0.0.13-dev14" +version = "0.0.13.dev14"pyproject.toml-37-37 (1)
37-37:⚠️ Potential issue | 🟡 Minor
dataclassesis a stdlib module since Python 3.7; redundant withrequires-python = ">=3.12".The
dataclassesPyPI package is a backport for Python 3.6. It's a no-op on 3.7+ and can be removed.Suggested fix
- "dataclasses",pyproject.toml-11-11 (1)
11-11:⚠️ Potential issue | 🟡 MinorRemove
setuptoolsfrom runtime dependencies.The codebase does not use
setuptoolsorpkg_resourcesat runtime. It's already correctly listed in[build-system] requires, so having it in[project] dependenciesis unnecessary bloat.
🧹 Nitpick comments (20)
src/modules/Ultrastar/ultrastar_parser.py (1)
36-57:.valueis redundant onstrEnums (but harmless).Since
UltrastarTxtTaginherits fromstr, the enum member is the string —UltrastarTxtTag.ARTIST == "ARTIST"is alreadyTrue. Adding.valueis not wrong, but it's unnecessary boilerplate. Consistency across the codebase is the main benefit here, so this is fine to keep if that's the project convention.src/modules/Ultrastar/ultrastar_writer.py (1)
143-144: Use f-string conversion flag instead of wrapping instr().Per the Ruff RUF010 hint, prefer the
!sconversion flag:- linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \ - f"{str(round(show_next))}\n" + linebreak = f"{UltrastarTxtTag.LINEBREAK.value} " \ + f"{round(show_next)!s}\n"src/modules/Audio/key_detector.py (1)
97-97: Noisy per-note log in a hot path — consider removing or gating behind a verbose/debug flag.
quantize_note_to_keyis called for every note in a song, so this print will flood the console. The commented-out print on line 147 suggests per-note logging was previously deemed too verbose. Additionally, the message says "with allowed notes" but doesn't actually print theallowed_notesset, limiting its diagnostic value.Suggested fix
Either remove it or gate it behind a debug/verbose flag, and include
allowed_notesif it's meant for diagnostics:- print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes") + # Uncomment for debugging: + # print(f"{ULTRASINGER_HEAD} Quantizing note {blue_highlighted(note)} to key with allowed notes: {allowed_notes}")install/CUDA/windows_cuda_gpu.bat (1)
15-19: Remove commented-out dead code.This commented-out venv removal block adds clutter. If it's not needed, remove it; if it may be needed later, track it in an issue instead.
install/CPU/windows_cpu.bat (2)
85-94: CPU PyTorch install doesn't pin versions, unlike the CUDA script.The CUDA variant (
windows_cuda_gpu.batline 90) pinstorch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0, while this CPU script installs unpinned versions. This inconsistency can lead to version mismatches or unexpected breakage if a new PyTorch release is incompatible with the project's dependencies.Proposed fix — pin CPU PyTorch versions to match
:: Install PyTorch CPU version -uv pip install --index-url https://download.pytorch.org/whl/cpu torch torchvision torchaudio +uv pip install --index-url https://download.pytorch.org/whl/cpu torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
1-97: Significant duplication withwindows_cuda_gpu.bat.Lines 1–83 are nearly identical to
install/CUDA/windows_cuda_gpu.bat. Consider extracting the shared Python-detection and uv-bootstrap logic into a common helper script (e.g.,install/common_setup.bat) and calling it from both scripts to reduce maintenance burden.src/modules/Pitcher/pitcher.py (1)
10-20: Global mutable state — not thread-safe.The lazy-initialized
_swift_f0_detectorglobal with no synchronization is fine for single-threaded use, but worth noting. If this module is ever used concurrently, a race condition could create multiple detector instances. For now this is acceptable given the application's single-threaded nature.ReleaseNotes.md (1)
17-19: Minor capitalization inconsistency.Line 19 starts with lowercase "upgrade" while all other entries in this section use a capitalized first word (e.g., "Changed", "Drop"). Consider:
Upgrade to Python 3.12.run_on_linux.sh (1)
22-22: Command-line arguments are not forwarded to UltraSinger.Users would expect to pass arguments like
-i "song.mp3"via this script. Currently no arguments are forwarded.Proposed fix
-python UltraSinger.py +python UltraSinger.py "$@"pytest/modules/Pitcher/test_pitcher.py (2)
12-12: Stale test method name.The method is still named
test_get_pitch_with_crepe_filebut now tests SwiftF0 viaget_pitch_with_file. Consider renaming totest_get_pitch_with_filefor clarity.
22-22: Remove commented-out CREPE call.This dead code is no longer relevant after the migration to SwiftF0.
run_on_windows.bat (1)
1-5: Behavioral inconsistency with Linux script and missing.venvcheck.Two concerns:
- Unlike
run_on_linux.sh, there's no check for.venvexistence. If it's missing,activate.batwill silently fail.- The Linux script auto-runs
python UltraSinger.py, but this script opens an interactive shell (cmd /k) requiring the user to type the command manually. If this is intentional, consider adding a note. Otherwise, consider aligning behavior:Proposed fix
`@echo` off -:: Activate the virtual environment and open cmd in the src directory -call .venv\Scripts\activate.bat -cd src -cmd /k +:: Activate the virtual environment and run UltraSinger +if not exist ".venv\Scripts\activate.bat" ( + echo Error: Virtual environment not found at .venv + echo Please run one of the installation scripts first. + pause + exit /b 1 +) +call .venv\Scripts\activate.bat +cd src +echo Starting UltraSinger... +python UltraSinger.py %* +pausecolab/UltraSinger.ipynb (1)
33-33: Colab clones the default branch — may not reflect latest changes.The notebook always clones from
main. Users testing unreleased changes (e.g., this PR's branch) would need to manually edit the clone command. Consider adding a comment noting how to switch branches.run_on_mac.command (1)
2-21: Consider addingset -eor guarding thecd srccall.If the
srcdirectory doesn't exist,cd srcwill fail silently andpython UltraSinger.pywill execute from the wrong directory (project root). The other install scripts in this PR useset -e. Also, consider forwarding arguments with"$@"topython UltraSinger.pyfor consistency with typical launchers.Proposed improvement
#!/bin/bash +set -e # Activate virtual environment and run UltraSinger on macOS SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" cd "$SCRIPT_DIR" ... cd src echo "Starting UltraSinger..." -python UltraSinger.py +python UltraSinger.py "$@"Dockerfile (1)
8-14: Add--no-install-recommendsto reduce image size.As flagged by static analysis, the
apt-get installcalls are missing--no-install-recommends, which pulls in unnecessary recommended packages and bloats the image.Proposed fix
RUN apt-get update \ - && apt-get install -y software-properties-common \ + && apt-get install -y --no-install-recommends software-properties-common \ && add-apt-repository ppa:deadsnakes/ppa -y \ && apt-get update \ - && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ + && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ && apt-get clean \ && rm -rf /var/lib/apt/lists/*src/modules/DeviceDetection/device_detection.py (1)
7-7: Module-levelpytorch_gpu_supportedis unused dead code.This global is shadowed by local variables in both
check_gpu_supportand__check_pytorch_support— neither function uses theglobalkeyword to read or write it.Proposed fix
-pytorch_gpu_supported = False - def check_gpu_support() -> str:src/UltraSinger.py (1)
790-793: Dead CLI options:--crepe,--crepe_step_size,--force_crepe_cpuare still parsed but no longer used.Since CREPE has been replaced by SwiftF0, these options and their corresponding
settingsfields are dead code. They can be cleaned up to avoid confusing users.Also applies to: 813-814
pyproject.toml (3)
2-3: Unnecessary build dependencies alongside hatchling.Since you've switched to
hatchling.buildas the backend and the version is now statically defined,setuptools,wheel, andsetuptools_scminrequiresare unnecessary. They add build-time overhead without being used.Suggested fix
-requires = ["hatchling", "setuptools", "wheel", "setuptools_scm"] +requires = ["hatchling"]
44-48: Empty platform optional-dependency groups.The
windows,linux, andmacosgroups are empty placeholders. If there are no platform-specific dependencies planned, consider removing them to avoid confusion. If platform-specific PyTorch index URLs are needed (e.g., CUDA on Linux/Windows vs. CPU on macOS), this is where they'd go.
48-51: Duplicatedevgroup in both[project.optional-dependencies]and[dependency-groups].
pytestappears in both[project.optional-dependencies] devand[dependency-groups] dev. Pick one mechanism. If usinguv,[dependency-groups](PEP 735) is the idiomatic choice.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Dockerfile`:
- Line 1: The Dockerfile's base image FROM
nvidia/cuda:12.6.3-runtime-ubuntu22.04 conflicts with the PyTorch wheels built
for cu128 (CUDA 12.8); update the FROM line to a CUDA 12.8 runtime image (for
example nvidia/cuda:12.8.1-runtime-ubuntu22.04) to align runtimes, or
alternatively install PyTorch wheels built for cu126 to match the existing FROM
image—modify the Dockerfile's FROM directive accordingly so the container CUDA
runtime and chosen PyTorch wheel target match.
- Around line 21-29: The Dockerfile performs an editable PEP 660 install with
"RUN uv pip install --system --python 3.12 -e ." while only copying
pyproject.toml and creating an empty src/ directory; because pyproject.toml uses
hatchling with packages = ["src"], hatchling will fail to discover packages
without a package skeleton. Fix by either copying a minimal package skeleton
(e.g., add a src/__init__.py) into the image before the editable install or
change the build so you first install deps (pip install from pyproject) and then
run the editable install (uv pip install -e .) after copying the full source
tree; adjust the steps around COPY ./pyproject.toml, RUN mkdir -p
/app/UltraSinger/src, and the RUN uv pip install... line accordingly.
🧹 Nitpick comments (3)
Dockerfile (3)
8-14: Add--no-install-recommendsto reduce image size.Both
apt-get installinvocations pull in recommended packages that are unnecessary in a container. This can significantly bloat the image.♻️ Proposed fix
RUN apt-get update \ - && apt-get install -y software-properties-common \ + && apt-get install -y --no-install-recommends software-properties-common \ && add-apt-repository ppa:deadsnakes/ppa -y \ && apt-get update \ - && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ + && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ && apt-get clean \ && rm -rf /var/lib/apt/lists/*
16-18: Consider pinning theuvversion for reproducible builds.The install script fetches the latest
uvversion. A future breaking change could silently break the Docker build. You can pin a version via theUV_INSTALLER_VERSIONenv var or by appending a version to the URL (e.g.,https://astral.sh/uv/0.6.x/install.sh).
34-40: CombineCOPYandchownto save a layer and reduce build time.
chown -Ron line 39 creates a duplicate layer of the entire source tree. UsingCOPY --chowncombines both operations into a single layer.♻️ Proposed fix
# copy sources late to allow for caching of layers which contain all the dependencies -COPY . /app/UltraSinger - - -# no need to run as root -RUN chown -R 1000:1000 /app/UltraSinger +COPY --chown=1000:1000 . /app/UltraSinger + USER 1000:1000
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/modules/Speech_Recognition/Whisper.py (1)
163-196: 🛠️ Refactor suggestion | 🟠 MajorUse a
finallyblock instead of duplicatingtorch.loadrestoration in every handler.The restoration is repeated four times (lines 165, 170, 184, 191). More importantly,
BaseExceptionsubclasses likeKeyboardInterruptorSystemExitwill bypass allexceptblocks and leave the patch active.Suggested refactor with
finallytry: torch.cuda.empty_cache() loaded_whisper_model = whisperx.load_model( ... ) ... result_aligned = whisperx.align(...) transcribed_data = convert_to_transcribed_data(result_aligned) - # Restore original torch.load after models are loaded - torch.load = _original_torch_load return TranscriptionResult(transcribed_data, detected_language) except ValueError as value_error: - torch.load = _original_torch_load ... raise value_error except OutOfMemoryError as oom_exception: - torch.load = _original_torch_load ... raise oom_exception except Exception as exception: - torch.load = _original_torch_load ... raise exception + finally: + torch.load = _original_torch_load
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 7: Update the version string in pyproject.toml to use canonical PEP 440
formatting: replace version = "0.0.13-dev14" with version = "0.0.13.dev14" so
the project version uses the dot-separated dev spec per PEP 440 normalization.
In `@src/modules/Speech_Recognition/Whisper.py`:
- Around line 12-22: The current module-level monkey-patch
(_original_torch_load, _patched_torch_load, torch.load = _patched_torch_load) is
correctly applied before importing whisperx but the restoration of torch.load is
duplicated inside multiple except handlers; change this to restore the original
loader in a single finally block that always executes—remove the torch.load =
_original_torch_load lines from the except handlers and add a finally:
torch.load = _original_torch_load after the try/except chain that surrounds
whisperx.load_model and the transcription logic (the block that returns
TranscriptionResult or re-raises ValueError / OutOfMemoryError / Exception).
In `@src/modules/Ultrastar/coverter/ultrastar_txt_converter.py`:
- Line 73: The ultrastar_txt.video assignment blindly concatenates basename and
media_info.video_extension which can be None; update the logic in
ultrastar_txt_converter.py to only set ultrastar_txt.video when
media_info.video_extension is truthy (non-None/non-empty) — otherwise leave
ultrastar_txt.video unset or None so the serializer omits the `#VIDEO` tag;
reference ultrastar_txt.video and media_info.video_extension to locate and
change the assignment.
- Around line 69-72: from_ultrastar_txt creates a MediaInfo without
audio_extension which results in filenames like "basename.None"; when
constructing the MediaInfo in from_ultrastar_txt set a sensible default (e.g.,
"mp3") for audio_extension (if audio_extension is None or falsy) so that
ultrastar_txt.mp3 / ultrastar_txt.audio / ultrastar_txt.vocals /
ultrastar_txt.instrumental produce valid filenames; update the MediaInfo
creation in from_ultrastar_txt (and any other code paths that create MediaInfo
from Ultrastar text, e.g., the export paths used by UltraSinger) to assign
audio_extension = "mp3" when none is provided.
In `@src/modules/Ultrastar/ultrastar_writer.py`:
- Line 56: The line calling file.write currently ends with a trailing comma
which turns the call into a discarded tuple (e.g., file.write(...),); remove the
trailing comma so the write call is executed normally. Update the occurrence
that uses UltrastarTxtTag.VERSION.value and ultrastar_class.version in
ultrastar_writer.py (the file.write(...) invocation) to drop the comma and
ensure the string is written to the file.
In `@src/UltraSinger.py`:
- Around line 670-673: The audio extension variable currently includes a leading
dot causing later code that does basename + "." + audio_extension to produce a
double-dot filename; update the block that sets
basename_with_ext/audio_ext/video_ext so audio_ext is stored without a leading
dot (use extension.lstrip('.')) and build basename_with_ext consistently (e.g.,
f"{basename_without_ext}.{audio_ext}") so downstream code (see
separate_audio_video usage and ultrastar_txt_converter.py filename construction)
produces correct single-dot filenames.
- Around line 520-521: When building output paths and calling
convert_audio_format, guard against process_data.media_info.audio_extension
being None: in the code that computes karaoke_output_path and the other output
path uses (symbols: karaoke_output_path, process_data.process_data_paths,
convert_audio_format, and media_info.audio_extension), check if
process_data.media_info.audio_extension is None and either (a) use a sensible
fallback extension (e.g., "mp3" or a constant DEFAULT_AUDIO_EXTENSION) before
concatenating, or (b) skip creating/ converting those files and log/raise a
clear error; apply the same guard wherever media_info.audio_extension is
concatenated for output paths so no TypeError occurs.
🧹 Nitpick comments (10)
.github/workflows/pylint.yml (1)
14-14: Python version inconsistency with other workflows.The rest of the PR standardizes on Python 3.12 (pytest.yml, main.yml, Dockerfile, pyproject.toml), but this workflow still uses
3.10. While the workflow is currently disabled (triggers onnone.txt), it would be good to align it now to avoid confusion if it's ever re-enabled..github/workflows/pytest.yml (1)
21-27: Consider pinning theuvversion for CI reproducibility.
curl -LsSf https://astral.sh/uv/install.sh | shinstalls the latestuv. A breaking change in a futureuvrelease could silently break CI. You can pin a version, e.g.:curl -LsSf https://astral.sh/uv/0.6.x/install.sh | sh(Same concern applies to the identical pattern in
main.yml.)Dockerfile (1)
8-15: Add--no-install-recommendsto reduce image size.The
apt-get installcalls pull in recommended packages that aren't needed in a container, bloating the image unnecessarily.Proposed fix
RUN apt-get update \ - && apt-get install -y software-properties-common \ + && apt-get install -y --no-install-recommends software-properties-common \ && add-apt-repository ppa:deadsnakes/ppa -y \ && apt-get update \ - && apt-get install -y git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ - build-essential gcc g++ make \ + && apt-get install -y --no-install-recommends git python3.12 python3.12-venv python3.12-dev ffmpeg curl tzdata \ + build-essential gcc g++ make ca-certificates \src/modules/ffmpeg_helper.py (1)
94-97: Docstring is misleading — function returns only the extension, not the codec name.The docstring says "return codec name and appropriate file extension" but the function only returns the extension string.
Proposed fix
def get_audio_codec_and_extension(video_file_path: str) -> str: """ - Detect audio codec from video file and return codec name and appropriate file extension. + Detect audio codec from video file and return the appropriate file extension. """src/modules/Audio/youtube.py (1)
107-109:final_video_pathis unpacked but never used — prefix with underscore.Per Ruff RUF059,
final_video_pathis assigned but unused. Prefix it to signal intentional discard.Proposed fix
- audio_file_path, final_video_path, audio_ext, video_ext = separate_audio_video( + audio_file_path, _final_video_path, audio_ext, video_ext = separate_audio_video(pyproject.toml (2)
34-34:dataclassesdependency is unnecessary for Python ≥ 3.12.The
dataclassesmodule has been part of the standard library since Python 3.7. The PyPIdataclassespackage is a backport for 3.6 only. Sincerequires-python = ">=3.12", this dependency is dead weight."tqdm", "yt-dlp", "music21", - "dataclasses", "dataclasses-json",
41-48: Duplicate dev dependency definitions.
devis declared in both[project.optional-dependencies](line 45) and[dependency-groups](line 48) with identical content. Having both is confusing—pick one.[dependency-groups]is the newer PEP 735 mechanism;[project.optional-dependencies]is more widely supported.src/UltraSinger.py (2)
666-666: Unused variablefinal_video_path.Static analysis flags
final_video_pathas unused. Prefix with underscore to signal intent.- ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video( + ultrastar_audio_input_path, _final_video_path, audio_ext, video_ext = separate_audio_video(
793-797: Dead code:--crepeand--crepe_step_sizeCLI options are still parsed but CREPE has been removed.These options set
settings.crepe_model_capacityandsettings.crepe_step_size, but the pitch path now uses SwiftF0 exclusively. The options inarg_options()(lines 868-869) and the corresponding--force_crepe_cpuhandling (lines 817-818, 885) should also be cleaned up.src/modules/Ultrastar/ultrastar_writer.py (1)
86-91:#TAGSis written after#GAP— verify this matches the UltraStar format spec ordering expectations.The
TAGSfield is written betweenGAPandCREATOR. While the UltraStar format may not enforce strict tag ordering, it's worth confirming this placement is intentional and consistent with other tools' expectations.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/UltraSinger.py (1)
793-798:⚠️ Potential issue | 🟡 MinorDead CLI options:
--crepe,--crepe_step_size,--force_crepe_cpustill accepted.These flags set attributes that were removed from
Settings. They silently do nothing now. Remove the option handling and thearg_optionsentries to avoid confusing users.Suggested fix for arg_options
long = [ "ifile=", "ofile=", - "crepe=", - "crepe_step_size=", "demucs=", ... - "force_crepe_cpu", "format_version=",Also applies to: 817-818, 868-869, 885-885
src/Settings.py (1)
50-53:⚠️ Potential issue | 🟡 MinorRemove dead CREPE CLI option handling and clean up related code.
The
crepe_model_capacity,crepe_step_size, andforce_crepe_cpuattributes were removed fromSettings, but dead references remain across multiple files:
UltraSinger.py(lines 794–798, 817–818): CLI flags that set removed attributesUltraSinger.py(lines 868–869, 885): Dead entries inarg_options()init_interactive_mode.py(line 127): Setsforce_crepe_cpuattribute dynamicallycommon_print.py(lines 46–48, 69): Outdated help text documenting CREPE flagsSince
Settingsis non-frozen, these assignments silently create dynamic attributes that go unused, misleading users. Remove all CREPE flag handling and update documentation.
🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 2-3: Remove the unnecessary build-time dependencies from
pyproject.toml by editing the requires array used with the build-backend
"hatchling.build": remove "setuptools" and "setuptools_scm" from the requires
list since hatchling is the specified build-backend and the project uses a
static version (no tool.setuptools_scm config), while "setuptools" remains
declared as a runtime dependency under [project] dependencies.
In `@src/UltraSinger.py`:
- Around line 707-714: The pitch cache branch is using the wrong flag: replace
the check of settings.skip_cache_transcription with the dedicated
settings.skip_cache_pitch_detection so the code that decides to call
get_pitch_with_file(process_data_paths.processing_audio_path) uses "if
settings.skip_cache_pitch_detection or not cache_available" (leave
pitching_config, pitched_data_path and check_file_exists as-is); this aligns the
skip behavior with the Settings flag that controls pitch detection caching.
🧹 Nitpick comments (5)
src/modules/Speech_Recognition/Whisper.py (1)
164-196: Redundanttorch.loadrestoration in eachexceptblock — thefinallyblock handles it.The
finallyblock at line 193 guarantees restoration regardless of how thetryexits, so the three identicaltorch.load = _original_torch_loadassignments in theexcepthandlers (lines 166, 180, 187) are dead code.Proposed cleanup
except ValueError as value_error: - # Restore original torch.load in case of error - torch.load = _original_torch_load - if ( "Requested float16 compute type, but the target device or backend do not support efficient float16 computation." in str(value_error.args[0]) ): print(value_error) print( f"{ULTRASINGER_HEAD} Your GPU does not support efficient float16 computation; run UltraSinger with '--whisper_compute_type int8'" ) - raise value_error except OutOfMemoryError as oom_exception: - # Restore original torch.load in case of error - torch.load = _original_torch_load - print(oom_exception) print(MEMORY_ERROR_MESSAGE) raise oom_exception except Exception as exception: - # Restore original torch.load in case of error - torch.load = _original_torch_load - if "CUDA failed with error out of memory" in str(exception.args[0]): print(exception) print(MEMORY_ERROR_MESSAGE) raise exception finally: # Restore original torch.load after models are loaded torch.load = _original_torch_loadsrc/modules/Ultrastar/coverter/ultrastar_txt_converter.py (1)
95-96: Clarify or remove the TODO.The
# todo: as list add here?is vague. If multi-key support is planned, consider opening an issue to track it rather than leaving an ambiguous TODO.pyproject.toml (2)
34-34:dataclassesis unnecessary for Python ≥ 3.12.The
dataclassesmodule has been part of the standard library since Python 3.7. The PyPI package is a backport for 3.6 only. Sincerequires-python = ">=3.12", this dependency is dead weight and can be removed.Suggested fix
"tqdm", "yt-dlp", "music21", - "dataclasses", "dataclasses-json",
42-44: Empty platform optional-dependency groups serve no purpose.
windows = [],linux = [],macos = []are placeholders with no dependencies. If platform-specific dependencies are planned, consider adding a TODO or removing these until needed.src/UltraSinger.py (1)
666-666: Unused variablefinal_video_path.
final_video_pathfromseparate_audio_videois unpacked but never used. Use_to signal intent.Suggested fix
- ultrastar_audio_input_path, final_video_path, audio_ext, video_ext = separate_audio_video( + ultrastar_audio_input_path, _, audio_ext, video_ext = separate_audio_video(
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Documentation
Tests