Description
Reproducer
1. Create the Directory Structure
In your terminal, create a directory path that includes the special U+2010 hyphen. Note: This character looks identical to a normal hyphen but is a different Unicode code point. You can copy-paste the command below.
# Create a directory named "Low‐Life" where the hyphen is the Unicode character U+2010
mkdir -p "/tmp/music-test/Low‐Life"
2. Add a Sample File
Place any valid FLAC file into this directory. The contents and tags of the file do not matter.
# Copy a known-good FLAC file:
cp "/path/to/any/good.flac" "/tmp/music-test/Low‐Life/test.flac"
# Or, if you don't have one, create a dummy FLAC file from raw silence
# (flac needs the raw sample format spelled out when reading headerless input):
dd if=/dev/zero bs=1k count=44 | flac --force-raw-format --endian=little --sign=signed --channels=2 --bps=16 --sample-rate=44100 - -o "/tmp/music-test/Low‐Life/test.flac"
3. Use the following Rust code
- Cargo.toml:
[dependencies]
lofty = "*"
- src/main.rs:
use std::{borrow::Cow, error::Error, future::Future, path::Path, pin::Pin, rc::Rc, sync::Arc};
use std::cell::{Cell, RefCell};
use glib::MainContext;
use gtk4::Label;
use libadwaita::ViewStack;
use libadwaita::prelude::WidgetExt;
use lofty::probe::Probe;
use lofty::prelude::{Accessor, AudioFile, TaggedFileExt, ItemKey};
use lofty::tag::items::Timestamp;
use regex::Regex;
use sqlx::{query, Row, SqlitePool};
use tokio::fs::{File, read_dir};
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::sync::mpsc::{UnboundedReceiver, UnboundedSender};
...
...
/// Process a single audio file: extract tags, handle missing metadata, and insert into the database.
async fn process_file(
    pool: &SqlitePool,
    path: &Path,
    folder_id: i64,
    dr_value: Option<u8>,
) -> Result<(), Box<dyn Error>> {
    // Probe the file to read metadata. If it fails, we can't process it.
    let tagged_file = match Probe::open(path) {
        Ok(probe) => match probe.read() {
            Ok(tf) => tf,
            Err(e) => {
                eprintln!("Error reading metadata for {:?}: {}. Skipping.", path, e);
                return Ok(());
            }
        },
        Err(e) => {
            eprintln!("Error opening file {:?}: {}. Skipping.", path, e);
            return Ok(());
        }
    };
    let tag = tagged_file.primary_tag(); // This is an Option<&Tag>
    let properties = tagged_file.properties();
Then I see:
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/01 Love Vigilantes.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/02 The Perfect Kiss.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/03 This Time of Night.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/04 Sunrise.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/05 Elegia.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/06 Sooner Than You Think.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/07 Sub-Culture.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Error reading metadata for "/home/arch/Musik/New Order/Low-Life/08 Face Up.flac": Text decoding: UTF-16 string has an invalid byte order mark. Skipping.
Summary
lofty fails to parse valid FLAC files when their directory path contains a Unicode Hyphen (U+2010), producing a misleading error, Text decoding: UTF-16 string has an invalid byte order mark, which incorrectly points to file metadata corruption instead of the path issue.
Expected behavior
I expect lofty to successfully open and parse the metadata of any valid FLAC file, regardless of the Unicode characters present in its file path. The program should run and not print any error.
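To help separate a path problem from a parsing problem, a small standard-library-only check can confirm that the filesystem itself accepts a path containing U+2010. This is only a sanity-check sketch: it does not call lofty at all, and the helper name `roundtrip_through_u2010_path`, the `music-test-u2010` directory, and the `test.bin` file name are made up for illustration.

```rust
use std::fs;
use std::io::Read;
use std::path::{Path, PathBuf};

/// Hypothetical helper: creates `<base>/Low‐Life/test.bin` (the hyphen is U+2010),
/// writes `payload` to it, then opens the file again and returns what was read.
fn roundtrip_through_u2010_path(base: &Path, payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // Use an explicit escape so the code point cannot be normalized by an editor.
    let dir: PathBuf = base.join("Low\u{2010}Life");
    fs::create_dir_all(&dir)?;

    let file = dir.join("test.bin");
    fs::write(&file, payload)?;

    let mut buf = Vec::new();
    fs::File::open(&file)?.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // Assumed scratch location; any writable directory works.
    let base = std::env::temp_dir().join("music-test-u2010");
    let data = roundtrip_through_u2010_path(&base, b"fLaC")?;
    println!("read {} bytes back through the U+2010 path", data.len());
    Ok(())
}
```

If this round trip succeeds on the affected machine, opening the path is unlikely to be the step that fails, which would narrow the search to what happens after the file is opened.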
Assets
No specific sample file is required, as the bug is path-dependent, not file-dependent. Any valid FLAC file will trigger the bug when placed in the described directory.
The critical asset is the Unicode character itself:
- Problematic Character: ‐ (U+2010, HYPHEN)
- Normal Character: - (U+002D, HYPHEN-MINUS)
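Because the two characters are visually indistinguishable, it can help to inspect a path programmatically. The following sketch uses only the standard library; the function name `non_ascii_chars` is an illustrative choice, not part of lofty.

```rust
/// Hypothetical helper: returns every non-ASCII character in `path`
/// together with its byte offset within the string.
fn non_ascii_chars(path: &str) -> Vec<(usize, char)> {
    path.char_indices().filter(|(_, c)| !c.is_ascii()).collect()
}

fn main() {
    let unicode_hyphen = '\u{2010}'; // HYPHEN
    let hyphen_minus = '\u{002D}'; // HYPHEN-MINUS, the ASCII '-'

    // U+2010 takes three bytes in UTF-8 (0xE2 0x80 0x90); U+002D takes one (0x2D).
    let mut buf = [0u8; 4];
    println!("U+2010 as UTF-8: {:02X?}", unicode_hyphen.encode_utf8(&mut buf).as_bytes());
    println!("U+002D as UTF-8: {:02X?}", hyphen_minus.encode_utf8(&mut buf).as_bytes());

    // Report any look-alike characters hiding in the reproducer path.
    let path = "/tmp/music-test/Low\u{2010}Life/test.flac";
    for (offset, ch) in non_ascii_chars(path) {
        println!("byte offset {offset}: U+{:04X}", ch as u32);
    }
}
```

Running the same helper over the real library path shows whether the directory name on disk actually contains U+2010 or a plain hyphen-minus.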