Support UTF-8 BOM stripping in `TextInputStream` #2408

slqy123 · 2026-01-05T09:58:08Z

Currently, `TextInputStream` treats all text as raw bytes and does not strip the UTF-8 BOM (EF BB BF). Cue sheets saved with a UTF-8 BOM can fail to parse the first line because the first token is prefixed with the BOM bytes.
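To illustrate the failure mode, here is a minimal sketch with two hypothetical helpers (`StartsWithKeyword` and `StripUtf8Bom` are names made up for this example, not MPD code). It assumes a parser that matches the first token of a line by prefix comparison, which is a simplification of what a cue sheet parser does.

```cpp
#include <string_view>

// Hypothetical helper (not MPD code): does the line begin with the keyword?
constexpr bool
StartsWithKeyword(std::string_view line, std::string_view keyword) noexcept
{
	return line.substr(0, keyword.size()) == keyword;
}

// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) if present,
// otherwise return the input unchanged.
constexpr std::string_view
StripUtf8Bom(std::string_view s) noexcept
{
	constexpr std::string_view bom = "\xEF\xBB\xBF";
	if (s.substr(0, bom.size()) == bom)
		s.remove_prefix(bom.size());
	return s;
}
```

With a first line such as `"\xEF\xBB\xBF" "REM GENRE Rock"`, the keyword `REM` is not matched until the three BOM bytes are removed; lines without a BOM pass through untouched.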

This PR strips the UTF-8 BOM in the `TextInputStream` constructor. It tries to read the first 3 bytes into the buffer and consumes them if a BOM is present.

MaxKellermann · 2026-01-05T10:04:35Z

I don't think that's the right place to do this. This makes the constructor potentially blocking (and out of the caller's control), and the constructor can now throw on I/O error (which will crash MPD). I find it rather surprising that the constructor will implicitly do blocking I/O. Plus, it may deadlock if the caller happens to already hold a mutex lock.

slqy123 · 2026-01-05T10:17:48Z

> I don't think that's the right place to do this. This makes the constructor potentially blocking (and out of the caller's control), and the constructor can now throw on I/O error (which will crash MPD). I find it rather surprising that the constructor will implicitly do blocking I/O. Plus, it may deadlock if the caller happens to already hold a mutex lock.

Thank you for the feedback. I will move this to TextInputStream::ReadLine and add a flag to track if the BOM was checked.

MaxKellermann · 2026-01-05T12:00:20Z

That's much better. Now please amend the commit message - it contains less than the PR text, and that's unfortunate, because PR text gets lost on GitHub's proprietary website.
If this fixes a problem users have currently, it should probably go to the stable branch v0.24.x, but I can easily cherry-pick it over there.

slqy123 · 2026-01-05T12:55:07Z

> If this fixes a problem users have currently, it should probably go to the stable branch v0.24.x, but I can easily cherry-pick it over there.

I didn't find an existing issue for this.
Since it only breaks the first line, most of the cue sheet still parses. It's easy to miss, but users are still affected, so I believe it should go to the stable branch.

MaxKellermann · 2026-01-07T10:33:32Z

You edited the commit message from
"input/TextInputStream: strip UTF-8 BOM"
to
"input/TextInputStream: support UTF-8 BOM stripping in TextInputStream"

This is still
* less information than the PR
* not more than before; it just adds the (weasel) word "support" and mentions TextInputStream twice, but that doesn't add any information

Fixes parsing of cue sheets with UTF-8 BOM. The BOM is now detected and consumed before the first line is parsed.

slqy123 · 2026-01-07T12:51:39Z

> You edited the commit message from "input/TextInputStream: strip UTF-8 BOM" to "input/TextInputStream: support UTF-8 BOM stripping in TextInputStream"
>
> This is still
> * less information than the PR
> * not more than before; it just adds the (weasel) word "support" and mentions TextInputStream twice, but that doesn't add any information

OK, I have updated my commit message with a detailed explanation.

MaxKellermann · 2026-01-07T13:07:22Z

Cherry-picked to v0.24.x: 98bb249

slqy123 force-pushed the strip_utf8_bom branch from d7def8e to 2a7f3e7 Compare January 5, 2026 10:29

slqy123 force-pushed the strip_utf8_bom branch from 2a7f3e7 to 1dd6f20 Compare January 5, 2026 12:34

input/TextInputStream: Strip UTF-8 BOM on first ReadLine()

e39d924

Fixes parsing of cue sheets with UTF-8 BOM. The BOM is now detected and consumed before the first line is parsed.

slqy123 force-pushed the strip_utf8_bom branch from 1dd6f20 to e39d924 Compare January 7, 2026 12:43

MaxKellermann closed this Jan 7, 2026
