Skip to content

Conversation

@slqy123
Copy link

@slqy123 slqy123 commented Jan 5, 2026

Currently, TextInputStream treats all text as raw bytes and does not strip UTF-8 BOM (EF BB BF). Cue sheets saved with UTF-8 BOM can fail to parse the first line because the first token is prefixed with BOM bytes.

This PR Strips UTF-8 BOM in TextInputStream constructor. It tries to read the first 3 bytes into buffer, and consume those bytes if a BOM is present.

@MaxKellermann
Copy link
Member

I don't think that's the right place to do this. This makes the constructor potentially blocking (and out of the caller's control), and the constructor can now throw on I/O error (which will crash MPD). I find it rather surprising that the constructor will implicitly do blocking I/O. Plus, it may deadlock if the caller happens to already hold a mutex lock.

@slqy123
Copy link
Author

slqy123 commented Jan 5, 2026

I don't think that's the right place to do this. This makes the constructor potentially blocking (and out of the caller's control), and the constructor can now throw on I/O error (which will crash MPD). I find it rather surprising that the constructor will implicitly do blocking I/O. Plus, it may deadlock if the caller happens to already hold a mutex lock.

Thank you for the feedback. I will move this to TextInputStream::ReadLine and add a flag to track if the BOM was checked.

@MaxKellermann
Copy link
Member

That's much better. Now please amend the commit message - it contains less than the PR text, and that's unfortunate, because PR text gets lost on GitHub's proprietary website.
If this fixes a problem users have currently, it should probably go to the stable branch v0.24.x, but I can easily cherry-pick it over there.

@slqy123
Copy link
Author

slqy123 commented Jan 5, 2026

If this fixes a problem users have currently, it should probably go to the stable branch v0.24.x, but I can easily cherry-pick it over there.

I didn't find an existing issue for this.
Since it only breaks the first line, most of the cuesheet still parses. it's easy to miss, but users are still affected. So I believe it should go to the stable branch.

@MaxKellermann
Copy link
Member

You edited the commit message from
"input/TextInputStream: strip UTF-8 BOM"
to
"input/TextInputStream: support UTF-8 BOM stripping in TextInputStream"

This is still

  • less information than the PR
  • not more than before; it just adds the (weasel) word "support" and mentions TextInputStream twice, but that doesn't add any information

Fixes parsing of cue sheets with UTF-8 BOM. The BOM is now detected
and consumed before the first line is parsed.
@slqy123
Copy link
Author

slqy123 commented Jan 7, 2026

You edited the commit message from "input/TextInputStream: strip UTF-8 BOM" to "input/TextInputStream: support UTF-8 BOM stripping in TextInputStream"

This is still

* less information than the PR

* not more than before; it just adds the (weasel) word "support" and mentions TextInputStream twice, but that doesn't add any information

OK, I have updated my commit message with a detailed explaination

@MaxKellermann
Copy link
Member

Cherry-picked to v0.24.x: 98bb249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants