

@semohr semohr commented Jan 7, 2026

Depending on the Python version and hardware, io.DEFAULT_BUFFER_SIZE varies. We use this value to determine the size of the blocks we feed to the chromaprint library. This can influence the generated fingerprints if the block sizes do not align with the maximum number of samples (which they hardly ever do).

Decision needed:
How do we want to approach this? The fix can be considered a bugfix but also a breaking change: fingerprints generated previously (which used too much data from the last block) will not match the ones generated after the change. In my opinion this is a bug, although fixing it might introduce issues for some users.

closes #89

The block size (io.DEFAULT_BUFFER_SIZE) we use to ingest/feed audio during fingerprinting can influence the generated
fingerprints.

This fixes the issue by only consuming the expected samples.
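
To illustrate why the block size matters, here is a minimal sketch of the idea and not the actual patch. It assumes the audio arrives as blocks of samples and that a `feed` callback hands data to the fingerprinter; `split_blocks`, `feed_uncapped` and `feed_capped` are hypothetical names used only for this example.

```python
def split_blocks(samples, block_size):
    """Yield consecutive blocks of at most block_size samples."""
    for start in range(0, len(samples), block_size):
        yield samples[start:start + block_size]


def feed_uncapped(blocks, max_samples, feed):
    """Feed whole blocks until the limit is reached (overshoots the limit)."""
    consumed = 0
    for block in blocks:
        feed(block)
        consumed += len(block)
        if consumed >= max_samples:
            break
    return consumed


def feed_capped(blocks, max_samples, feed):
    """Feed exactly max_samples samples by truncating the final block."""
    remaining = max_samples
    for block in blocks:
        if remaining <= 0:
            break
        chunk = block[:remaining]  # drop anything past the limit
        feed(chunk)
        remaining -= len(chunk)
    return max_samples - remaining


samples = list(range(1000))  # stand-in for decoded audio samples

# Uncapped: the amount of data consumed depends on the block size.
small, large = [], []
feed_uncapped(split_blocks(samples, 64), 300, small.extend)
feed_uncapped(split_blocks(samples, 128), 300, large.extend)
print(len(small), len(large))

# Capped: the same data is consumed regardless of the block size.
capped_small, capped_large = [], []
feed_capped(split_blocks(samples, 64), 300, capped_small.extend)
feed_capped(split_blocks(samples, 128), 300, capped_large.extend)
print(capped_small == capped_large)
```

With the uncapped variant the tail of the input differs between block sizes, which is what changes the fingerprint; the capped variant consumes the same samples either way.
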
@grawlinson

Package maintainer for Arch Linux here. If you do not fix it now, you're just kicking the can down the road: there are already issues popping up, and there will be more as Python 3.1{3,4} becomes more widespread. Just my 2 cents.

Member

snejus commented Jan 14, 2026

How does this affect matching fingerprints that have already been uploaded to AcousticBrainz?

Author

semohr commented Jan 14, 2026

It shouldn't matter too much; they will be different, of course. But comparison should be done with a distance measure anyway, and results should be closer to the true distribution now. We should test this though, as I'm not sure how the implementation looks on their side.

Effectively, only the last few bytes change (because that's how the algorithm works), and only for long songs (longer than 120 s).

The only issue I see is beets here: I'm not sure how and where we do fingerprint comparisons, but if they are equality checks and not distance-based with a cutoff, that will introduce issues.
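
To make the difference between an equality check and a distance-based check concrete, here is a rough sketch. It assumes raw fingerprints are available as sequences of 32-bit integers; `bit_error_rate`, `is_match` and the cutoff value are hypothetical and not taken from beets or AcoustID. It also ignores alignment offsets, which a real matcher would likely need to handle.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("cannot compare empty fingerprints")
    # XOR marks the differing bits; the mask keeps the 32-bit view
    # correct even if the integers are signed.
    differing = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b)
    )
    return differing / (32 * n)


def is_match(fp_a, fp_b, cutoff=0.1):
    """Distance-based comparison; the cutoff is an arbitrary example value."""
    return bit_error_rate(fp_a, fp_b) <= cutoff


old_fp = [0b1111, 0, 0xFFFFFFFF]
new_fp = [0b1110, 0, 0xFFFFFFFF]  # one bit differs from old_fp

print(old_fp == new_fp)          # equality check treats them as different
print(is_match(old_fp, new_fp))  # distance check tolerates the small change
```

An equality check would reject fingerprints that differ only in their tail, while a cutoff on the distance keeps treating them as the same recording.
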

As a side note, the fingerprints are still different from the ones generated by the fpcalc CLI tool. But the average error in my local testing seems to be lower, i.e. they are closer by distance.

As another note, the fingerprints will also differ slightly if you use another spectrum extraction backend.


Development

Successfully merging this pull request may close these issues.

Inconsistent fingerprints generated in Python 3.13 and 3.14
