22 changes: 22 additions & 0 deletions beets/autotag/hooks.py
@@ -23,6 +23,7 @@

from typing_extensions import Self

from beets import plugins
from beets.util import cached_classproperty

if TYPE_CHECKING:
@@ -58,6 +59,16 @@ def __hash__(self) -> int: # type: ignore[override]
class Info(AttrDict[Any]):
"""Container for metadata about a musical entity."""

Identifier = tuple[str | None, str | None]

@property
Contributor

I do not like that we raise a NotImplementedError here. We should make the Info class abstract or a protocol if we want to define a contract for the inheritance.

Member Author

I considered using ABC + @abstractmethod here, but opted against it for these reasons:

  1. Limited scope: Info only has 2 concrete subclasses - it's not a public plugin interface where we need strict enforcement at instantiation time.

  2. Template pattern, not an interface: The base class provides real shared functionality (identifier property, __repr__, common __init__ parameters). The id and name properties are just internal adapters mapping to different field names in subclasses (e.g., album_id vs track_id).

  3. Testing overhead: Making it an ABC would require either creating stub implementations or monkeypatching __abstractmethods__ in tests. Since we're not exposing Info for external extension, the ceremony doesn't add value.

The NotImplementedError approach clearly documents "subclasses must override this" without the ABC machinery. If we later expose Info for some plugin extendability, I'd absolutely convert it to ABC at that point.

Happy to reconsider if you feel strongly about it though.
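For illustration, the template pattern described above can be sketched as follows. This is a simplified, hypothetical stand-in (a plain dict subclass with made-up field access), not the actual beets classes:

```python
from __future__ import annotations


class Info(dict):
    """Base container: shared behaviour lives here, while `id` is a
    hook that concrete subclasses are expected to override."""

    @property
    def id(self) -> str | None:
        # Documents "subclasses must override this" without ABC machinery.
        raise NotImplementedError

    @property
    def identifier(self) -> tuple[str | None, str | None]:
        # Shared functionality built on top of the subclass hook.
        return (self.get("data_source"), self.id)


class AlbumInfo(Info):
    @property
    def id(self) -> str | None:
        # Adapter mapping the generic `id` onto this subclass's field.
        return self.get("album_id")


album = AlbumInfo(album_id="a1", data_source="MusicBrainz")
print(album.identifier)  # ('MusicBrainz', 'a1')
```

Instantiating the base class succeeds here; only touching `id` on a raw `Info` raises, which is exactly the looser contract being debated.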

Contributor

> Template pattern, not an interface: The base class provides real shared functionality (identifier property, __repr__, common __init__ parameters). The id and name properties are just internal adapters mapping to different field names in subclasses (e.g., album_id vs track_id).

Abstract classes or protocols can also provide real shared functionality.

> Testing overhead: Making it an ABC would require either creating stub implementations or monkeypatching __abstractmethods__ in tests. Since we're not exposing Info for external extension, the ceremony doesn't add value.

We actually don't construct raw Info instances in tests. Tests only ever instantiate AlbumInfo or TrackInfo directly, so converting Info to an ABC wouldn't add test work in practice.

> The NotImplementedError approach clearly documents "subclasses must override this" without the ABC machinery. If we later expose Info for some plugin extendability, I'd absolutely convert it to ABC at that point.

It looks like this usage has already begun showing up in plugins (mbpseudo.py is one example). Given that the system is already being extended, it might be safer to formalize the interface now rather than later.
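The ABC alternative being suggested might look roughly like this. A hedged sketch with illustrative names, showing that an abstract base can still carry the shared identifier logic while refusing to instantiate incomplete subclasses:

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Info(ABC):
    """Sketch: shared behaviour plus an enforced override point."""

    def __init__(self, data_source: str | None = None) -> None:
        self.data_source = data_source

    @property
    @abstractmethod
    def id(self) -> str | None: ...

    @property
    def identifier(self) -> tuple[str | None, str | None]:
        # Concrete, shared functionality living on the abstract base.
        return (self.data_source, self.id)


class TrackInfo(Info):
    def __init__(self, track_id: str | None = None, **kwargs) -> None:
        super().__init__(**kwargs)
        self.track_id = track_id

    @property
    def id(self) -> str | None:
        return self.track_id


track = TrackInfo(track_id="t1", data_source="Discogs")
print(track.identifier)  # ('Discogs', 't1')
# Info(data_source="x")  # would raise TypeError at instantiation time
```

The difference from the NotImplementedError version is purely when the contract is enforced: at instantiation rather than at first attribute access.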

Member Author

I think there's some confusion here - mbpseudo.py doesn't subclass Info. It's a metadata source plugin that returns AlbumInfo/TrackInfo instances, but it doesn't create new Info subclasses.

On the other hand, discogs does - however, that approach is being reverted in #6179 since this subclass unintentionally introduced flexible attributes that ended up written into the database.

Given the above (that they actually should not be subclassed), I'd prefer to keep it simple with NotImplementedError for now. We can formalize it as an ABC if/when there's an actual need for plugin extensibility.

def id(self) -> str | None:
raise NotImplementedError

@property
def identifier(self) -> Identifier:
return (self.data_source, self.id)

@cached_property
def name(self) -> str:
raise NotImplementedError
@@ -103,6 +114,10 @@ class AlbumInfo(Info):
user items, and later to drive tagging decisions once selected.
"""

@property
def id(self) -> str | None:
return self.album_id

@cached_property
def name(self) -> str:
return self.album or ""
@@ -179,6 +194,10 @@ class TrackInfo(Info):
stand alone for singleton matching.
"""

@property
def id(self) -> str | None:
return self.track_id

@cached_property
def name(self) -> str:
return self.title or ""
@@ -247,6 +266,9 @@ class AlbumMatch(Match):
extra_items: list[Item]
extra_tracks: list[TrackInfo]

def __post_init__(self) -> None:
Contributor

I would move the event trigger out of the class constructor. It is possible that a Match object is constructed independently of a beets pipeline. I do not want to couple this that strongly if not necessary.

E.g. we have some serialization logic in beets-flask and I don't want to trigger this whenever I load a Match entry from cold storage.

Member Author (@snejus, Nov 24, 2025)

I get the concern about this side effect but moving this to __post_init__ was intentional:

  1. The send("album_matched") call was duplicated across multiple sites in the autotagger - always right after creating AlbumMatch
  2. This guarantees that the event is sent on every AlbumMatch creation. This is especially relevant given that I'm currently refactoring this functionality extensively.

This is actually a textbook use of __post_init__ - PEP 557 explicitly recommends it for side effects that must always happen during initialization. The alternative (keeping manual emissions everywhere) was provably bug-prone.

Re: beets-flask serialization: AlbumMatch/TrackMatch are internal to beets' autotagger, not public plugin API. For deserialization, you could bypass __post_init__ with:

match = object.__new__(AlbumMatch)
match.__dict__.update(serialized_data)

Or even better, serialize just the match data rather than the objects themselves. If there's broader need for match serialization, we could discuss adding a proper public API for it.

I want to avoid blocking necessary refactoring of beets' internals based on downstream usage of internal classes. Does the deserialization workaround work for your use case?

Contributor

I still think the event should not be sent on every initialization. To me, the event is more closely coupled with the tagging logic than with the match object itself, although I do agree that the current approach makes the code a bit cleaner.

> AlbumMatch/TrackMatch are internal to beets' autotagger, not public plugin API.

How does one identify public API in this case? We do not use __all__ and there is no underscore in the naming. As there is no internal-use indicator, AlbumMatch/TrackMatch are public API according to PEP 8.

Historically beets has treated only the plugin API as public, yes, but the project is also used as a library, and I think that use case deserves consideration as well. Without explicit boundaries, users reasonably assume that importable classes are fair game.

> Or even better, serialize just the match data rather than the objects themselves. If there's broader need for match serialization, we could discuss adding a proper public API for it.

Just to clarify: I meant only deserialization. This was what the "load from cold storage" comment was referencing.

> I want to avoid blocking necessary refactoring of beets' internals based on downstream usage of internal classes. Does the deserialization workaround work for your use case?

We routinely do block changes, or at least adjust them, because of potential downstream usage. Wanting to avoid that concern here feels a bit inconsistent.

The workaround would work, but it shifts the burden entirely onto downstream users and breaks existing programs.


I'm not opposed to the change in principle, and I don't think it needs to block progress, but by any reasonable definition this is a breaking change. In my view that implies either a major-version bump or, alternatively, introducing a minimal public deserialization format so maintainers can refactor freely without silently breaking consumers.

Member Author

Fair points - you're right about the API ambiguity and that beets is used as a library.

How about adding a from_dict() classmethod that skips the event?

@classmethod
def from_dict(cls, data: dict) -> AlbumMatch:
    """Reconstruct from serialized data without emitting events."""
    obj = object.__new__(cls)
    obj.__dict__.update(data)
    return obj

This gives library users a stable deserialization path, keeps the __post_init__ enforcement for the autotagger, and avoids the major version bump debate. Would that work for beets-flask?
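A self-contained sketch of how that proposal would behave. The event hook here is a stand-in list rather than beets' real `plugins.send`, and the single `distance` field is illustrative, not the actual AlbumMatch shape:

```python
from __future__ import annotations

from dataclasses import dataclass

events: list[str] = []  # stand-in for plugins.send("album_matched", ...)


@dataclass
class AlbumMatch:
    distance: float

    def __post_init__(self) -> None:
        # Fires on every normal construction, as in the PR.
        events.append("album_matched")

    @classmethod
    def from_dict(cls, data: dict) -> AlbumMatch:
        """Reconstruct from serialized data without emitting events."""
        obj = object.__new__(cls)  # skips __init__ and __post_init__
        obj.__dict__.update(data)
        return obj


AlbumMatch(distance=0.1)                             # emits the event
restored = AlbumMatch.from_dict({"distance": 0.2})   # emits nothing
print(events, restored.distance)  # ['album_matched'] 0.2
```

`object.__new__` allocates the instance without running any initializer, so deserialization stays silent while ordinary construction keeps the guaranteed event.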

plugins.send("album_matched", match=self)

@property
def item_info_pairs(self) -> list[tuple[Item, TrackInfo]]:
return list(self.mapping.items())
128 changes: 59 additions & 69 deletions beets/autotag/match.py
@@ -19,12 +19,12 @@
from __future__ import annotations

from enum import IntEnum
from typing import TYPE_CHECKING, Any, NamedTuple, TypeVar
from typing import TYPE_CHECKING, NamedTuple, TypeVar

import lap
import numpy as np

from beets import config, logging, metadata_plugins, plugins
from beets import config, logging, metadata_plugins
from beets.autotag import AlbumInfo, AlbumMatch, TrackInfo, TrackMatch, hooks
from beets.util import get_most_common_tags

@@ -35,6 +35,11 @@

from beets.library import Item

from .hooks import Info

AnyMatch = TypeVar("AnyMatch", TrackMatch, AlbumMatch)
Candidates = dict[Info.Identifier, AnyMatch]

# Global logger.
log = logging.getLogger("beets")

@@ -98,28 +103,21 @@ def assign_items(
return list(mapping.items()), extra_items, extra_tracks


def match_by_id(items: Iterable[Item]) -> AlbumInfo | None:
"""If the items are tagged with an external source ID, return an
AlbumInfo object for the corresponding album. Otherwise, returns
None.
"""
albumids = (item.mb_albumid for item in items if item.mb_albumid)
def match_by_id(album_id: str | None, consensus: bool) -> Iterable[AlbumInfo]:
"""Return album candidates for the given album id.

# Did any of the items have an MB album ID?
try:
first = next(albumids)
except StopIteration:
Make sure that the ID is present and that there is consensus on it among
the items being tagged.
"""
if not album_id:
log.debug("No album ID found.")
return None
elif not consensus:
log.debug("No album ID consensus.")
else:
log.debug("Searching for discovered album ID: {}", album_id)
return metadata_plugins.albums_for_ids([album_id])

# Is there a consensus on the MB album ID?
for other in albumids:
if other != first:
log.debug("No album ID consensus.")
return None
# If all album IDs are equal, look up the album.
log.debug("Searching for discovered album ID: {}", first)
return metadata_plugins.album_for_id(first)
return ()


def _recommendation(
@@ -179,33 +177,33 @@ def _recommendation(
return rec


AnyMatch = TypeVar("AnyMatch", TrackMatch, AlbumMatch)


def _sort_candidates(candidates: Iterable[AnyMatch]) -> Sequence[AnyMatch]:
"""Sort candidates by distance."""
return sorted(candidates, key=lambda match: match.distance)


def _add_candidate(
items: Sequence[Item],
results: dict[Any, AlbumMatch],
results: Candidates[AlbumMatch],
info: AlbumInfo,
):
"""Given a candidate AlbumInfo object, attempt to add the candidate
to the output dictionary of AlbumMatch objects. This involves
checking the track count, ordering the items, checking for
duplicates, and calculating the distance.
"""
log.debug("Candidate: {0.artist} - {0.album} ({0.album_id})", info)
log.debug(
"Candidate: {0.artist} - {0.album} ({0.album_id}) from {0.data_source}",
info,
)

# Discard albums with zero tracks.
if not info.tracks:
log.debug("No tracks.")
return

# Prevent duplicates.
if info.album_id and info.album_id in results:
if info.identifier in results:
log.debug("Duplicate.")
return

@@ -233,7 +231,7 @@ def _add_candidate(
return

log.debug("Success. Distance: {}", dist)
results[info.album_id] = hooks.AlbumMatch(
results[info.identifier] = hooks.AlbumMatch(
dist, info, dict(item_info_pairs), extra_items, extra_tracks
)

@@ -268,38 +266,35 @@ def tag_album(
log.debug("Tagging {} - {}", cur_artist, cur_album)

# The output result, keys are the MB album ID.
candidates: dict[Any, AlbumMatch] = {}
candidates: Candidates[AlbumMatch] = {}

# Search by explicit ID.
if search_ids:
for search_id in search_ids:
log.debug("Searching for album ID: {}", search_id)
if info := metadata_plugins.album_for_id(search_id):
_add_candidate(items, candidates, info)
if opt_candidate := candidates.get(info.album_id):
plugins.send("album_matched", match=opt_candidate)
log.debug("Searching for album IDs: {}", search_ids)
for _info in metadata_plugins.albums_for_ids(search_ids):
_add_candidate(items, candidates, _info)

# Use existing metadata or text search.
else:
# Try search based on current ID.
if info := match_by_id(items):
for info in match_by_id(
likelies["mb_albumid"], consensus["mb_albumid"]
):
_add_candidate(items, candidates, info)
for candidate in candidates.values():
plugins.send("album_matched", match=candidate)

rec = _recommendation(list(candidates.values()))
log.debug("Album ID match recommendation is {}", rec)
if candidates and not config["import"]["timid"]:
# If we have a very good MBID match, return immediately.
# Otherwise, this match will compete against metadata-based
# matches.
if rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)

rec = _recommendation(list(candidates.values()))
log.debug("Album ID match recommendation is {}", rec)
if candidates and not config["import"]["timid"]:
# If we have a very good MBID match, return immediately.
# Otherwise, this match will compete against metadata-based
# matches.
if rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)

# Search terms.
if not (search_artist and search_name):
items, search_artist, search_name, va_likely
):
_add_candidate(items, candidates, matched_candidate)
if opt_candidate := candidates.get(matched_candidate.album_id):
plugins.send("album_matched", match=opt_candidate)

log.debug("Evaluating {} candidates.", len(candidates))
# Sort and get the recommendation.
@@ -345,25 +338,22 @@ def tag_item(
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
candidates: Candidates[TrackMatch] = {}
rec: Recommendation | None = None

# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
log.debug("Searching for track IDs: {}", trackids)
for info in metadata_plugins.tracks_for_ids(trackids):
dist = track_distance(item, info, incl_artist=True)
candidates[info.identifier] = hooks.TrackMatch(dist, info)

# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if rec == Recommendation.strong and not config["import"]["timid"]:
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)

# If we're searching by ID, don't proceed.
if search_ids:
Expand All @@ -383,7 +373,7 @@ def tag_item(
item, search_artist, search_name
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
candidates[track_info.identifier] = hooks.TrackMatch(dist, track_info)

# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
58 changes: 38 additions & 20 deletions beets/metadata_plugins.py
@@ -34,6 +34,14 @@ def find_metadata_source_plugins() -> list[MetadataSourcePlugin]:
return [p for p in find_plugins() if hasattr(p, "data_source")] # type: ignore[misc]


@cache
def get_metadata_source(name: str) -> MetadataSourcePlugin | None:
"""Get metadata source plugin by name."""
name = name.lower()
plugins = find_metadata_source_plugins()
return next((p for p in plugins if p.data_source.lower() == name), None)


@notify_info_yielded("albuminfo_received")
def candidates(*args, **kwargs) -> Iterable[AlbumInfo]:
"""Return matching album candidates from all metadata source plugins."""
@@ -48,28 +56,38 @@ def item_candidates(*args, **kwargs) -> Iterable[TrackInfo]:
yield from plugin.item_candidates(*args, **kwargs)


def album_for_id(_id: str) -> AlbumInfo | None:
"""Get AlbumInfo object for the given ID string.
@notify_info_yielded("albuminfo_received")
def albums_for_ids(ids: Sequence[str]) -> Iterable[AlbumInfo]:
"""Return matching albums from all metadata sources for the given ID."""
for plugin in find_metadata_source_plugins():
yield from plugin.albums_for_ids(ids)

A single ID can yield just a single album, so we return the first match.
"""

@notify_info_yielded("trackinfo_received")
def tracks_for_ids(ids: Sequence[str]) -> Iterable[TrackInfo]:
"""Return matching tracks from all metadata sources for the given ID."""
for plugin in find_metadata_source_plugins():
if info := plugin.album_for_id(album_id=_id):
send("albuminfo_received", info=info)
return info
yield from plugin.tracks_for_ids(ids)

return None

def album_for_id(_id: str, data_source: str) -> AlbumInfo | None:
"""Get AlbumInfo object for the given ID and data source."""
if (plugin := get_metadata_source(data_source)) and (
info := plugin.album_for_id(_id)
):
send("albuminfo_received", info=info)
return info

return None

def track_for_id(_id: str) -> TrackInfo | None:
"""Get TrackInfo object for the given ID string.

A single ID can yield just a single track, so we return the first match.
"""
for plugin in find_metadata_source_plugins():
if info := plugin.track_for_id(_id):
send("trackinfo_received", info=info)
return info
def track_for_id(_id: str, data_source: str) -> TrackInfo | None:
"""Get TrackInfo object for the given ID and data source."""
if (plugin := get_metadata_source(data_source)) and (
info := plugin.track_for_id(_id)
):
send("trackinfo_received", info=info)
return info

return None

@@ -169,7 +187,7 @@ def item_candidates(
"""
raise NotImplementedError

def albums_for_ids(self, ids: Sequence[str]) -> Iterable[AlbumInfo | None]:
def albums_for_ids(self, ids: Sequence[str]) -> Iterable[AlbumInfo]:
"""Batch lookup of album metadata for a list of album IDs.

Given a list of album identifiers, yields corresponding AlbumInfo objects.
@@ -178,9 +196,9 @@ def albums_for_ids(self, ids: Sequence[str]) -> Iterable[AlbumInfo | None]:
single calls to album_for_id.
"""

return (self.album_for_id(id) for id in ids)
return filter(None, (self.album_for_id(id) for id in ids))

def tracks_for_ids(self, ids: Sequence[str]) -> Iterable[TrackInfo | None]:
def tracks_for_ids(self, ids: Sequence[str]) -> Iterable[TrackInfo]:
"""Batch lookup of track metadata for a list of track IDs.

Given a list of track identifiers, yields corresponding TrackInfo objects.
@@ -189,7 +207,7 @@ def tracks_for_ids(self, ids: Sequence[str]) -> Iterable[TrackInfo | None]:
single calls to track_for_id.
"""

return (self.track_for_id(id) for id in ids)
return filter(None, (self.track_for_id(id) for id in ids))

def _extract_id(self, url: str) -> str | None:
"""Extract an ID from a URL for this metadata source plugin.