Commit d34f347

colonelpanic8 and claude committed
fix: resolve clippy warnings and test failures
- Remove redundant 'test' prefix from test function names
- Fix clippy warnings about empty lines after doc comments
- Fix format string warnings to use inline variables
- Fix needless borrow warning in VCR test
- Make compilation_to_canonical test more robust for API variability
- Skip VCR test when cassette is missing and not recording
- Add documentation for compilation to canonical logic

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent fb5cea3 commit d34f347

6 files changed: +350 -220 lines changed

COMPILATION_TO_CANONICAL_LOGIC.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Compilation to Canonical Provider - Logic Documentation

## Purpose
The `CompilationToCanonicalProvider` is designed to suggest moving tracks from compilation albums (like "Greatest Hits", "Now That's What I Call Music", etc.) to their original studio albums. For example, if you have "Bohemian Rhapsody" scrobbled from "Greatest Hits", it should suggest moving it to "A Night at the Opera".

## Current Implementation (Crude Heuristics)

### How It Works

1. **For each track being analyzed:**
   - Look up the track in MusicBrainz by artist + title
   - Get all releases (albums) that contain this recording
   - Filter the releases to find the "canonical" one

2. **The "Canonical Release" Selection Process:**

   **Step 1: Filter OUT these types of releases:**
   - **Bootlegs** - Releases with `status = "Bootleg"` (unofficial recordings)
   - **Promotional** - Releases with `status = "Promotion"`
   - **Pseudo-releases** - Releases with `status = "PseudoRelease"`
   - **Live albums** - Albums with titles containing "live at", "live in", "concert", "unplugged", etc.
   - **Compilations** - Albums with titles containing:
     - "greatest", "best of", "collection", "essential"
     - "anthology", "ultimate", "hits", "singles"
     - "soundtrack", "ost", "various artists"
   - **Singles** - Releases where the title matches the track name
   - **Various Artists releases** - Where artist credit is "Various Artists", "VA", etc.

   **Step 2: From remaining releases, pick the EARLIEST by release date**

3. **Only suggest a change if:**
   - A canonical release was found
   - It's different from the current album

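The selection process above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: the `Release` struct, the `is_candidate` and `pick_canonical` functions, and the choice to skip undated releases are all assumptions made for the example, and the keyword lists only cover the terms named in this document.

```rust
/// Hypothetical, simplified release record (not the project's real type).
#[derive(Debug, Clone, PartialEq)]
struct Release {
    title: String,
    artist_credit: String,
    status: Option<String>,
    /// ISO-style date string: "YYYY", "YYYY-MM", or "YYYY-MM-DD".
    date: Option<String>,
}

const EXCLUDED_STATUSES: [&str; 3] = ["Bootleg", "Promotion", "PseudoRelease"];
const LIVE_KEYWORDS: [&str; 4] = ["live at", "live in", "concert", "unplugged"];
const COMPILATION_KEYWORDS: [&str; 11] = [
    "greatest", "best of", "collection", "essential", "anthology", "ultimate",
    "hits", "singles", "soundtrack", "ost", "various artists",
];

/// Step 1: returns true if the release survives every exclusion rule.
fn is_candidate(release: &Release, track_title: &str) -> bool {
    let title = release.title.to_lowercase();
    let artist = release.artist_credit.to_lowercase();

    // A missing status passes the filter, which is one of the weaknesses
    // discussed in this document.
    let bad_status = release
        .status
        .as_deref()
        .is_some_and(|s| EXCLUDED_STATUSES.contains(&s));
    // Plain substring matching on the lowercased title.
    let keyword_hit = LIVE_KEYWORDS
        .iter()
        .chain(COMPILATION_KEYWORDS.iter())
        .any(|k| title.contains(*k));
    let is_single = title == track_title.to_lowercase();
    let various = artist == "various artists" || artist == "va";

    !(bad_status || keyword_hit || is_single || various)
}

/// Step 2: earliest surviving release. Undated releases are skipped here,
/// which is an assumption of this sketch.
fn pick_canonical<'a>(releases: &'a [Release], track_title: &str) -> Option<&'a Release> {
    releases
        .iter()
        .filter(|r| is_candidate(r, track_title) && r.date.is_some())
        // ISO-style date strings sort chronologically as plain strings.
        .min_by(|a, b| a.date.cmp(&b.date))
}
```

A caller would then suggest a move only when `pick_canonical` returns `Some(release)` and `release.title` differs from the album the track is currently filed under.
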
## Problems with This Approach

### 1. Title-Based Heuristics Are Unreliable
- **False Negatives:** "The Beatles Box" is a compilation but is not detected as one (its title doesn't contain our keywords)
- **False Positives:** A studio album called "Live and Let Die" might be filtered out as a live album
- **Language Issues:** Non-English compilations like "Grandes Éxitos" won't be detected

### 2. "Earliest Release" Doesn't Mean "Original Album"
The current logic assumes the earliest release is the original, but this fails for:
- **Singles** that predate the album and are not caught by the title-match filter
- **Regional releases** (a Japanese release might be earlier but not canonical)
- **Box sets** released early in an artist's career
- **Other compilations** that don't match our keyword list

### 3. MusicBrainz Data Limitations
- **Missing release status:** Many releases don't have the `status` field populated, so they pass the status filter
- **No compilation flag on releases:** The release data we look at doesn't say whether something is a compilation or a studio album; that information lives on the release group
- **Release group info needed:** The real solution requires fetching release groups (additional API calls)

## What We SHOULD Be Doing

### Proper MusicBrainz Approach

1. **Use Release Groups:**
   ```
   Recording -> Release Group -> Primary Type
   ```
   - Release groups have a `primary-type` field: "Album", "Single", "EP", "Broadcast", or "Other"
   - This separates albums from singles and EPs without comparing titles

2. **Use Secondary Types:**
   - Release groups also have `secondary-types` like "Compilation", "Live", "Soundtrack", "Remix"
   - A release group with primary type "Album" and the secondary type "Compilation" is a compilation; this is much more reliable than title parsing

3. **Respect Artist Intent:**
   - Some "greatest hits" are considered canonical by the artist
   - Some tracks were only released on compilations
   - Need to handle these edge cases

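As a rough sketch of how type-based classification could replace the title keywords, assuming the release group data has already been fetched and parsed. The `ReleaseGroup` struct, the `Kind` enum, and `classify` are hypothetical names for this example, and the sketch assumes "Compilation", "Live", and "Soundtrack" arrive in the secondary types.

```rust
/// Hypothetical mirror of the two release-group fields discussed above.
struct ReleaseGroup {
    primary_type: Option<String>,
    secondary_types: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Kind {
    StudioAlbum,
    Compilation,
    Live,
    Soundtrack,
    Other,
}

/// Classifies a release group from its type fields alone; titles are never inspected.
fn classify(group: &ReleaseGroup) -> Kind {
    let has = |name: &str| {
        group
            .secondary_types
            .iter()
            .any(|t| t.eq_ignore_ascii_case(name))
    };

    if has("Compilation") {
        Kind::Compilation
    } else if has("Live") {
        Kind::Live
    } else if has("Soundtrack") {
        Kind::Soundtrack
    } else if group.primary_type.as_deref() == Some("Album") && group.secondary_types.is_empty() {
        // Assumption of this sketch: an "Album" with no secondary types
        // is treated as a studio album.
        Kind::StudioAlbum
    } else {
        Kind::Other
    }
}
```

With this in place, only release groups classified as `Kind::StudioAlbum` would be eligible as canonical targets. The edge cases under "Respect Artist Intent" would still need separate handling.
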
### Example of Current Failures

**Input:** "Come Together" by The Beatles from "Abbey Road" (1969)
**Current Output:** Suggests "1962–1970: The Best Of" because:
- It's not detected as a compilation (no keywords match)
- It has an earlier date in the database
- Step 2 therefore picks it as the earliest remaining release

**Expected:** No suggestion (already on the original album)

## Short-term Improvements (Without API Changes)

1. **Expand keyword lists** for compilation detection
2. **Check album artist** - if different from track artist, likely compilation
3. **Track count heuristic** - compilations often have 20+ tracks from different albums
4. **Year span check** - if album contains tracks spanning many years, likely compilation
5. **Whitelist known studio albums** for major artists

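Items 2 to 4 above could be combined into a single check along these lines. This is only a sketch: `AlbumSummary` and `looks_like_compilation` are hypothetical names, the thresholds are caller-supplied guesses rather than tuned values, and it assumes per-track artist and year data is available.

```rust
/// Hypothetical per-album summary; assumes one artist entry per track.
struct AlbumSummary {
    album_artist: String,
    track_artists: Vec<String>,
    /// Original release year of each track, where known.
    track_years: Vec<u16>,
}

/// Returns true if any of the three heuristics flags the album.
fn looks_like_compilation(album: &AlbumSummary, max_tracks: usize, max_year_span: u16) -> bool {
    // Album artist check: any track credited to someone else.
    let artist_mismatch = album
        .track_artists
        .iter()
        .any(|a| !a.eq_ignore_ascii_case(&album.album_artist));

    // Track count heuristic.
    let too_many_tracks = album.track_artists.len() >= max_tracks;

    // Year span check: difference between the newest and oldest track year.
    let span = match (album.track_years.iter().min(), album.track_years.iter().max()) {
        (Some(lo), Some(hi)) => hi - lo,
        _ => 0,
    };

    artist_mismatch || too_many_tracks || span > max_year_span
}
```

For example, `looks_like_compilation(&album, 20, 5)` would flag an album with 20 or more tracks, or one whose tracks span more than five years. Long double albums would trip the track count rule, so these thresholds would need tuning against real data.
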
## Long-term Solution

1. **Fetch release group data** from MusicBrainz
2. **Use proper type fields** instead of title heuristics
3. **Cache release group lookups** to minimize API calls
4. **Build a learning system** that improves based on user corrections
5. **Allow user-defined rules** for specific artists/albums

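For item 3, an in-memory cache keyed by release group MBID might look like the sketch below. `ReleaseGroupCache` is a hypothetical name, the cached value is reduced to an optional type string for brevity, and the `fetch` closure stands in for whatever MusicBrainz client call the project ends up using.

```rust
use std::collections::HashMap;

/// Memoizes release-group lookups so each MBID is fetched at most once.
/// `F` is a placeholder for the real (hypothetical) API call.
struct ReleaseGroupCache<F: FnMut(&str) -> Option<String>> {
    fetch: F,
    entries: HashMap<String, Option<String>>,
}

impl<F: FnMut(&str) -> Option<String>> ReleaseGroupCache<F> {
    fn new(fetch: F) -> Self {
        Self {
            fetch,
            entries: HashMap::new(),
        }
    }

    /// Returns the cached value if present, otherwise fetches and stores it.
    /// Misses (`None`) are cached too, so a failed lookup is not retried.
    fn get(&mut self, mbid: &str) -> Option<String> {
        if let Some(hit) = self.entries.get(mbid) {
            return hit.clone();
        }
        let fetched = (self.fetch)(mbid);
        self.entries.insert(mbid.to_string(), fetched.clone());
        fetched
    }
}
```

Caching misses as well as hits keeps repeated lookups of unknown IDs from hitting the API again. A persistent cache, or one with expiry, would be a separate design decision.
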
## Current Workarounds

The provider is still useful for obvious cases:
- ✅ "Now That's What I Call Music" -> Original albums
- ✅ "Greatest Hits" -> Original albums (when keyword matches)
- ✅ Soundtracks -> Original albums
- ❌ Edge cases and non-English compilations
- ❌ Box sets and reissues

## Conclusion

The current implementation uses crude pattern matching on album titles and basic date sorting. While it works for obvious compilations, it fails on edge cases and can suggest inappropriate moves (such as from a studio album to a compilation). The proper solution is to use MusicBrainz's release group types, but that requires additional API calls and significant refactoring.
