# Compilation to Canonical Provider - Logic Documentation

## Purpose
The `CompilationToCanonicalProvider` is designed to suggest moving tracks from compilation albums (like "Greatest Hits", "Now That's What I Call Music", etc.) to their original studio albums. For example, if you have "Bohemian Rhapsody" scrobbled from "Greatest Hits", it should suggest moving it to "A Night at the Opera".

## Current Implementation (Crude Heuristics)

### How It Works

1. **For each track being analyzed:**
   - Look up the track in MusicBrainz by artist + title
   - Get all releases (albums) that contain this recording
   - Filter the releases to find the "canonical" one

2. **The "Canonical Release" Selection Process:**

   **Step 1: Filter OUT these types of releases:**
   - ❌ **Bootlegs** - Releases with `status = "Bootleg"` (unofficial recordings)
   - ❌ **Promotional** - Releases with `status = "Promotion"`
   - ❌ **Pseudo-releases** - Releases with `status = "PseudoRelease"`
   - ❌ **Live albums** - Albums with titles containing "live at", "live in", "concert", "unplugged", etc.
   - ❌ **Compilations** - Albums with titles containing:
     - "greatest", "best of", "collection", "essential"
     - "anthology", "ultimate", "hits", "singles"
     - "soundtrack", "ost", "various artists"
   - ❌ **Singles** - Releases whose title matches the track name
   - ❌ **Various Artists releases** - Releases whose artist credit is "Various Artists", "VA", etc.

   **Step 2: From the remaining releases, pick the EARLIEST by release date**

3. **Only suggest a change if:**
   - A canonical release was found
   - It differs from the current album
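The selection process above can be condensed into a short Python sketch. It is not the provider's actual code: the `Release` type and the `is_candidate`, `pick_canonical`, and `suggest_album` names are hypothetical, and case-insensitive substring matching, the exact keyword lists, and skipping releases without a date are assumptions about how the rules are applied.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed keyword lists mirroring the ones above; the live list ends in "etc."
# in the text, so the real provider's lists may be longer.
LIVE_KEYWORDS = ("live at", "live in", "concert", "unplugged")
COMPILATION_KEYWORDS = (
    "greatest", "best of", "collection", "essential",
    "anthology", "ultimate", "hits", "singles",
    "soundtrack", "ost", "various artists",
)
EXCLUDED_STATUSES = {"Bootleg", "Promotion", "PseudoRelease"}
VARIOUS_ARTISTS = {"various artists", "va"}


@dataclass
class Release:
    title: str
    artist_credit: str
    date: Optional[str]           # ISO-style "YYYY[-MM[-DD]]"; may be missing
    status: Optional[str] = None  # often unset in MusicBrainz data


def is_candidate(release: Release, track_title: str) -> bool:
    """Step 1: reject releases that look non-canonical by status, title or artist."""
    title = release.title.lower()
    if release.status in EXCLUDED_STATUSES:
        return False
    if any(k in title for k in LIVE_KEYWORDS):
        return False
    if any(k in title for k in COMPILATION_KEYWORDS):
        return False
    if title == track_title.lower():  # title equals the track name: treat as a single
        return False
    if release.artist_credit.lower() in VARIOUS_ARTISTS:
        return False
    return True


def pick_canonical(releases: List[Release], track_title: str) -> Optional[Release]:
    """Step 2: earliest dated release among the survivors.

    ISO date strings sort chronologically as plain strings; a year-only date
    sorts before full dates in the same year.
    """
    candidates = [r for r in releases if r.date and is_candidate(r, track_title)]
    return min(candidates, key=lambda r: r.date, default=None)


def suggest_album(current_album: str, releases: List[Release], track_title: str) -> Optional[str]:
    """Step 3: suggest only when a canonical release exists and differs from the current album."""
    canonical = pick_canonical(releases, track_title)
    if canonical is None or canonical.title.lower() == current_album.lower():
        return None
    return canonical.title


if __name__ == "__main__":
    # Illustrative data for the "Bohemian Rhapsody" example from the Purpose section.
    releases = [
        Release("Greatest Hits", "Queen", "1981"),
        Release("A Night at the Opera", "Queen", "1975-11-21", "Official"),
        Release("Bohemian Rhapsody", "Queen", "1975-10-31", "Official"),
    ]
    print(suggest_album("Greatest Hits", releases, "Bohemian Rhapsody"))
```

Running the file prints `A Night at the Opera`: "Greatest Hits" is dropped by the "greatest" keyword and the single is dropped because its title equals the track name, leaving the studio album as the earliest survivor.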

## Problems with This Approach

### 1. Title-Based Heuristics Are Unreliable
- **False positives (accepted as canonical):** "The Beatles Box" passes as a non-compilation because its title contains none of our keywords
- **False negatives (rejected as non-canonical):** A title such as "Live and Let Die" would be filtered as a live album if the live-album list includes a bare "live"
- **Language issues:** Non-English compilations like "Grandes Éxitos" won't be detected
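To make these failure modes concrete, the Python sketch below runs the documented keywords against a few titles. It assumes case-insensitive substring matching and, for the live-album case, that the "etc." in the live list includes a bare "live"; `matches_substring` and `matches_whole_word` are hypothetical helpers. It also exposes a second hazard of substring matching: the short keyword "ost" matches inside ordinary words.

```python
import re

# Assumption: the live list's "etc." includes a bare "live"; real lists may differ.
LIVE_KEYWORDS = ("live at", "live in", "concert", "unplugged", "live")
COMPILATION_KEYWORDS = (
    "greatest", "best of", "collection", "essential",
    "anthology", "ultimate", "hits", "singles",
    "soundtrack", "ost", "various artists",
)


def matches_substring(title: str, keywords) -> bool:
    """Plain case-insensitive substring matching (assumed current behavior)."""
    t = title.lower()
    return any(k in t for k in keywords)


def matches_whole_word(title: str, keywords) -> bool:
    """Same keywords, but only counted when they appear as whole words."""
    t = title.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", t) for k in keywords)


for title in ("The Beatles Box", "Grandes Éxitos", "Ghost Stories"):
    print(title, matches_substring(title, COMPILATION_KEYWORDS),
          matches_whole_word(title, COMPILATION_KEYWORDS))
print("Live and Let Die", matches_substring("Live and Let Die", LIVE_KEYWORDS),
      matches_whole_word("Live and Let Die", LIVE_KEYWORDS))
```

Under these assumptions, substring matching flags "Ghost Stories" as a compilation (through "ost") while missing "The Beatles Box" and "Grandes Éxitos". Whole-word matching removes the "ost" false hit but rescues none of the other cases, which is why title parsing alone stays unreliable.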

### 2. "Earliest Release" Doesn't Mean "Original Album"
The current logic assumes the earliest release is the original, but this fails for:
- **Singles (and their reissues)** released before the album whose title is not exactly the track name (e.g. a double A-side), so the single filter misses them
- **Regional releases** (a Japanese release might be dated earlier without being the canonical album)
- **Box sets** released early in an artist's career
- **Other compilations** that don't match our keyword list
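A small Python sketch of the first failure above, using hypothetical titles and dates: the single filter removes only a release titled exactly like the track, so an earlier double A-side survives and wins the earliest-date comparison. `Release`, `is_single`, and `earliest_non_single` are illustrative names, not the provider's.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Release:
    title: str
    released: date


def is_single(release: Release, track_title: str) -> bool:
    # The documented rule: a release counts as a single only if its title equals the track name.
    return release.title.lower() == track_title.lower()


def earliest_non_single(releases: List[Release], track_title: str) -> Optional[Release]:
    remaining = [r for r in releases if not is_single(r, track_title)]
    return min(remaining, key=lambda r: r.released, default=None)


# Hypothetical data: a track first issued as a single, then on a double A-side,
# then on the studio album most listeners would consider canonical.
releases = [
    Release("Track One", date(1970, 10, 1)),
    Release("Track One / Track Two", date(1970, 11, 1)),
    Release("Studio Album", date(1971, 3, 1)),
]

print(earliest_non_single(releases, "Track One").title)
```

This prints `Track One / Track Two`: the exact-title single is filtered out, but the double A-side is still earlier than the album, so it is chosen as "canonical".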

### 3. MusicBrainz Data Limitations
- **Missing release status:** Many releases don't have the status field populated
- **No compilation flag on releases:** The release data we use doesn't mark compilations vs studio albums; that classification lives on the release group
- **Release group info needed:** The real solution requires fetching release groups (additional API calls)

## What We SHOULD Be Doing

### Proper MusicBrainz Approach

1. **Use Release Groups:**
   ```
   Recording -> Release -> Release Group -> primary-type / secondary-types
   ```
   - Release groups have a `primary-type` field ("Album", "Single", "EP", "Broadcast", or "Other"); "Compilation", "Soundtrack", and "Live" are secondary types rather than primary types
   - Together with the secondary types, this identifies compilations directly instead of inferring them from titles, as long as the release group is tagged correctly

2. **Use Secondary Types:**
   - Release groups also have `secondary-types` like "Compilation", "Live", "Soundtrack", "Remix"
   - Much more reliable than title parsing

3. **Respect Artist Intent:**
   - Some "greatest hits" albums are considered canonical by the artist
   - Some tracks were only released on compilations
   - In these cases the provider should leave the track where it is rather than force a move
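A minimal Python sketch of release-group-based selection, assuming each release is a dict shaped like MusicBrainz JSON with an embedded `"release-group"` carrying `"primary-type"` and `"secondary-types"` (how those are fetched is left open). The function names, the tolerance for a missing status, and the choice to return no suggestion when no studio album exists (one way to honor artist intent for compilation-only tracks) are assumptions.

```python
from typing import List, Optional

# Secondary types treated as non-canonical; taken from the list above.
NON_CANONICAL_SECONDARY = {"Compilation", "Live", "Soundtrack", "Remix"}


def is_studio_album(release: dict) -> bool:
    """True when the release group is an Album with no non-canonical secondary type."""
    group = release.get("release-group") or {}
    secondary = set(group.get("secondary-types") or [])
    return group.get("primary-type") == "Album" and not (secondary & NON_CANONICAL_SECONDARY)


def canonical_release(releases: List[dict]) -> Optional[dict]:
    """Earliest dated studio-album release; None if the track only appears elsewhere."""
    candidates = [
        r for r in releases
        if is_studio_album(r)
        # Assumption: a missing status is tolerated; anything other than Official is not.
        and r.get("status") in (None, "Official")
        and r.get("date")
    ]
    return min(candidates, key=lambda r: r["date"], default=None)


def suggest_album(current_album: str, releases: List[dict]) -> Optional[str]:
    canonical = canonical_release(releases)
    if canonical is None:
        # Respect artist intent: the track only appears on compilations
        # (or similar), so leave it where it is.
        return None
    if canonical["title"].lower() == current_album.lower():
        return None
    return canonical["title"]
```

For the "Come Together" case, a release group carrying the "Compilation" secondary type is never a candidate, whatever its date, so a track already on "Abbey Road" gets no suggestion. The "greatest hits the artist considers canonical" case is not covered here and would need something like per-artist overrides.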

### Example of Current Failures

**Input:** "Come Together" by The Beatles from "Abbey Road" (1969)
**Current Output:** Suggests "1962–1970: The Best Of" because:
- It slips through the compilation filter, even though its title contains "best of"
- It has an earlier date in the database
- So the earliest-date rule picks it over "Abbey Road"

**Expected:** No suggestion (already on the original album)

## Short-term Improvements (Without API Changes)

1. **Expand keyword lists** for compilation detection
2. **Check album artist** - if it differs from the track artist, the album is likely a compilation
3. **Track count heuristic** - compilations often have 20+ tracks from different albums
4. **Year span check** - if an album contains tracks spanning many years, it is likely a compilation
5. **Whitelist known studio albums** for major artists
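The short-term checks could be combined into one function that reports why an album looks like a compilation. This Python sketch is illustrative only: `AlbumInfo`, `compilation_signals`, the per-track year data, and both thresholds (20 tracks from the text, a 3-year span chosen arbitrarily) are assumptions, and the track-count check does not verify that the tracks come from different albums.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical thresholds: the text suggests "20+ tracks"; the year-span cutoff is a guess.
MIN_COMPILATION_TRACKS = 20
MAX_STUDIO_YEAR_SPAN = 3


@dataclass
class AlbumInfo:
    title: str
    album_artist: str
    track_years: List[int]  # assumed: original release year of each track on the album


def compilation_signals(
    album: AlbumInfo,
    track_artist: str,
    keywords: Iterable[str],
    studio_whitelist: Set[Tuple[str, str]],  # lowercase (artist, album title) pairs
) -> List[str]:
    """Return the reasons an album looks like a compilation; an empty list means it looks canonical."""
    # 5. Whitelisted albums are always treated as studio albums.
    if (track_artist.lower(), album.title.lower()) in studio_whitelist:
        return []
    reasons = []
    title = album.title.lower()
    # 1. Expanded keyword list (may include non-English phrases).
    if any(k in title for k in keywords):
        reasons.append("keyword")
    # 2. Album artist differs from the track artist.
    if album.album_artist.lower() != track_artist.lower():
        reasons.append("album-artist")
    # 3. Unusually long track list (counts tracks only).
    if len(album.track_years) >= MIN_COMPILATION_TRACKS:
        reasons.append("track-count")
    # 4. Tracks originally released over many years.
    if album.track_years and max(album.track_years) - min(album.track_years) > MAX_STUDIO_YEAR_SPAN:
        reasons.append("year-span")
    return reasons
```

Returning a list of reasons rather than a boolean makes it easy to require two or more signals before suggesting a move, so a single weak signal, such as a long double album tripping the track-count check, does not trigger a suggestion by itself.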

## Long-term Solution

1. **Fetch release group data** from MusicBrainz
2. **Use proper type fields** instead of title heuristics
3. **Cache release group lookups** to minimize API calls
4. **Build a learning system** that improves based on user corrections
5. **Allow user-defined rules** for specific artists/albums
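Two of the long-term items, caching release-group lookups and user-defined rules, can be sketched in Python as follows. `ReleaseGroupCache` takes a hypothetical `fetch` callable standing in for the real MusicBrainz request, so the cache logic can be exercised without network access; `UserRule` and `apply_user_rules` are likewise hypothetical, with album-specific rules taking precedence over artist-wide ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class ReleaseGroupCache:
    """Memoize release-group lookups by MBID so each group is fetched at most once."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # hypothetical: wraps the real MusicBrainz request
        self._cache: Dict[str, dict] = {}
        self.fetch_count = 0

    def get(self, mbid: str) -> dict:
        if mbid not in self._cache:
            self.fetch_count += 1
            self._cache[mbid] = self._fetch(mbid)
        return self._cache[mbid]


@dataclass(frozen=True)
class UserRule:
    artist: str
    album: Optional[str]   # None: rule applies to every album by the artist
    target: Optional[str]  # None: never suggest a move; otherwise force this album


def apply_user_rules(rules: List[UserRule], artist: str, album: str,
                     suggestion: Optional[str]) -> Optional[str]:
    """Album-specific rules beat artist-wide ones; with no match, keep the provider's suggestion."""
    a, b = artist.lower(), album.lower()
    specific = [r for r in rules
                if r.artist.lower() == a and r.album is not None and r.album.lower() == b]
    general = [r for r in rules if r.artist.lower() == a and r.album is None]
    matches = specific + general
    return matches[0].target if matches else suggestion
```

An in-memory dict is the simplest cache; a persistent store would let lookups survive restarts, at the cost of deciding when cached release-group data goes stale.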

## Current Workarounds

The provider is still useful for obvious cases:
- ✅ "Now That's What I Call Music" -> Original albums
- ✅ "Greatest Hits" -> Original albums (when a keyword matches)
- ✅ Soundtracks -> Original albums
- ❌ Edge cases and non-English compilations
- ❌ Box sets and reissues

## Conclusion

The current implementation uses crude pattern matching on album titles and basic date sorting. While it works for obvious compilations, it fails on edge cases and can suggest inappropriate moves (such as from a studio album to a different compilation). The proper solution requires using MusicBrainz's release group types, but that would require additional API calls and significant refactoring.