Parse Copyright Notices

David Kellner edited this page Jan 14, 2022 · 58 revisions

Description

The userscript provides a text parser in the release relationship editor where you can paste credits or load the contents of the release annotation. It tries to extract copyright and legal information from the text input and helps the user create the corresponding relationships.

Relationships will be added at release level by default. Additionally you can create phonographic copyright relationships at recording level by ticking the checkboxes of the desired recordings.

The parser generally assumes that a copyright holder is a label entity. It automatically opens the appropriate auto-complete dialog for all unknown names and asks the user to select or create the correct entity. Confirming the dialog creates the relationship and lets the parser continue with the next credit (cancelling the dialog skips the creation of a relationship for the current credit).

Once the user has selected a match for a given name, the userscript caches the MBID of this match and will not ask the user to match the same name again.
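
The caching behaviour described above can be sketched roughly as follows. This is only an illustration, not the userscript's actual code: `createEntityCache` and `askUser` are hypothetical names, the cache is assumed to live in memory only, and the dialog is modelled as a synchronous callback for brevity (a real dialog would be asynchronous).

```javascript
// Sketch only: `createEntityCache` and `askUser` are hypothetical names.
function createEntityCache() {
  const mbidByName = new Map();

  return {
    // Returns the MBID for `name`, or null if the user cancelled the dialog.
    resolve(name, askUser) {
      if (mbidByName.has(name)) {
        return mbidByName.get(name); // already matched, do not ask again
      }
      const mbid = askUser(name); // stand-in for the auto-complete dialog
      if (mbid) {
        mbidByName.set(name, mbid); // remember only confirmed matches
      }
      return mbid || null;
    },
  };
}
```

With such a cache, the second credit for "Republic Records" on the same release would reuse the stored MBID, while a cancelled dialog stores nothing and the name would be asked for again next time.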

Successfully parsed credit lines will be appended to the edit note. Optionally, they can also be removed from the input so that only the skipped lines remain.
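
One way to picture this bookkeeping is the small sketch below. It is an assumption about the general flow, not the userscript's implementation: `processCreditLines` and `tryParse` are hypothetical names, and `tryParse` stands for whatever step reports that a relationship was created for a line.

```javascript
// Sketch only: `tryParse` is a hypothetical callback that returns true
// when a relationship was created for the given line.
function processCreditLines(input, tryParse, removeParsed = true) {
  const parsed = [];
  const skipped = [];
  for (const line of input.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines
    (tryParse(line) ? parsed : skipped).push(line);
  }
  return {
    // text to append to the edit note
    editNoteAddition: parsed.join("\n"),
    // what stays in the input field afterwards
    remainingInput: removeParsed ? skipped.join("\n") : input,
  };
}
```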

Supported formats

If you want to know the exact details, have a look at the underlying regular expressions. You can find lots of tools which can explain them to you (e.g. https://regex101.com) or you can study this beautiful railroad diagram representation (where I have combined and annotated the expressions).

Collection of unsupported copyright notice formats

Entries for formats that previously caused issues but are now supported have been ticked off in this list and added to the test cases.

The major problem is that the userscript has to reliably detect the end of the copyright holder's name. In the easy cases this is just a comma or a full stop, but we also need special handling for company suffixes which follow a comma and/or contain dots.

Version 2022.1.11 now detects "Inc." and "Ltd." (also without trailing dot), "LLC", "LLP", and " under " (for "X under exclusive license to Y") in addition to comma and full stop. Please let me know if you find more patterns which end the name of a copyright holder.
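
To illustrate the idea, here is a minimal sketch of a pattern which ends a name at the terminators listed above. It is not the userscript's actual regular expression: `HOLDER_NAME` and `extractHolderName` are hypothetical names, and the input is assumed to be the text that follows the year.

```javascript
// Sketch only: this pattern is an illustration, not the userscript's regex.
// The name ends either with a company suffix (kept as part of the name)
// or right before " under ", a comma, a full stop, or the end of the text.
const HOLDER_NAME = /^.+?(?:,?\s(?:(?:Inc|Ltd)\.?|LLC|LLP)(?!\w)|(?=\sunder\s|[,.]|$))/i;

function extractHolderName(text) {
  const match = text.trim().match(HOLDER_NAME);
  return match ? match[0] : null;
}
```

For "SSA Recording, LLP, under exclusive license to Republic Records" this sketch keeps "SSA Recording, LLP" together, while "Maspeth Music BV, under exclusive license to Republic Records" still ends at the comma. Names which contain a dot that is not part of a listed suffix (e.g. "B.V.") would still be cut short.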

Types

  • licensed to / licenced to / under exclusive license to / under exclusive licence to

    • This one did not add the "licensed to" relationship; it only worked for (P) & (C):

      ℗ & © «2016 Maspeth Music BV, under exclusive license to Republic Records, a division of UMG Recordings, Inc. (Eddie O Ent.)»

      • only matched "licensed to"
    • Doesn't add licensed to, but did add (P) & (C):

      © 2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc. ℗ 2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc.

    • Same:

      ℗ «2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc.»

  • distributed by / distributor (unsupported)

    • HD Tracks API: Worked for (P) & (C), but cut off "Inc." and only searched for "The Weeknd XO". "Distributed by" didn't search at all.

      "pLine": "Distributed By Republic Records.; ℗ 2011 The Weeknd XO, Inc.", "cLine": "© 2011 The Weeknd XO, Inc.",

    • HDTracks API distributor line doesn't work at all (not that I expected it to).

      "distributor": "Universal Music Group

      • currently only expected to work for copyright notices, but I think it should be doable to support this now that "distributed by" is actually supported
  • marketed and distributed by (multiple types)

    • Skips marketed by, but it did add distributed by credit with no problem:

      marketed and distributed by Sony Music Entertainment
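
The phrase variants from this list could be recognised with a pattern along these lines. This is a hedged sketch: `TYPE_PHRASE` and `detectCreditTypes` are hypothetical names, and the returned strings are plain illustrative labels rather than MusicBrainz relationship type identifiers.

```javascript
// Sketch only: the pattern and the returned labels are illustrative.
const TYPE_PHRASE = /\b(licen[cs]ed\s+to|under\s+exclusive\s+licen[cs]e\s+to|(?:marketed|distributed)(?:\s+and\s+(?:marketed|distributed))?\s+by)\b/gi;

function detectCreditTypes(text) {
  const types = [];
  for (const match of text.matchAll(TYPE_PHRASE)) {
    const phrase = match[1].toLowerCase();
    if (phrase.endsWith("to")) {
      // all four spelling variants map to the same type
      types.push("licensed to");
    } else {
      // "marketed and distributed by" yields two types for one name
      if (phrase.includes("marketed")) types.push("marketed by");
      if (phrase.includes("distributed")) types.push("distributed by");
    }
  }
  return types;
}
```

Handling the combined phrase in one match means that both types can later be attached to the same name which follows it.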

Company suffixes

  • LLP

    • I just noticed that for "SSA Recording, LLP", only "SSA Recording" goes into the search field. The "LLP" is chopped off.
      • a comma was interpreted as end of the name
  • Inc / Inc.

    • HD Tracks API: Worked for (P) & (C), but cut off "Inc." and only searched for "The Weeknd XO". "Distributed by" didn't search at all.

      "pLine": "Distributed By Republic Records.; ℗ 2011 The Weeknd XO, Inc.", "cLine": "© 2011 The Weeknd XO, Inc.",

Other critical suffixes which could possibly occur (i.e. containing dots or prefixed by a comma):

  • B.V.
  • S.A.

Separators (terminate a copyright holder's name)

  • All of "Magic Quid Limited under exclusive licence to BMG Rights Management (UK) Limited" went into search and credited as.

    ℗ & © «2019 Magic Quid Limited under exclusive licence to BMG Rights Management (UK) Limited»

    • only comma and full stop were interpreted as end of the name

Other formats

  • Didn't work without any dates:

    ℗ & © «Rare»

  • Phonographic copyright doesn't work if preceded by release label. On UMG releases this will be most of them. Did work on the copyright.

    ℗Motown Records; 2021 UMG Recordings, Inc. © 2021 UMG Recordings, Inc.

    • Is there always a semicolon? It would be easy to skip the release label part in that case. Done: Skip based on the presence of the semicolon.
    • Answer: Yeah. Most of the time I think there is a semicolon. I saw some of them working now, so it looks like you fixed it.
  • Doesn't recognize multi label splits as separate releases. Shows as "Shady Records/Aftermath Records/Interscope Records" on search and credited as. Maybe "/" should be treated as a stop, but then there are a few labels where that's part of the name. In this case, it's 3 labels.

    ℗ «2012 Shady Records/Aftermath Records/Interscope Records»

    • I could split names at slashes if that case is more common than label names which actually contain slashes. Do you have any other examples or perceived statistics?
      • Universal Music A/S
    • Decided to split only if the resulting parts have at least two word characters to avoid splitting company suffixes like A/S, other cases should be rare and it's easier to cancel unwanted additional rel dialogs than to add those that were missed.
  • Found a new common combination that doesn't work well right now for obvious reasons. It actually only searches "Warner Music Nashville LLC for the U" because of the period and then never even looks for WEA International Inc.

    ℗ & © «2020 Warner Music Nashville LLC for the U.S. and WEA International Inc. for the world outside the U.S.»

  • Adds "The copyright in this sound recording is owned by Pink Floyd Music Ltd." to search and skips marketed by, but it did add distributed by credit with no problem:

    ℗ «2016 The copyright in this sound recording is owned by Pink Floyd Music Ltd., marketed and distributed by Sony Music Entertainment»

  • Copyright holder is prefixed by "The copyright in this compilation is owned by"

    ℗ «2016 The copyright in this compilation is owned by Pink Floyd Music Ltd., marketed and distributed by Sony Music Entertainment»

  • Misses "Pink Floyd (1987) Ltd." on both. Maybe the space before & after "/" messes it up.

    © 2016 Pink Floyd Music Ltd. / Pink Floyd (1987) Ltd. ℗ 2016 Pink Floyd Music Ltd. / Pink Floyd (1987) Ltd., marketed and distributed by Parlophone Records Ltd., a Warner Music Group Company

    • The dot of Ltd. is interpreted as name terminator so it does not even look for the following slash.
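
Two of the decisions from the list above (skipping a leading release label when a semicolon is present, and splitting names at slashes only if every part has at least two word characters) can be sketched as follows. Both helpers are hypothetical readings of those rules, not the userscript's code; in particular, "at least two word characters" is interpreted here as two consecutive word characters.

```javascript
// Sketch only: both helpers are hypothetical readings of the rules above.

// Drops a leading release label such as "Motown Records;" if a semicolon
// is present, otherwise returns the text unchanged.
function skipReleaseLabel(text) {
  return text.replace(/^[^;]*;\s*/, "");
}

// Splits at slashes, but only if every resulting part contains at least
// two consecutive word characters, so that "A/S" stays intact.
function splitHolderNames(names) {
  const parts = names.split(/\s*\/\s*/);
  return parts.every((part) => /\w{2}/.test(part)) ? parts : [names];
}
```

With this reading, "Shady Records/Aftermath Records/Interscope Records" yields three names, whereas "Universal Music A/S" is kept as one. A name which mixes both cases would not be split at all in this sketch, which matches the stated preference for occasionally cancelling an unwanted dialog over missing a credit only in part.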