
Commit 9995fb4

committed
waspaa + acl
1 parent 2f671f6 commit 9995fb4

File tree

5 files changed: +47 -1 lines changed


_posts/2025-06-27-acl-mfrohmann

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
layout: post
title: "Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion"
date: 2025-06-27 18:00:00 +0200
category: Publication
author: eepure
readtime: 1
domains:
- MIR
people:
- eepure
- gmeseguerbrocal
- rhennequin
publication_type: conference
publication_title: "Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion"
publication_year: 2025
publication_authors: Markus Frohmann, Elena V Epure, Gabriel Meseguer-Brocal, Markus Schedl, Romain Hennequin
publication_conference: ACL
publication_code: "https://github.com/deezer/robust-AI-lyrics-detection"
publication_preprint: "https://arxiv.org/abs/2506.15981"
---

The rapid advancement of AI-based music generation tools is revolutionizing the music industry but also posing challenges to artists, copyright holders, and providers alike. This necessitates reliable methods for detecting such AI-generated content. However, existing detectors, relying on either audio or lyrics, face key practical limitations: audio-based detectors fail to generalize to new or unseen generators and are vulnerable to audio perturbations; lyrics-based methods require cleanly formatted and accurate lyrics, unavailable in practice. To overcome these limitations, we propose a novel, practically grounded approach: a multimodal, modular late-fusion pipeline that combines automatically transcribed sung lyrics and speech features capturing lyrics-related information within the audio. By relying on lyrical aspects directly from audio, our method enhances robustness, mitigates susceptibility to low-level artifacts, and enables practical applicability. Experiments show that our method, DE-detect, outperforms existing lyrics-based detectors while also being more robust to audio perturbations. Thus, it offers an effective, robust solution for detecting AI-generated music in real-world scenarios.
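To make the late-fusion idea concrete, here is a minimal sketch of such a detector, assuming precomputed lyrics and speech embeddings. The module names, dimensions, and the concatenation-plus-linear head are illustrative assumptions, not the released DE-detect implementation (see the linked repository for that).

```python
# Hypothetical sketch of a modular late-fusion detector: a text branch over
# ASR-transcribed lyrics and a speech-feature branch over the audio, fused by
# concatenation into a binary (human vs. AI-generated) classifier.
# Names and dimensions are illustrative, not the released DE-detect code.
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    def __init__(self, lyrics_dim=1024, speech_dim=768, hidden_dim=256):
        super().__init__()
        # Each branch projects its precomputed embedding into a shared space.
        self.lyrics_proj = nn.Sequential(nn.Linear(lyrics_dim, hidden_dim), nn.ReLU())
        self.speech_proj = nn.Sequential(nn.Linear(speech_dim, hidden_dim), nn.ReLU())
        # Late fusion: concatenate the branch outputs, then classify.
        self.classifier = nn.Linear(2 * hidden_dim, 2)

    def forward(self, lyrics_emb, speech_emb):
        fused = torch.cat([self.lyrics_proj(lyrics_emb),
                           self.speech_proj(speech_emb)], dim=-1)
        return self.classifier(fused)  # logits: [human, AI-generated]

# Usage with dummy embeddings for a batch of 4 songs.
detector = LateFusionDetector()
lyrics_emb = torch.randn(4, 1024)  # e.g. text-encoder embedding of the ASR transcript
speech_emb = torch.randn(4, 768)   # e.g. speech-model embedding of the sung vocals
logits = detector(lyrics_emb, speech_emb)
print(logits.shape)  # torch.Size([4, 2])
```

Because fusion happens only at the embedding level, either branch can be swapped or retrained independently, which is what makes the pipeline modular.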
Lines changed: 1 addition & 1 deletion
@@ -20,4 +20,4 @@ publication_code: "https://github.com/deezer/robust-AI-lyrics-detection"
publication_preprint: "https://arxiv.org/abs/2506.18488"
---

-The recent rise in capabilities of AI-based music generation tools has created an upheaval in the music industry, necessitating the creation of accurate methods to detect such AI-generated content. This can be done using audio-based detectors; however, it has been shown that they struggle to generalize to unseen generators or when the audio is perturbed. Furthermore, recent work used accurate and cleanly formatted lyrics sourced from a lyrics provider database to detect AI-generated music. However, in practice, such perfect lyrics are not available (only the audio is); this leaves a substantial gap in applicability in real-life use cases. In this work, we instead propose solving this gap by transcribing songs using general automatic speech recognition (ASR) models. We do this using several detectors. The results on diverse, multi-genre, and multi-lingual lyrics show generally strong detection performance across languages and genres, particularly for our best-performing model using Whisper large-v2 and LLM2Vec embeddings. In addition, we show that our method is more robust than state-of-the-art audio-based ones when the audio is perturbed in different ways and when evaluated on different music generators. Our code is available at https://github.com/deezer/robust-AI-lyrics-detection.
+The recent rise in capabilities of AI-based music generation tools has created an upheaval in the music industry, necessitating the creation of accurate methods to detect such AI-generated content. This can be done using audio-based detectors; however, it has been shown that they struggle to generalize to unseen generators or when the audio is perturbed. Furthermore, recent work used accurate and cleanly formatted lyrics sourced from a lyrics provider database to detect AI-generated music. However, in practice, such perfect lyrics are not available (only the audio is); this leaves a substantial gap in applicability in real-life use cases. In this work, we instead propose solving this gap by transcribing songs using general automatic speech recognition (ASR) models. We do this using several detectors. The results on diverse, multi-genre, and multi-lingual lyrics show generally strong detection performance across languages and genres, particularly for our best-performing model using Whisper large-v2 and LLM2Vec embeddings. In addition, we show that our method is more robust than state-of-the-art audio-based ones when the audio is perturbed in different ways and when evaluated on different music generators.
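As a rough illustration of the transcription-then-classify pipeline described in this abstract, the sketch below transcribes songs with Whisper large-v2 and trains a linear classifier on transcript embeddings. The paper pairs Whisper with LLM2Vec embeddings; the sentence-transformers encoder, model names, and helper functions here are stand-ins introduced only for this example.

```python
# Hypothetical sketch: transcribe sung lyrics with Whisper large-v2, embed the
# transcript, and fit a simple classifier on top. A sentence-transformers
# encoder is used as a lightweight stand-in for the LLM2Vec embeddings
# mentioned in the abstract.
import whisper                                          # pip install openai-whisper
from sentence_transformers import SentenceTransformer   # pip install sentence-transformers
from sklearn.linear_model import LogisticRegression

asr = whisper.load_model("large-v2")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def transcribe(audio_path):
    """Return the ASR transcript of the (possibly noisy) sung lyrics."""
    return asr.transcribe(audio_path)["text"]

def fit_detector(train_paths, train_labels):
    """Fit a logistic-regression detector; labels: 1 = AI-generated, 0 = human."""
    texts = [transcribe(p) for p in train_paths]
    return LogisticRegression(max_iter=1000).fit(encoder.encode(texts), train_labels)

def detect(clf, audio_path):
    """Probability that a song's lyrics were AI-generated (label 1)."""
    emb = encoder.encode([transcribe(audio_path)])
    return float(clf.predict_proba(emb)[0, 1])
```

Working from ASR output rather than database lyrics is what closes the practical gap the abstract points to: only the audio is needed at inference time.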

_posts/2025-10-12-waspaa-ykong

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
layout: post
title: "Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval"
date: 2025-10-12 18:00:00 +0200
category: Publication
author: ykong
readtime: 1
domains:
- MIR
people:
- ykong
- rhennequin
- gmeseguerbrocal
publication_type: conference
publication_title: "Multi-Class-Token Transformer for Multitask Self-supervised Music Information Retrieval"
publication_year: 2025
publication_authors: Yuexuan Kong, Vincent Lostanlen, Romain Hennequin, Mathieu Lagrange, Gabriel Meseguer-Brocal
publication_conference: WASPAA
publication_code: "https://github.com/deezer/mt2"
publication_preprint: "https://arxiv.org/abs/2507.12996"
---

Contrastive learning and equivariant learning are effective methods for self-supervised learning (SSL) for audio content analysis. Yet, their application to music information retrieval (MIR) faces a dilemma: the former is more effective on tagging (e.g., instrument recognition) but less effective on structured prediction (e.g., tonality estimation); the latter can match supervised methods on the specific task it is designed for, but it does not generalize well to other tasks. In this article, we adopt a best-of-both-worlds approach by training a deep neural network on both kinds of pretext tasks at once. The proposed architecture is a Vision Transformer with 1-D spectrogram patches (ViT-1D), equipped with two class tokens, which are specialized to different self-supervised pretext tasks but optimized through the same model: hence the qualification of self-supervised multi-class-token multitask (MT2). The former class token optimizes cross-power spectral density (CPSD) for equivariant learning over the circle of fifths, while the latter optimizes normalized temperature-scaled cross-entropy (NT-Xent) for contrastive learning. MT2 combines the strengths of both pretext tasks and consistently outperforms both single-class-token ViT-1D models trained with either contrastive or equivariant learning. Averaging the two class tokens further improves performance on several tasks, highlighting the complementary nature of the representations learned by each class token. Furthermore, using the same single-linear-layer probing method on the features of the last layer, MT2 outperforms MERT on all tasks except beat tracking, achieving this with 18x fewer parameters thanks to its multitasking capabilities. Our SSL benchmark demonstrates the versatility of our multi-class-token multitask learning approach for MIR applications.
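A minimal sketch of the multi-class-token idea follows, assuming log-mel spectrogram input; the patching scheme, dimensions, depth, and the simplified NT-Xent below are illustrative assumptions rather than the released MT2 code (the CPSD head is omitted). Two learnable class tokens are prepended to the 1-D patch tokens and encoded by one shared transformer, with each token intended for its own pretext loss.

```python
# Hypothetical sketch of a ViT-1D backbone with two class tokens, in the spirit
# of MT2: one token for an equivariant (CPSD) objective, one for a contrastive
# (NT-Xent) objective, both optimized through the same shared encoder.
import torch
import torch.nn as nn

class MultiClassTokenViT1D(nn.Module):
    def __init__(self, n_bins=128, n_frames=96, dim=192, depth=6, heads=6):
        super().__init__()
        # 1-D patches: each spectrogram frame (all frequency bins) is one token.
        self.patch_embed = nn.Linear(n_bins, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, n_frames + 2, dim))
        # Two learnable class tokens, one per pretext task.
        self.cls_equivariant = nn.Parameter(torch.zeros(1, 1, dim))
        self.cls_contrastive = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, spec):                    # spec: (batch, n_frames, n_bins)
        b = spec.size(0)
        tokens = self.patch_embed(spec)         # (batch, n_frames, dim)
        cls_eq = self.cls_equivariant.expand(b, -1, -1)
        cls_ct = self.cls_contrastive.expand(b, -1, -1)
        x = torch.cat([cls_eq, cls_ct, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        # Return both class-token embeddings; each feeds its own SSL loss head.
        return x[:, 0], x[:, 1]

def nt_xent(z1, z2, tau=0.1):
    """Simplified NT-Xent between two views of the same batch of songs."""
    z1 = nn.functional.normalize(z1, dim=-1)
    z2 = nn.functional.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                  # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))          # positives lie on the diagonal
    return nn.functional.cross_entropy(logits, targets)

# Usage with a dummy batch of spectrograms.
model = MultiClassTokenViT1D()
spec = torch.randn(8, 96, 128)
tok_eq, tok_ct = model(spec)
print(tok_eq.shape, tok_ct.shape)               # torch.Size([8, 192]) twice
```

Sharing the encoder while separating the class tokens is what lets the two pretext tasks coexist: gradients from both losses shape the same backbone, while each token keeps a task-specialized readout.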
