
Commit 92cb98a

Merge pull request #56 from gasmichel/master
Update publications/picture
2 parents 8a3960c + 486ca76

File tree

5 files changed: +93 -0 lines changed

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+layout: post
+title: "Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution"
+date: 2024-03-16 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+- NLP
+people:
+- gmichel
+- eepure
+- rhennequin
+publication_type: conference
+publication_title: "Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution"
+publication_year: 2024
+publication_authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
+publication_conference: LaTeCH-CLfL (EACL)
+publication_code: "https://github.com/deezer/quote_AV"
+publication_preprint: "https://aclanthology.org/2024.latechclfl-1.15.pdf"
+---
+
+Recent approaches to automatically detect the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models in a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes. However, these results vary across novels, and stylometric models tailored to literary texts and the study of characters warrant further investigation.

_posts/2024-11-12-EMNLP-gmichel.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+layout: post
+title: "Improving Quotation Attribution with Fictional Character Embeddings"
+date: 2024-11-12 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+- NLP
+people:
+- gmichel
+- eepure
+- rhennequin
+publication_type: conference
+publication_title: "Improving Quotation Attribution with Fictional Character Embeddings"
+publication_year: 2024
+publication_authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
+publication_conference: EMNLP
+publication_code: "https://github.com/deezer/character_embeddings_qa"
+publication_preprint: "https://aclanthology.org/2024.findings-emnlp.744.pdf"
+---
+
+Humans naturally attribute utterances of direct speech to their speaker in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors on more challenging examples of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global stylistic information of characters derived from an off-the-shelf stylometric model, Universal Authorship Representation (UAR). We create DramaCV, a corpus of English drama plays from the 15th to the 20th century that we automatically annotate for Authorship Verification of fictional characters' utterances, and release two versions of UAR trained on DramaCV that are tailored to literary character analysis. Then, through an extensive evaluation on 28 novels, we show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+---
+layout: post
+title: "Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation"
+date: 2024-11-15 10:00:00 +0200
+category: Publication
+author: eepure
+readtime: 1
+domains:
+- NLP
+people:
+- eepure
+- gmeseguerbrocal
+- dafchar
+- rhennequin
+publication_type: conference
+publication_title: "Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation"
+publication_year: 2024
+publication_authors: Elena V. Epure, Gabriel Meseguer Brocal, Darius Afchar, Romain Hennequin
+publication_conference: NLP4MusA (ISMIR)
+publication_code: "https://github.com/deezer/nlp4musa_melscribe"
+publication_preprint: "https://aclanthology.org/2024.nlp4musa-1.4.pdf"
+---
+
+Recommender systems relying on Language Models (LMs) have gained popularity for helping users navigate large catalogs. LMs often exploit item high-level descriptors, i.e. categories or consumption contexts, from training data or user preferences. This has proven effective in domains like movies or products. In music, though, understanding of how effectively LMs utilize song descriptors for natural language-based music recommendation is relatively limited. In this paper, we assess LMs' effectiveness in recommending songs based on user natural language requests and items with descriptors like genres, moods, and listening contexts. We formulate recommendation as a dense retrieval problem and assess LMs as they become increasingly familiar with data pertinent to the task and domain. Our findings reveal improved performance as LMs are fine-tuned for general language similarity, information retrieval, and mapping longer descriptions to shorter, high-level descriptors in music.

_posts/2025-04-29-NAACL-gmichel.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+---
+layout: post
+title: "Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3"
+date: 2025-04-29 10:00:00 +0200
+category: Publication
+author: gmichel
+readtime: 1
+domains:
+- NLP
+people:
+- gmichel
+- eepure
+- rhennequin
+publication_type: conference
+publication_title: "Evaluating LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3"
+publication_year: 2025
+publication_authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
+publication_conference: NAACL
+publication_code: "https://github.com/deezer/llms_quotation_attribution"
+publication_preprint: "https://arxiv.org/pdf/2406.11380"
+---
+
+Large Language Models (LLMs) have shown promising results on a variety of literary tasks, often drawing on complex memorized details of narration and fictional characters. In this work, we evaluate the ability of Llama-3 to attribute utterances of direct speech to their speaker in novels. The LLM shows impressive results on a corpus of 28 novels, surpassing published results with ChatGPT and encoder-based baselines by a large margin. We then validate these results by assessing the impact of book memorization and annotation contamination. We find that these types of memorization do not explain the large performance gain, making Llama-3 the new state of the art for quotation attribution in English literature.

static/images/photos/gaspard.jpg

-16.7 KB

0 commit comments
