Commit a6e978e

committed
Added details for training data
1 parent 4502767 commit a6e978e

1 file changed

+60
-3
lines changed

_pages/mm-argfallacy2025.md

Lines changed: 60 additions & 3 deletions
@@ -49,25 +49,82 @@ For each sub-task, participants can leverage the debate context of a given input
4949
# Data
5050

5151

52-
We use **MM-USED-fallacy** and release a version of the dataset specifically designed for argumentative fallacy detection. This dataset includes 1,891 sentences from [Haddadan et al.'s (2019)](https://aclanthology.org/P19-1463.pdf) dataset on US presidential elections. Each sentence is labeled with one of six argumentative fallacy categories, as introduced by [Goffredo et al. (2022)](https://www.ijcai.org/proceedings/2022/575).
52+
We use **MM-USED-fallacy** and release a version of the dataset specifically designed for argumentative fallacy detection. This dataset includes 1,278 sentences from [Haddadan et al.'s (2019)](https://aclanthology.org/P19-1463.pdf) dataset on US presidential elections. Each sentence is labeled with one of six argumentative fallacy categories, as introduced by [Goffredo et al. (2022)](https://www.ijcai.org/proceedings/2022/575).
5353

5454
Inspired by observations from [Goffredo et al. (2022)](https://www.ijcai.org/proceedings/2022/575) on the benefits of leveraging multiple argument mining tasks for fallacy detection and classification, we also provide additional datasets to encourage multi-task learning. A summary is provided in the table below:
5555

5656
---
5757

5858
| **Dataset** | **Description** | **Size** |
5959
|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
60+
| **MM-USED-fallacy** | A multimodal extension of the USElecDeb60to20 dataset, covering US presidential debates (1960–2020). Includes labels for argumentative fallacy detection and argumentative fallacy classification. | 1,278 samples (updated version) |
61+
| **MM-USED** | A multimodal extension of the USElecDeb60to16 dataset, covering US presidential debates (1960–2016). Includes labels for argumentative sentence detection and component classification. | 23,505 sentences (updated version)|
6062
| **UKDebates** | 386 sentences and audio samples from the 2015 UK Prime Ministerial elections. Sentences are labeled for argumentative sentence detection: containing or not containing a claim. | 386 sentences |
6163
| **M-Arg** | A multimodal dataset for argumentative relation classification from the 2020 US Presidential elections. Sentences are labeled as attacking, supporting, or unrelated to another sentence. | 4,104 pairs |
62-
| **MM-USED** | A multimodal extension of the USElecDeb60to16 dataset, covering US presidential debates (1960–2016). Includes labels for argumentative sentence detection and component classification. | 26,781 sentences |
64+
6365

6466
---
6567

6668
All datasets will be available through [MAMKit](https://nlp-unibo.github.io/mamkit/).
6769

6870
Since many multimodal datasets cannot release audio samples due to copyright restrictions, MAMKit provides an interface to dynamically build datasets and promote reproducible research.
6971

70-
Datasets are formatted as `torch.Dataset` objects, containing input values (text, audio, or both) and corresponding task-specific labels. More details about data formats and dataset building are available in MAMKit's documentation.
72+
Datasets are formatted as `torch.utils.data.Dataset` objects, containing input values (text, audio, or both) and corresponding task-specific labels. More details about data formats and dataset building are available in MAMKit's documentation.

## Retrieving the Data through MAMKit
73+
74+
To retrieve the datasets through MAMKit, you can use the following code interface:
75+
76+
```python
from pathlib import Path

from mamkit.data.datasets import InputMode, MArg, MMUSED, MMUSEDFallacy, UKDebates


def loading_data_example():
    # Resolve the data folder relative to this file's location
    base_data_path = Path(__file__).parent.parent.resolve().joinpath('data')

    # MM-USED-fallacy dataset
    mm_used_fallacy_loader = MMUSEDFallacy(
        task_name='afc',  # Choose between 'afc' or 'afd'
        input_mode=InputMode.TEXT_AUDIO,  # Choose between TEXT_ONLY, AUDIO_ONLY, or TEXT_AUDIO
        base_data_path=base_data_path
    )

    # MM-USED dataset
    mm_used_loader = MMUSED(
        task_name='asd',  # Choose between 'asd' or 'acc'
        input_mode=InputMode.TEXT_AUDIO,  # Choose between TEXT_ONLY, AUDIO_ONLY, or TEXT_AUDIO
        base_data_path=base_data_path
    )

    # UKDebates dataset
    uk_debates_loader = UKDebates(
        task_name='asd',
        input_mode=InputMode.TEXT_AUDIO,  # Choose between TEXT_ONLY, AUDIO_ONLY, or TEXT_AUDIO
        base_data_path=base_data_path
    )

    # M-Arg dataset
    m_arg_loader = MArg(
        task_name='arc',
        input_mode=InputMode.TEXT_AUDIO,  # Choose between TEXT_ONLY, AUDIO_ONLY, or TEXT_AUDIO
        base_data_path=base_data_path
    )
```
112+
113+
Each loader is initialized with a task name, an input mode (`InputMode.TEXT_ONLY`, `InputMode.AUDIO_ONLY`, or `InputMode.TEXT_AUDIO`), and the base data path. The task names are `afd` (argumentative fallacy detection), `afc` (argumentative fallacy classification), `asd` (argumentative sentence detection), `acc` (argumentative component classification), and `arc` (argumentative relation classification).
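
As a quick reference, the dataset and task pairings described on this page can be written down as a small lookup table. The sketch below is illustrative only: `SUPPORTED_TASKS` and `describe_task` are hypothetical names and are not part of MAMKit; the pairings themselves are taken from the table and loader examples above.

```python
# Hypothetical helper (NOT part of MAMKit): task names per dataset loader,
# as listed on this page.
SUPPORTED_TASKS = {
    "MMUSEDFallacy": {
        "afd": "argumentative fallacy detection",
        "afc": "argumentative fallacy classification",
    },
    "MMUSED": {
        "asd": "argumentative sentence detection",
        "acc": "argumentative component classification",
    },
    "UKDebates": {
        "asd": "argumentative sentence detection",
    },
    "MArg": {
        "arc": "argumentative relation classification",
    },
}


def describe_task(loader_name: str, task_name: str) -> str:
    """Return the full task description, or raise if the pairing is not listed."""
    tasks = SUPPORTED_TASKS.get(loader_name)
    if tasks is None:
        raise KeyError(f"Unknown loader: {loader_name}")
    if task_name not in tasks:
        raise ValueError(
            f"Task '{task_name}' is not listed for {loader_name}; "
            f"expected one of {sorted(tasks)}"
        )
    return tasks[task_name]


print(describe_task("MArg", "arc"))
```

Checking the pairing before constructing a loader gives a readable error early, for example when `afc` is passed to a dataset that only lists `asd`.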
114+
115+
Ensure that you have MAMKit installed and properly configured in your environment to use these loaders.
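
One lightweight way to confirm the package is importable before running the loaders is to query Python's import machinery. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the module name `mamkit` is taken from the import statement in the example above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("mamkit"):
    print("MAMKit does not appear to be importable in this environment.")
```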
116+
117+
For more details, refer to the MAMKit [GitHub repository](https://github.com/nlp-unibo/mamkit) and [website](https://nlp-unibo.github.io/mamkit/).
118+
119+
120+
### References
121+
122+
- **MM-USED-fallacy**: [Mancini et al. (2024)](https://aclanthology.org/2024.eacl-short.16.pdf). The version provided through MAMKit includes updated samples, with refinements in the alignment process. This results in a different number of samples compared to the original dataset.
123+
- **MM-USED**: [Mancini et al. (2022)](https://aclanthology.org/2022.argmining-1.15.pdf). The version provided through MAMKit includes updated samples, with refinements in the alignment process. This results in a different number of samples compared to the original dataset.
124+
- **UKDebates**: [Lippi and Torroni (2016)](https://ojs.aaai.org/index.php/AAAI/article/view/10384).
125+
- **M-Arg**: [Mestre et al. (2021)](https://aclanthology.org/2021.argmining-1.8.pdf).
126+
127+
**Note**: By "updated version," we mean that the datasets have undergone a refinement in the alignment process, which has resulted in adjustments to the number of samples included compared to the original versions published in the referenced papers.
71128

72129
# Evaluation
73130
For argumentative fallacy detection, we will compute the binary F1-score on predicted sentence-level labels.
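
To make the metric concrete, binary F1 can be computed from true positives, false positives, and false negatives as F1 = 2·TP / (2·TP + FP + FN). The sketch below is not the official scorer: it assumes the fallacy class is encoded as `1`, and it returns `0.0` when there are no positive labels or predictions, which is a convention chosen here rather than one stated on this page.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (assumed here to be label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # Assumed convention: score 0.0 when the positive class never occurs
    return 2 * tp / denominator if denominator else 0.0


# Toy sentence-level labels: 1 = fallacy, 0 = no fallacy
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
score = binary_f1(gold, pred)  # TP=2, FP=1, FN=1
print(f"{score:.4f}")
```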
