Commit 4c9feb8

Add data: AMI DialSum Corpus
1 parent af79b5f commit 4c9feb8

7,832 files changed, +70424 -7 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 .idea/*
 *.pyc
 .DS_Store
-data/ami_*/*
+data/ami-*/*
 data/*/.DS_Store
 venv*/*
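
One observable effect of this pattern change: the old pattern `data/ami_*/*` matches the directory `data/ami_dialsum_corpus/test` that this commit adds, while the new pattern `data/ami-*/*` does not, yet still covers the generated output directory `data/ami-summary/extractive/` named in the README. The sketch below checks this with a rough, segment-by-segment approximation of anchored gitignore matching. The helper `ignored_by` and the file name `x.story` are hypothetical, and the helper is not git's actual matcher (it ignores negation, `**`, and trailing-slash rules).

```python
from fnmatch import fnmatchcase


def ignored_by(pattern: str, path: str) -> bool:
    """Approximate an anchored gitignore pattern.

    Pattern and path are compared one path segment at a time, so `*` never
    crosses a `/`. A match on a leading directory counts as ignoring
    everything beneath that directory.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(path_parts) < len(pattern_parts):
        return False
    return all(
        fnmatchcase(segment, part)
        for part, segment in zip(pattern_parts, path_parts)
    )


new_file = "data/ami_dialsum_corpus/test/in"
print(ignored_by("data/ami_*/*", new_file))  # old pattern
print(ignored_by("data/ami-*/*", new_file))  # new pattern
print(ignored_by("data/ami-*/*", "data/ami-summary/extractive/x.story"))
```

For an authoritative answer in a real checkout, `git check-ignore -v <path>` reports which pattern, if any, ignores a given path.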

README.md

Lines changed: 7 additions & 6 deletions
@@ -5,7 +5,7 @@
 * Transforms into CNN-DailyMail News dataset (`.story` files with article and highlight in it)
 
 ### Contents
-[Requirements](#requirements)[About AMI Meeting Corpus](#ami-corpus)[How to Use](#how-to-use)[How to Cite](#acknowledgement)
+[Requirements](#requirements)[About AMI Meeting Corpus](#ami-corpus)[AMI DialSum Corpus](#ami-dialsum-meeting-corpus)[How to Use](#how-to-use)[How to Cite](#acknowledgement)
 
 ## Requirements
 Tested on Python 3.6+, Ubuntu 16.04, Mac OS
@@ -86,16 +86,17 @@ python main_obtain_meeting2summary_data.py --summary_type abstractive
 * Return all the collected words as a paragraph
 * Output: `data/ami-summary/extractive/`
 
+## AMI DialSum Meeting Corpus
+* [DialSum](https://github.com/MiuLab/DialSum): modified version of the AMI Meeting Dataset
+* Use script `ami_dialsum_meeting_story.py`:
+* This script takes 2 text files (`in` and `sum`) and formats it into a series of `.story` files compatible with the CNN/DM format
+* Each line in file `in` corresponds to a meeting transcript with summary present in the same line in file `sum`/.
+
 ## Notes
 * XML reader in Python:
 * Minidom vs Element Tree: [Reading XML files in Python](http://stackabuse.com/reading-and-writing-xml-files-in-python/)
 * Minidom: XML parser for Python
 
-* Script `ami_dialsum_meeting_story.py`:
-* This script takes 2 text files (`in` and `sum`) and formats it into a series of `.story` files compatible with the CNN/DM format
-* Each line in file `in` corresponds to a meeting transcript with summary present in the same line in file `sum`/.
-* Implemented to deal with a modified version of the AMI Meeting Dataset called [DialSum](https://github.com/MiuLab/DialSum).
-
 * TODO
 * Overlapping meeting transcript
 * Decision abstract
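
The README text above describes the conversion only in words: line *i* of `in` is a meeting transcript, and line *i* of `sum` is its summary. A minimal sketch of that pairing follows. It is not the contents of `ami_dialsum_meeting_story.py`, which this commit page does not show. The function names, the `meeting_NNNN.story` file naming, and the choice to emit each summary as a single `@highlight` block are assumptions made for illustration.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def format_story(transcript: str, summary: str) -> str:
    """Return one CNN/DM-style story: article text, then an @highlight block."""
    return f"{transcript.strip()}\n\n@highlight\n\n{summary.strip()}\n"


def write_stories(in_path, sum_path, out_dir, prefix="meeting"):
    """Pair line i of `in_path` with line i of `sum_path`, one .story file per pair."""
    transcripts = read_lines(in_path)
    summaries = read_lines(sum_path)
    if len(transcripts) != len(summaries):
        raise ValueError(
            f"line count mismatch: {len(transcripts)} transcripts, "
            f"{len(summaries)} summaries"
        )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (transcript, summary) in enumerate(zip(transcripts, summaries)):
        story_path = out / f"{prefix}_{i:04d}.story"  # hypothetical naming scheme
        story_path.write_text(format_story(transcript, summary), encoding="utf-8")
        written.append(story_path)
    return written
```

Checking the line counts up front is a deliberate choice in this sketch: because the two files are aligned purely by line number, a silent `zip` truncation would pair nothing incorrectly but would drop meetings without warning.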

data/ami_dialsum_corpus/test/in

Lines changed: 400 additions & 0 deletions
Large diffs are not rendered by default.