Commit 666202a

add week 4 labs and data - undownloable data.json file
1 parent 7a15f6d commit 666202a

25 files changed: 1145351 additions, 0 deletions

4 - Natural Language Processing with Attention Models/Labs/Week 4/C4_W4_Assignment.ipynb

Lines changed: 2066 additions & 0 deletions
Large diffs are not rendered by default.

4 - Natural Language Processing with Attention Models/Labs/Week 4/C4_W4_Assignment_original.ipynb

Lines changed: 1888 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
#####################################################
#####################################################
# Copyright Cambridge Dialogue Systems Group, 2018 #
#####################################################
#####################################################

Dataset contains the following files:
1. data.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn. This file contains both system and user dialogue acts annotated at the turn level. Files with multi-domain dialogues have "MUL" in their names. Single-domain dialogues have either "SNG" or "WOZ" in their names.
2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
7. police_db.json: the Cambridge police station information.
8. taxi_db.json: the slot-value list for the taxi domain.
9. valListFile.txt: list of dialogues for validation.
10. testListFile.txt: list of dialogues for testing.
11. system_acts.json:
There are 6 domains ('Booking', 'Restaurant', 'Hotel', 'Attraction', 'Taxi', 'Train') and 1 dummy domain ('general').
A domain-dependent dialogue act is defined as a domain token followed by a domain-independent dialogue act, e.g. 'Hotel-inform' means it is an 'inform' act in the Hotel domain.
Dialogue acts which cannot take slots, e.g., 'good bye', are defined under the 'general' domain.
A slot-value pair is defined as a list with two elements. The first element is the slot token and the second one is its value.
If a dialogue act takes no slots, e.g., the dialogue act 'offer booking' for the utterance 'would you like to take a reservation?', its slot-value pair is ['none', 'none'].
There are four types of values:
1) If a slot takes a binary value, e.g., 'has Internet' or 'has park', the value is either 'yes' or 'no'.
2) If a slot is under the act 'request', e.g., 'request' about 'area', the value is expressed as '?'.
3) The value is taken from the utterance, e.g., the name of a restaurant.
4) If for some reason the turn does not have an annotation, then it is labeled as "No Annotation."
12. ontology.json: Data-based ontology containing all the values for the different slots in the domains.
13. slot_descriptions.json: A collection of human-written slot descriptions for each slot in the dataset. Each slot has at least two descriptions.
14. tokenization.md: A description of the tokenization preprocessing we had to perform to maintain consistency between the dialogue act annotations of DSTC 8 Track 1 and the existing MultiWOZ 2.0 data.
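
The conventions described for data.json and system_acts.json can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset authors' tooling: the helper names (dialogue_kind, split_act, value_kind, describe_turn), the sample turn, and the sample file names are all hypothetical, and the sketch assumes that one turn's annotation is either a string (the "No Annotation" case) or a mapping from a dialogue act such as 'Hotel-inform' to a list of [slot, value] pairs. Detecting binary values by looking only at 'yes'/'no' is a heuristic.

```python
from typing import List, Tuple, Union

# Assumed shape of one turn's annotation: act name -> list of [slot, value] pairs,
# or a plain string when the turn has no annotation.
TurnAnnotation = Union[str, dict]


def dialogue_kind(name: str) -> str:
    """Classify a dialogue by its name, following the MUL / SNG / WOZ convention."""
    if "MUL" in name:
        return "multi-domain"
    if "SNG" in name or "WOZ" in name:
        return "single-domain"
    return "unknown"


def split_act(act: str) -> Tuple[str, str]:
    """Split a domain-dependent act such as 'Hotel-inform' into (domain, act type)."""
    domain, _, act_type = act.partition("-")
    return domain, act_type


def value_kind(slot_value: List[str]) -> str:
    """Label a [slot, value] pair with one of the value types from the README."""
    slot, value = slot_value
    if slot == "none" and value == "none":
        return "no-slot"      # the act takes no slots
    if value == "?":
        return "request"      # the slot is being requested
    if value in ("yes", "no"):
        return "binary"       # heuristic: decided from the value alone
    return "utterance"        # the value appears in the utterance


def describe_turn(annotation: TurnAnnotation) -> List[Tuple[str, str, str, str]]:
    """Flatten one turn into (domain, act type, slot, value kind) tuples."""
    if isinstance(annotation, str):
        return []             # unannotated turn, e.g. "No Annotation"
    rows = []
    for act, pairs in annotation.items():
        domain, act_type = split_act(act)
        for pair in pairs:
            rows.append((domain, act_type, pair[0], value_kind(pair)))
    return rows


if __name__ == "__main__":
    # Made-up turn for illustration only; not copied from the dataset.
    sample_turn = {
        "Hotel-inform": [["internet", "yes"], ["name", "example hotel"]],
        "Hotel-request": [["area", "?"]],
        "general-bye": [["none", "none"]],
    }
    for row in describe_turn(sample_turn):
        print(row)
    print(dialogue_kind("MUL0001.json"))
```

The string check in describe_turn is deliberately loose, so it does not depend on the exact spelling or punctuation of the "No Annotation" label.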
