
Format problem in the Google Drive Training Data #1

@PosoSAgapo

The paper's work on temporal commonsense is great; however, I have some questions about the formatted training data that you provide on Google Drive. Here is an example:

[CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [MASK] [MASK] liquid eye ##liner and leopard - print under [MASK] ##ments , her [MASK] [MASK] steel ##y [MASK] [MASK] thin [MASK] like [MASK] [MASK] [MASK] result of [MASK] 20 ##- [MASK] ##und [MASK] [MASK] [MASK] she [unused500] curls [MASK] [SEP] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [unused500] [unused502] [unused45] [SEP] -1 43 44 45 46 47 48 49 50 51 120 121 122 7.670003601399247e-05 0.010977975780067789 0.17749198395637333 0.3423587734906385 0.26762063340149095 0.1613272650199883 0.03558053856215351 0.004304815288057253 0.00026131446521643803 0.0 0.0 0.0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 4218 2014 6381 -1 -1 -1 -1 -1 -1 -1 6843 8163 1010 -1 9128 2003 -1 -1 1010 2014 -1 2608 -1 14607 1010 1996 -1 -1 1996 -1 -1 6873 -1 3347 17327 2015 -1 -1 -1 1012 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

1. I guess you masked the event in the sentence quoted above, but can TacoLM really predict so many masked words correctly? I don't think any human could guess the [MASK] tokens in this sentence with so little information given.

2. The '[unused]' tokens described in your paper are used to construct a one-to-one mapping to the new dictionary, but how can I tell what each '[unused]' token actually means? (See the token-mapping sketch after these questions.)

3. Why is there always a number attached to the end of the sentence, such as the '-1' after the final '[SEP]' token? In other examples the number can be 79, 43, and so on. What does this number actually mean?

4. After the '-1', several more numbers follow. Judging by the spacing between them, I don't think they belong to the same group as the '-1' from Q3. What do these numbers mean?

5. What do the floating-point numbers that come after them mean?

6. As for the final run of numbers (-1 -1 -1 ...), I guess these are attention masks? But they don't match HuggingFace's attention-mask encoding, which uses 1 for attended positions and 0 for non-attended positions.

7. How does a whole training line form the (event, value, dimension) tuple in this case? (My best guess at the line layout is sketched below.)
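
For Q2, my current understanding is that BERT's reserved [unused] vocabulary slots are being repurposed as new special tokens, which is a common trick because those slots already have ids and embeddings. Below is a minimal sketch of that idea; the meanings I attach to [unused45], [unused500], and [unused502] are pure assumptions for illustration, not the actual TacoLM mapping:

```python
# Minimal sketch: treating BERT's reserved [unused*] vocabulary slots as
# new special tokens. The meanings assigned below are ASSUMPTIONS for
# illustration only, not the mapping actually used by TacoLM.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical one-to-one mapping from [unused*] slots to new dictionary
# entries (e.g. dimension and value markers).
unused_to_meaning = {
    "[unused45]": "<dimension marker?>",
    "[unused500]": "<value token?>",
    "[unused502]": "<value token?>",
}

# The [unused*] tokens already exist in the vocabulary, so each one maps
# to a stable id without resizing the embedding matrix.
for tok, meaning in unused_to_meaning.items():
    print(tok, "-> id", tokenizer.convert_tokens_to_ids(tok), "->", meaning)
```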
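For Q3–Q7, to make the questions concrete, here is how I would tentatively split one line into fields, assuming the layout is: WordPiece tokens up to the final [SEP]; one lone number; a block of integer ids (candidate values?); a block of floats that sums to roughly 1.0, so plausibly a soft distribution over those candidates; and finally one integer per token position, where -1 looks like the ignore index used by older BERT masked-LM code (with real vocabulary ids at the masked positions) rather than an attention mask. All of this is guessed from the single example above, not from documentation:

```python
# Tentative parser for one training line. The field boundaries and their
# meanings are GUESSES from eyeballing the example, not a confirmed spec.
def parse_line(line: str):
    fields = line.split()

    # 1) WordPiece tokens run up to (and include) the last "[SEP]".
    last_sep = len(fields) - 1 - fields[::-1].index("[SEP]")
    tokens = fields[: last_sep + 1]
    rest = fields[last_sep + 1:]

    # 2) The lone number right after the last [SEP] (Q3).
    marker = int(rest[0])
    rest = rest[1:]

    # 3) Integer ids before the first float: candidate token ids? (Q4)
    float_start = next(i for i, f in enumerate(rest) if "." in f)
    candidate_ids = [int(x) for x in rest[:float_start]]

    # 4) The floats sum to ~1.0, so plausibly a soft label distribution
    #    over the candidates (Q5).
    float_end = next(i for i in range(float_start, len(rest))
                     if "." not in rest[i])
    probs = [float(x) for x in rest[float_start:float_end]]

    # 5) Remaining integers: apparently one label per token position,
    #    with -1 as an ignore index and vocabulary ids at the masked
    #    positions (Q6) -- not a 0/1 attention mask.
    mlm_labels = [int(x) for x in rest[float_end:]]

    return tokens, marker, candidate_ids, probs, mlm_labels
```

On this reading, the (event, value, dimension) tuple from Q7 would come from the masked event span in the text, the [unused*] markers before the final [SEP], and the soft distribution over candidate values, but I would appreciate confirmation.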
