
Format problem in the Google Drive Training Data #1

@PosoSAgapo

The paper's work on temporal commonsense is great; however, I have some questions about the formatted training data that you provide on Google Drive. Here is an example:

[CLS] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [MASK] [MASK] liquid eye ##liner and leopard - print under [MASK] ##ments , her [MASK] [MASK] steel ##y [MASK] [MASK] thin [MASK] like [MASK] [MASK] [MASK] result of [MASK] 20 ##- [MASK] ##und [MASK] [MASK] [MASK] she [unused500] curls [MASK] [SEP] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] [SEP] [unused500] [unused502] [unused45] [SEP] -1 43 44 45 46 47 48 49 50 51 120 121 122 7.670003601399247e-05 0.010977975780067789 0.17749198395637333 0.3423587734906385 0.26762063340149095 0.1613272650199883 0.03558053856215351 0.004304815288057253 0.00026131446521643803 0.0 0.0 0.0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 4218 2014 6381 -1 -1 -1 -1 -1 -1 -1 6843 8163 1010 -1 9128 2003 -1 -1 1010 2014 -1 2608 -1 14607 1010 1996 -1 -1 1996 -1 -1 6873 -1 3347 17327 2015 -1 -1 -1 1012 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

1. I guess you masked the event in the sentence quoted above, but can TacoLM really predict so many masked words correctly? I don't think any human could guess the [MASK] tokens in this sentence with so little information given.

2. The '[unused]' tokens described in your paper are used to construct a one-to-one mapping to the new dictionary, but how can I tell what each '[unused]' token actually means? (See the token-mapping sketch after these questions.)

3. Why is there always a number attached to the end of the sentence, such as the '-1' after the final '[SEP]' token? In other examples the number can be 79, 43, and so on. What does this number actually mean?

4. After the '-1', several more numbers follow. Judging by the spacing between them, I don't think they belong to the same group as the '-1' from Q3. What do these numbers mean?

5. What do the floating-point numbers that come after them mean?

6. As for the final run of numbers (-1 -1 -1 ...), I guess these are attention masks? But they don't match HuggingFace's attention-mask encoding, which uses 1 for attended positions and 0 for non-attended positions.

7. How does a whole training line form the (event, value, dimension) tuple in this case? (My best guess at the line layout is sketched below.)
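
For Q2, my current understanding is that BERT's reserved [unused] vocabulary slots are being repurposed as new special tokens, which is a common trick because those slots already have ids and embeddings. Below is a minimal sketch of that idea; the meanings I attach to [unused45], [unused500], and [unused502] are pure assumptions for illustration, not the actual TacoLM mapping:

```python
# Minimal sketch: treating BERT's reserved [unused*] vocabulary slots as
# new special tokens. The meanings assigned below are ASSUMPTIONS for
# illustration only, not the mapping actually used by TacoLM.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical one-to-one mapping from [unused*] slots to new dictionary
# entries (e.g. dimension and value markers).
unused_to_meaning = {
    "[unused45]": "<dimension marker?>",
    "[unused500]": "<value token?>",
    "[unused502]": "<value token?>",
}

# The [unused*] tokens already exist in the vocabulary, so each one maps
# to a stable id without resizing the embedding matrix.
for tok, meaning in unused_to_meaning.items():
    print(tok, "-> id", tokenizer.convert_tokens_to_ids(tok), "->", meaning)
```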
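For Q3–Q7, to make the questions concrete, here is how I would tentatively split one line into fields, assuming the layout is: WordPiece tokens up to the final [SEP]; one lone number; a block of integer ids (candidate values?); a block of floats that sums to roughly 1.0, so plausibly a soft distribution over those candidates; and finally one integer per token position, where -1 looks like the ignore index used by older BERT masked-LM code (with real vocabulary ids at the masked positions) rather than an attention mask. All of this is guessed from the single example above, not from documentation:

```python
# Tentative parser for one training line. The field boundaries and their
# meanings are GUESSES from eyeballing the example, not a confirmed spec.
def parse_line(line: str):
    fields = line.split()

    # 1) WordPiece tokens run up to (and include) the last "[SEP]".
    last_sep = len(fields) - 1 - fields[::-1].index("[SEP]")
    tokens = fields[: last_sep + 1]
    rest = fields[last_sep + 1:]

    # 2) The lone number right after the last [SEP] (Q3).
    marker = int(rest[0])
    rest = rest[1:]

    # 3) Integer ids before the first float: candidate token ids? (Q4)
    float_start = next(i for i, f in enumerate(rest) if "." in f)
    candidate_ids = [int(x) for x in rest[:float_start]]

    # 4) The floats sum to ~1.0, so plausibly a soft label distribution
    #    over the candidates (Q5).
    float_end = next(i for i in range(float_start, len(rest))
                     if "." not in rest[i])
    probs = [float(x) for x in rest[float_start:float_end]]

    # 5) Remaining integers: apparently one label per token position,
    #    with -1 as an ignore index and vocabulary ids at the masked
    #    positions (Q6) -- not a 0/1 attention mask.
    mlm_labels = [int(x) for x in rest[float_end:]]

    return tokens, marker, candidate_ids, probs, mlm_labels
```

On this reading, the (event, value, dimension) tuple from Q7 would come from the masked event span in the text, the [unused*] markers before the final [SEP], and the soft distribution over candidate values, but I would appreciate confirmation.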
