Incorrect handling of token lengths #17

@Waguy02

Description

In the long-term forecast experiments, you fix the maximum text length to 1024 tokens, see https://github.com/AdityaLab/MM-TSFlib/blob/e789ce78c9bafd8e3ba0d8850f9ad2becbe83548/exp/exp_long_term_forecasting.py#L649 (in accordance with the GPT-2 context window, I guess?), but this implies an arbitrary truncation of the input text.
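For illustration, here is a minimal sketch of what a hard-coded `max_length=1024` does to the window text, assuming a HuggingFace GPT-2 tokenizer (this is not the exact code from `exp_long_term_forecasting.py`, just the kind of truncation I mean):

```python
# Illustration only: what a hard-coded max_length=1024 does to the window text.
# Assumes the HuggingFace `transformers` GPT-2 tokenizer; the repo's exact call may differ.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def encode_window_text(window_texts, max_length=1024):
    """Concatenate the text of every timestamp in the input window and tokenize it.

    With truncation=True, everything past `max_length` tokens is silently dropped,
    so the later timestamps in the window can lose their text entirely.
    """
    joined = " ".join(window_texts)
    enc = tokenizer(joined, truncation=True, max_length=max_length, return_tensors="pt")
    return enc["input_ids"]

# e.g. 3 Agriculture timestamps at ~425 tokens each -> ~1275 tokens, ~250 silently cut.
```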

  • I calculated the following statistics per dataset:
| Dataset | Average number of text tokens per timestamp |
| --- | --- |
| Agriculture | 425 |
| Climate | 560 |
| Economy | 416 |
| Energy | 57 |
| Environment | 546 |
| PublicHealth | 65 |
| Security | 656 |
| SocialGood | 350 |
| Traffic | 64 |

Given those statistics, in a setting with an input window length of 3 or more and using any of the Agriculture, Climate, Economy, Environment, Security, or SocialGood datasets, the total number of text tokens within the input window will clearly overflow the context of 1024 hard-coded in the code (e.g. 425 * 3 > 1024).
A truncation methodology should be carefully defined so that the text information within the input window is encompassed fairly (one possibility is sketched below).
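One possible way to do this, given as a rough sketch rather than a concrete proposal for the library (the helper name and the even split are my own choices), is to allocate the 1024-token budget evenly across the timestamps of the window instead of truncating the concatenated text from the end:

```python
# Hypothetical helper (not part of MM-TSFlib): give each timestamp an equal token budget
# so that no timestamp's text is dropped wholesale by the final truncation.
def truncate_per_timestamp(window_texts, tokenizer, max_length=1024):
    budget = max_length // max(len(window_texts), 1)
    ids = []
    for text in window_texts:
        # Truncate each timestamp's text to its share of the context window.
        ids.extend(tokenizer(text, truncation=True, max_length=budget)["input_ids"])
    return ids[:max_length]
```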

My solution is simply to use an LLM with a larger context window, which is what I did in my own reimplementation (see the sketch below).
It would be great if you could adapt the library code to handle this correctly.
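For reference, switching to a longer-context text encoder is straightforward with the HuggingFace API; a minimal sketch, where the choice of Longformer is only an example and not necessarily the model I used:

```python
# Illustrative larger-context text encoder; no specific model is prescribed here.
from transformers import AutoTokenizer, AutoModel

model_name = "allenai/longformer-base-4096"  # 4096-token context instead of GPT-2's 1024
tokenizer = AutoTokenizer.from_pretrained(model_name)
text_encoder = AutoModel.from_pretrained(model_name)

window_texts = ["text for timestamp t-2", "text for timestamp t-1", "text for timestamp t"]
enc = tokenizer(" ".join(window_texts), truncation=True, max_length=4096, return_tensors="pt")
hidden = text_encoder(**enc).last_hidden_state  # (1, seq_len, hidden) embeddings for the window
```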

Thanks
