Commit a58eecd

Reblacking
Signed-off-by: Davis Wertheimer <[email protected]>
1 parent 15f4d7e commit a58eecd

1 file changed, +1 -3 lines changed

fms_fsdp/config/training.py

Lines changed: 1 addition & 3 deletions
@@ -15,9 +15,7 @@ class train_config:
     file_type: str = "arrow"
     col_name: str = "tokens"
     tokenizer_path: str = "/fsx/tokenizer"
-    datasets: str = (
-        "lang=en/dataset=commoncrawl,lang=en/dataset=webhose,lang=en/dataset=github_clean,lang=de/dataset=wikipedia,lang=es/dataset=wikipedia,lang=fr/dataset=wikipedia,lang=ja/dataset=wikipedia,lang=pt/dataset=wikipedia,lang=en/dataset=wikimedia,lang=en/dataset=uspto,lang=en/dataset=pubmedcentral,lang=en/dataset=arxiv,lang=en/dataset=stackexchange"
-    )
+    datasets: str = "lang=en/dataset=commoncrawl,lang=en/dataset=webhose,lang=en/dataset=github_clean,lang=de/dataset=wikipedia,lang=es/dataset=wikipedia,lang=fr/dataset=wikipedia,lang=ja/dataset=wikipedia,lang=pt/dataset=wikipedia,lang=en/dataset=wikimedia,lang=en/dataset=uspto,lang=en/dataset=pubmedcentral,lang=en/dataset=arxiv,lang=en/dataset=stackexchange"
     weights: str = "7725,500,550,28,17,22,25,8,100,500,175,250,100"
     seq_length: int = 4096
     vocab_size: int = 32000
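
Reviewer note: the `datasets` and `weights` fields are parallel comma-separated strings with 13 entries each, and the weights sum to 10000. The sketch below shows one way such a pair could be turned into a normalized sampling mixture. The helper `parse_mixture` is hypothetical and is not taken from fms_fsdp; how the repository actually consumes these fields is not shown in this diff.

```python
# Values copied from the diff; the parsing logic is an illustrative assumption,
# not the repository's implementation.
DATASETS = (
    "lang=en/dataset=commoncrawl,lang=en/dataset=webhose,"
    "lang=en/dataset=github_clean,lang=de/dataset=wikipedia,"
    "lang=es/dataset=wikipedia,lang=fr/dataset=wikipedia,"
    "lang=ja/dataset=wikipedia,lang=pt/dataset=wikipedia,"
    "lang=en/dataset=wikimedia,lang=en/dataset=uspto,"
    "lang=en/dataset=pubmedcentral,lang=en/dataset=arxiv,"
    "lang=en/dataset=stackexchange"
)
WEIGHTS = "7725,500,550,28,17,22,25,8,100,500,175,250,100"


def parse_mixture(datasets: str, weights: str) -> dict[str, float]:
    """Pair each dataset path with its weight, normalized to sum to 1.

    Hypothetical helper for illustration only.
    """
    names = [d.strip() for d in datasets.split(",")]
    raw = [float(w) for w in weights.split(",")]
    if len(names) != len(raw):
        raise ValueError(
            f"{len(names)} datasets but {len(raw)} weights; the lists must align"
        )
    total = sum(raw)
    return {name: w / total for name, w in zip(names, raw)}


mixture = parse_mixture(DATASETS, WEIGHTS)
for name, fraction in mixture.items():
    print(f"{name:40s} {fraction:.4f}")
```

Because the two strings are matched purely by position, a length check like the one above catches an edit that adds a dataset without adding its weight.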
