- In this repository you can find the code for building The Stack v2 dataset, as well as the extra sources used to make StarCoder2data: the training corpus of the StarCoder2 family of models.
+ In this repository you can find the code for building [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2) dataset, as well as the extra sources used to make StarCoder2data: the training corpus of the StarCoder2 family of models.
This repository is a follow-up to the work in [bigcode-dataset](https://github.com/bigcode-project/bigcode-dataset/) used for [The Stack v1](https://huggingface.co/datasets/bigcode/the-stack) and [StarCoderData](https://huggingface.co/datasets/bigcode/starcoderdata).