content/en/docs/about/the-stack.md: 9 additions & 9 deletions
@@ -22,11 +22,11 @@ As part of the BigCode project, we released and will maintain [The Stack](https:
| v1.2 | Opt-out requests submitted by 09.02.2023 were excluded from this version of the dataset as well as initially flagged malicious files (not exhaustive). |

## Datasets and data governance tools released by BigCode
-- The Stack: Exact deduplicated version of The Stack.
-- The Stack dedup: Near deduplicated version of The Stack (recommended for training).
-- The Stack issues: Collection of GitHub issues.
-- The Stack Metadata: Metadata of the repositories in The Stack.
-- Am I in the Stack: Check if your data is in The Stack and request opt-out.
+- [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
+- [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
+- [The Stack issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues): Collection of GitHub issues.
+- [The Stack Metadata](https://huggingface.co/datasets/bigcode/the-stack-metadata): Metadata of the repositories in The Stack.
+- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.

One of our goals in this project is to give people agency over their source code by letting them decide whether or not it should be used to develop and evaluate LLMs, as we acknowledge that not all developers may wish to have their data used for that purpose.
@@ -36,17 +36,17 @@ Our first step to that end was to select source code with permissive licenses, i
We have developed a tool to help users understand whether their data is in The Stack. Check out [Am I in The Stack?](https://huggingface.co/spaces/bigcode/in-the-stack).

### How can I request that my data be removed from The Stack?
-You can opt-out your repositories from The Stack dataset by creating an issue in our GitHub opt-out repository and listing the repositories you would like to exclude. We will then exclude those repositories in the next iteration of The Stack. To initiate this process, you should first check if any of your repositories are actually in The Stack using the Am I in the Stack app.
+You can opt-out your repositories from [The Stack dataset](https://huggingface.co/datasets/bigcode/the-stack) by creating an issue in our [GitHub opt-out repository](https://github.com/bigcode-project/opt-out-v2) and listing the repositories you would like to exclude. We will then exclude those repositories in the next iteration of The Stack. To initiate this process, you should first check if any of your repositories are actually in The Stack using the [Am I in the Stack app](https://huggingface.co/spaces/bigcode/in-the-stack).

-If you decide that you wish to have repos owned by you removed from The Stack, please create an issue so that we can verify that you are in fact the owner of the repositories requested for opt-out.
+If you decide that you wish to have repos owned by you removed from The Stack, please [create an issue](https://github.com/bigcode-project/opt-out-v2/issues/new?assignees=&labels=&template=opt-out-request.md&title=Opt-out+request+for+USERNAME) so that we can verify that you are in fact the owner of the repositories requested for opt-out.

If you are experiencing difficulty with this process, please email [email protected].

### What data can I request be removed from The Stack?
-You can choose to request either (1) all repos, or (2) you can specify select repos that you own to be removed. You can also specify Commits and GitHub Issues to be removed as part of your opt-out request. More details about this process on GitHub.
+You can choose to request either (1) all repos, or (2) you can specify select repos that you own to be removed. You can also specify Commits and GitHub Issues to be removed as part of your opt-out request. More details about this process on [GitHub](https://github.com/bigcode-project/opt-out-v2).

### Can I also prevent my data from being included in future versions of The Stack?
-The removal request process will be used to validate removal requests and for processing of removal requests to remove opt-out data. Validated requests and associated code pointers will also be stored in order to ensure that the code does not appear in future versions of The Stack.
+The [removal request](https://github.com/bigcode-project/opt-out-v2) process will be used to validate removal requests and for processing of removal requests to remove opt-out data. Validated requests and associated code pointers will also be stored in order to ensure that the code does not appear in future versions of The Stack.

### What happens to my data once I’ve requested its removal?
For as long as we are maintaining The Stack dataset, we will provide regular updates to the dataset to remove data that has been flagged since the last version. The current plan is to update the dataset every 3 months, although the schedule may change based on the volume of requests received. If we are not in a position to continue maintaining the dataset, we plan to stop distributing it in its current format and update its terms of use to limit its range of applications further, including for training new LLMs. Finally, we [require](https://huggingface.co/datasets/bigcode/the-stack#terms-of-use-for-the-stack) that people who download the dataset agree to use the most recent allowed version in order to incorporate the removal requests.