Commit 8944210

update
1 parent 6bada72 commit 8944210

1 file changed

+9
-9
lines changed

content/en/docs/about/the-stack.md

Lines changed: 9 additions & 9 deletions
@@ -22,11 +22,11 @@ As part of the BigCode project, we released and will maintain [The Stack](https:
 | v1.2 | Opt-out requests submitted by 09.02.2023 were excluded from this version of the dataset as well as initially flagged malicious files (not exhaustive). |
 
 ## Datasets and data governance tools released by BigCode
-- The Stack: Exact deduplicated version of The Stack.
-- The Stack dedup: Near deduplicated version of The Stack (recommended for training).
-- The Stack issues: Collection of GitHub issues.
-- The Stack Metadata: Metadata of the repositories in The Stack.
-- Am I in the Stack: Check if your data is in The Stack and request opt-out.
+- [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
+- [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
+- [The Stack issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues): Collection of GitHub issues.
+- [The Stack Metadata](https://huggingface.co/datasets/bigcode/the-stack-metadata): Metadata of the repositories in The Stack.
+- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
 
 One of our goals in this project is to give people agency over their source code by letting them decide whether or not it should be used to develop and evaluate LLMs, as we acknowledge that not all developers may wish to have their data used for that purpose.
 
@@ -36,17 +36,17 @@ Our first step to that end was to select source code with permissive licenses, i
 We have developed a tool to help users understand whether their data is in The Stack. Check out [Am I in The Stack?](https://huggingface.co/spaces/bigcode/in-the-stack).
 
 ### How can I request that my data be removed from The Stack?
-You can opt-out your repositories from The Stack dataset by creating an issue in our GitHub opt-out repository and listing the repositories you would like to exclude. We will then exclude those repositories in the next iteration of The Stack. To initiate this process, you should first check if any of your repositories are actually in The Stack using the Am I in the Stack app.
+You can opt-out your repositories from [The Stack dataset](https://huggingface.co/datasets/bigcode/the-stack) by creating an issue in our [GitHub opt-out repository](https://github.com/bigcode-project/opt-out-v2) and listing the repositories you would like to exclude. We will then exclude those repositories in the next iteration of The Stack. To initiate this process, you should first check if any of your repositories are actually in The Stack using the [Am I in the Stack app](https://huggingface.co/spaces/bigcode/in-the-stack).
 
-If you decide that you wish to have repos owned by you removed from The Stack, please create an issue so that we can verify that you are in fact the owner of the repositories requested for opt-out.
+If you decide that you wish to have repos owned by you removed from The Stack, please [create an issue](https://github.com/bigcode-project/opt-out-v2/issues/new?assignees=&labels=&template=opt-out-request.md&title=Opt-out+request+for+USERNAME) so that we can verify that you are in fact the owner of the repositories requested for opt-out.
 
 If you are experiencing difficulty with this process, please email [email protected].
 
 ### What data can I request be removed from The Stack?
-You can choose to request either (1) all repos, or (2) you can specify select repos that you own to be removed. You can also specify Commits and GitHub Issues to be removed as part of your opt-out request. More details about this process on GitHub.
+You can choose to request either (1) all repos, or (2) you can specify select repos that you own to be removed. You can also specify Commits and GitHub Issues to be removed as part of your opt-out request. More details about this process on [GitHub](https://github.com/bigcode-project/opt-out-v2).
 
 ### Can I also prevent my data from being included in future versions of The Stack?
-The removal request process will be used to validate removal requests and for processing of removal requests to remove opt-out data. Validated requests and associated code pointers will also be stored in order to ensure that the code does not appear in future versions of The Stack.
+The [removal request](https://github.com/bigcode-project/opt-out-v2) process will be used to validate removal requests and for processing of removal requests to remove opt-out data. Validated requests and associated code pointers will also be stored in order to ensure that the code does not appear in future versions of The Stack.
 
 ### What happens to my data once I’ve requested its removal?
 For as long as we are maintaining The Stack dataset, we will provide regular updates to the dataset to remove data that has been flagged since the last version. The current plan is to update the dataset every 3 months, although the schedule may change based on the volume of requests received. If we are not in a position to continue maintaining the dataset, we plan to stop distributing it in its current format and update its terms of use to limit its range of applications further, including for training new LLMs. Finally, we [require](https://huggingface.co/datasets/bigcode/the-stack#terms-of-use-for-the-stack) that people who download the dataset agree to use the most recent allowed version in order to incorporate the removal requests.
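
The "create an issue" link added in this commit carries a literal `USERNAME` placeholder in its `title` query parameter. As a minimal sketch of how that link could be filled in for a specific account, the snippet below rebuilds the same query string with Python's standard library. The function name `opt_out_issue_url` and the whitespace check are illustrative assumptions, not part of the BigCode tooling; the base URL and parameter names are copied from the link in the diff.

```python
from urllib.parse import urlencode

# Base URL and query parameters copied from the "create an issue" link in the diff.
OPT_OUT_NEW_ISSUE = "https://github.com/bigcode-project/opt-out-v2/issues/new"


def opt_out_issue_url(username: str) -> str:
    """Build the pre-filled opt-out issue link for a GitHub username.

    Hypothetical helper: the name and the input check are assumptions
    made for illustration only.
    """
    if not username or username != username.strip():
        raise ValueError("username must be non-empty with no surrounding whitespace")
    params = {
        "assignees": "",
        "labels": "",
        "template": "opt-out-request.md",
        "title": f"Opt-out request for {username}",
    }
    # urlencode encodes spaces as '+', matching the form used in the original link.
    return f"{OPT_OUT_NEW_ISSUE}?{urlencode(params)}"


if __name__ == "__main__":
    print(opt_out_issue_url("octocat"))
```

Calling the helper with the literal string `USERNAME` reproduces the link from the diff, which is a quick way to check that the encoding matches.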
