Commit 6585fd1

committed
up
1 parent 8944210 commit 6585fd1

2 files changed: +75 -0 lines changed

content/en/docs/about/models.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
---
title: "Models"
description: ""
lead: ""
date: 2020-11-16T13:59:39+01:00
lastmod: 2020-11-16T13:59:39+01:00
draft: false
images: []
menu:
  docs:
    parent: "about"
weight: 220
toc: true
---
### StarCoder
- [Paper](https://arxiv.org/abs/2305.06161): A technical report about StarCoder.
- [GitHub](https://github.com/bigcode-project/starcoder/tree/main): All you need to know about using or fine-tuning StarCoder.
- [StarCoder](https://huggingface.co/bigcode/starcoder): StarCoderBase further trained on Python.
- [StarCoderBase](https://huggingface.co/bigcode/starcoderbase): Trained on 80+ languages from The Stack.
- [StarCoder+](https://huggingface.co/bigcode/starcoderplus): StarCoderBase further trained on English web data.
- [StarEncoder](https://huggingface.co/bigcode/starencoder): Encoder model trained on The Stack.
- [StarPii](https://huggingface.co/bigcode/starpii): StarEncoder-based PII detector.

### StarCoder Tools & Demos
- [StarCoder Playground](https://huggingface.co/spaces/bigcode/bigcode-playground): Write with StarCoder Models!
- [VSCode Extension](https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode): Code with StarCoder!
- [StarChat](https://huggingface.co/spaces/HuggingFaceH4/starchat-playground): Chat with StarCoder!
- [Tech Assistant Prompt](https://huggingface.co/datasets/bigcode/ta-prompt): With this prompt you can turn StarCoder into a tech assistant.
- [StarCoder Editor](https://huggingface.co/spaces/bigcode/bigcode-editor): Edit with StarCoder!

### StarCoder Data & Governance

- [Governance Card](https://huggingface.co/datasets/bigcode/governance-card): A card outlining the governance of the model.
- [StarCoder License Agreement](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement): The model is licensed under the BigCode OpenRAIL-M v1 license agreement.
- [StarCoder Data](https://huggingface.co/datasets/bigcode/starcoderdata): Pretraining dataset of StarCoder.
- [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text code search over the pretraining dataset.
- [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing-fast test of whether code was present in the pretraining dataset.

### SantaCoder

SantaCoder, aka smol StarCoder: same architecture, but trained only on Python, Java, and JavaScript.

- [SantaCoder](https://huggingface.co/bigcode/santacoder): SantaCoder Model.
- [SantaCoder Demo](https://huggingface.co/spaces/bigcode/santacoder-demo): Write with SantaCoder.
- [SantaCoder Search](https://huggingface.co/spaces/bigcode/santacoder-search): Search code in the pretraining dataset.
- [SantaCoder License](https://huggingface.co/spaces/bigcode/license): The OpenRAIL license for SantaCoder.
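
The model links above point at Hugging Face Hub repositories, so a checkpoint can probably be tried locally with the `transformers` library. The following is a minimal sketch, not an official recipe: the names `CHECKPOINTS`, `hub_url`, and `complete` are hypothetical helpers introduced here for illustration, and the sketch assumes `transformers` and PyTorch are installed. The StarCoder checkpoints are gated, so the license agreement has to be accepted on the Hub before they can be downloaded; SantaCoder is the smallest option for a first test.

```python
# Hub repository IDs copied from the model links on this page.
CHECKPOINTS = {
    "starcoder": "bigcode/starcoder",
    "starcoderbase": "bigcode/starcoderbase",
    "starcoderplus": "bigcode/starcoderplus",
    "santacoder": "bigcode/santacoder",
}


def hub_url(name: str) -> str:
    """Return the Hugging Face model page for a short name in CHECKPOINTS."""
    return f"https://huggingface.co/{CHECKPOINTS[name]}"


def complete(prompt: str, name: str = "santacoder", max_new_tokens: int = 32) -> str:
    """Generate a code completion for `prompt` (downloads weights on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # Assumption: trust_remote_code=True is only needed for checkpoints that
    # ship custom modeling code on the Hub (SantaCoder did at release).
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("def print_hello_world():")` would download SantaCoder and return the prompt followed by the generated continuation; the exact text depends on the checkpoint and generation settings.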

content/en/docs/about/timeline.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
---
title: "Timeline, milestones, and community events"
description: ""
lead: ""
date: 2020-11-16T13:59:39+01:00
lastmod: 2020-11-16T13:59:39+01:00
draft: false
images: []
menu:
  docs:
    parent: "about"
weight: 230
toc: true
---
* [September 26, 2022](https://twitter.com/BigCodeProject/status/1574427555871875072?s=20): Announcement of the BigCode project.
* [October 6, 2022](https://youtu.be/8cUpsXIEbAo): Webinar with the BigCode Community to provide strategic direction.
* [October 27, 2022](https://twitter.com/BigCodeProject/status/1585631176353796097?s=20): Introduction of "The Stack" dataset and paper publication.
* [November 15, 2022](https://twitter.com/BigCodeProject/status/1592569651086905344?s=20): Introduction of "Am I in The Stack" tool and BigCode Opt-Out process.
* [November 23, 2022](https://twitter.com/LoubnaBenAllal1/status/1595457541592346634?s=20): Details shared on the approach to de-identification of personally identifiable information (PII).
* [November 29, 2022](https://twitter.com/BigCodeProject/status/1597589730425974786?s=20): Sharing of Weights and Biases dashboards for the first models.
* [December 1, 2022](https://twitter.com/BigCodeProject/status/1598345535190179843?s=20): Release of The Stack v1.1 with expanded data and programming languages.
* [December 2, 2022](https://twitter.com/BigCodeProject/status/1598734387247550481?s=20): In-person meetup with the BigCode community alongside NeurIPS 2022.
* [December 9, 2022](https://twitter.com/BigCodeProject/status/1601133018714112000?s=20): Meetup at EMNLP 2022 to raise awareness and engage with the NLP research community.
* [December 12, 2022](https://twitter.com/BigCodeProject/status/1602372753008386049?s=20): Communication to raise awareness of "Am I in The Stack" and opt-out option.
* [December 14, 2022](https://youtu.be/Kh8yXfJJfU4): Second webinar with the BigCode Community to review progress.
* [December 22, 2022](https://twitter.com/BigCodeProject/status/1605958778330849281?s=20): Release of SantaCoder, a 1.1B multilingual language model for code.
* [March 20, 2023](https://twitter.com/BigCodeProject/status/1637874705645584384?s=20): Announcement of The Stack v1.2, including additional datasets and simplified opt-out process.
* [April 13, 2023](https://twitter.com/harmdevries77/status/1646524056538316805?s=20): Analysis of Chinchilla scaling laws for training smaller language models.
* [May 4, 2023](https://twitter.com/BigCodeProject/status/1654174941976068119?s=20): Announcement of StarCoder and StarCoderBase, code language models trained on GitHub data.
