Commit 2552674

Add both R1 and v3
1 parent c1b3e90 commit 2552674

2 files changed: +110 -10 lines changed

deepseek-R1.yaml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+---
+# Thank you for contributing!
+# In filling out this yaml file, please follow the criteria as described here:
+# https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io/tree/main/projects#criteria
+
+# You're free to build on this work and reuse the data. It is licensed under CC-BY 4.0, with the
+# stipulation that attribution should come in the form of a link to http://opening-up-chatgpt.github.io
+# and a citation to the paper in which the initial dataset & criteria were published:
+
+# Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. July 19-21, Eindhoven. doi: 10.1145/3571884.3604316
+
+system:
+    name: DeepSeek R1
+    link: https://github.com/deepseek-ai/DeepSeek-R1
+    type: text
+    performanceclass:
+    basemodelname: DeepSeek-V3-Base
+    endmodelname: DeepSeek-R1
+    endmodellicense: DeepSeek License Agreement
+    releasedate: 2025-01
+    notes:
+
+org:
+    name: DeepSeek
+    link: https://www.deepseek.com/
+    notes: "DeepSeek (深度求索), founded in 2023, is a Chinese company dedicated to making AGI a reality."
+
+# availability:
+datasources_basemodel:
+    class: closed
+    link:
+    notes: Pretraining data not specified or documented besides the claim that it amounts to 14.8T tokens in Chinese and English. Proprietary dataset.
+
+datasources_endmodel:
+    class: closed
+    link:
+    notes: Post-training data not specified or documented. Said to include '1.5M instances'.
+
+weights_basemodel:
+    class: open
+    link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
+    notes:
+
+weights_endmodel:
+    class: open
+    link: https://huggingface.co/deepseek-ai/DeepSeek-R1
+    notes:
+
+trainingcode:
+    class: closed
+    link: https://github.com/deepseek-ai/DeepSeek-R1
+    notes: Repository does not contain code.
+
+# documentation:
+code:
+    class: closed
+    link: https://github.com/deepseek-ai/DeepSeek-V3/
+    notes: No code, so no documentation.
+
+architecture:
+    class: partial
+    link: https://arxiv.org/pdf/2412.19437
+    notes: Model architecture described in a fair bit of detail in base model technical report.
+
+preprint:
+    class: open
+    link: https://arxiv.org/pdf/2501.12948
+    notes: Paper provides information about techniques used for constructing model, but nothing on pre- or post-training data.
+
+paper:
+    class: closed
+    link:
+    notes: No peer-reviewed paper found.
+
+modelcard:
+    class: partial
+    link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
+    notes: Model card on HF provides basic summary but no info on intended downstream uses, scope, biases, risks and limitations.
+
+datasheet:
+    class: closed
+    link:
+    notes: No datasheet provided.
+
+# access:
+package:
+    class: partial
+    link:
+    notes: No specific package provided by the authors but integrates well with many widely used packages.
+
+api:
+    class: open
+    link: platform.deepseek.com
+    notes: Available through various APIs.
+    metaprompt: closed
+
+licenses:
+    class: partial
+    link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/main/LICENSE-MODEL
+    notes: MIT License and a separate Model License. Derived model (R1) claims to be fully licensed through MIT.
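
For a quick read of the R1 entry, the `class` values can be tallied per category. The following is a minimal sketch, not part of the commit: the dictionary is transcribed by hand from the `class:` fields of deepseek-R1.yaml, and the names `R1_CLASSES`, `tally`, and `fields_with` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter

# Field -> class value, transcribed by hand from the deepseek-R1.yaml entry.
# (Hypothetical constant; the project itself reads these from the YAML file.)
R1_CLASSES = {
    "datasources_basemodel": "closed",
    "datasources_endmodel": "closed",
    "weights_basemodel": "open",
    "weights_endmodel": "open",
    "trainingcode": "closed",
    "code": "closed",
    "architecture": "partial",
    "preprint": "open",
    "paper": "closed",
    "modelcard": "partial",
    "datasheet": "closed",
    "package": "partial",
    "api": "open",
    "licenses": "partial",
}


def tally(classes):
    """Count how many fields fall into each openness class."""
    return Counter(classes.values())


def fields_with(classes, wanted):
    """List the fields rated `wanted`, in file order."""
    return [field for field, cls in classes.items() if cls == wanted]


if __name__ == "__main__":
    print(dict(tally(R1_CLASSES)))
    print(fields_with(R1_CLASSES, "closed"))
```

Under this transcription, 4 of the 14 fields are `open`, 4 are `partial`, and 6 are `closed`.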

deepseek.yaml renamed to deepseek-v3.yaml

Lines changed: 10 additions & 10 deletions
@@ -17,7 +17,7 @@ system:
     basemodelname: DeepSeek-V3-Base
     endmodelname: DeepSeek-V3
     endmodellicense: DeepSeek License Agreement
-    releasedate: 2025-01
+    releasedate: 2024-12
     notes:
 
 org:
@@ -44,12 +44,12 @@ weights_basemodel:
 weights_endmodel:
     class: open
     link: https://huggingface.co/deepseek-ai/DeepSeek-V3
-    notes: Also available in smaller model sizes
+    notes:
 
 trainingcode:
     class: partial
     link: https://github.com/deepseek-ai/DeepSeek-V3/tree/main/inference
-    notes: Repository provides sparse source code and some examples for inference
+    notes: Repository provides sparse source code and some examples for inference.
 
 # documentation:
 code:
@@ -59,12 +59,12 @@ code:
 
 architecture:
     class: partial
-    link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
-    notes: Model architecture described in a fair bit of detail in technical report
+    link: https://arxiv.org/pdf/2412.19437
+    notes: Model architecture described in a fair bit of detail in technical report.
 
 preprint:
     class: open
-    link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
+    link: https://arxiv.org/pdf/2412.19437
     notes: Corporate technical report provides info on infrastructure compute, performance, but nothing on pre- or post-training data.
 
 paper:
@@ -75,7 +75,7 @@ paper:
 modelcard:
     class: partial
     link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
-    notes: Model card on HF provides basic summary but no info on intended downstream uses, scope, biases, risks and limitations
+    notes: Model card on HF provides basic summary but no info on intended downstream uses, scope, biases, risks and limitations.
 
 datasheet:
     class: closed
@@ -86,15 +86,15 @@ datasheet:
 package:
     class: partial
     link:
-    notes: No specific package provided by the authors but integrates well with many widely used packages
+    notes: No specific package provided by the authors but integrates well with many widely used packages.
 
 api:
     class: open
     link: platform.deepseek.com
-    notes: Available through various APIs
+    notes: Available through various APIs.
     metaprompt: closed
 
 licenses:
     class: partial
     link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/main/LICENSE-MODEL
-    notes: MIT License and a separate Model License
+    notes: MIT License and a separate Model License.
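
Several of the changes in this file (the corrected `releasedate`, the replaced links) are the kind of thing a small consistency check might surface before commit. The sketch below is an assumption-laden illustration, not the project's tooling: `parse_entry`, `check_entry`, and `ALLOWED_CLASSES` are hypothetical names, the parser handles only the two-level `key:` / indented `child: value` layout seen in these files (it is not a general YAML parser), and the rules it applies (a `YYYY-MM` release date, one of three class values, links carrying a URL scheme) are guesses at sensible conventions rather than documented requirements.

```python
import re

# Assumed set of class values, based only on those appearing in these files.
ALLOWED_CLASSES = {"open", "partial", "closed"}


def parse_entry(text):
    """Parse top-level 'section:' lines with indented 'key: value' children.

    Comments, blank lines and the '---' marker are skipped. Values stay strings.
    """
    entry, section = {}, None
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue
        key, _, value = stripped.partition(":")  # split on the first colon only
        if raw[0].isspace():
            if section is not None:
                entry[section][key] = value.strip()
        else:
            section = key
            entry[section] = {}
    return entry


def check_entry(entry):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    date = entry.get("system", {}).get("releasedate", "")
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", date):
        problems.append(f"system.releasedate is not YYYY-MM: {date!r}")
    for section, fields in entry.items():
        cls = fields.get("class")
        if cls is not None and cls not in ALLOWED_CLASSES:
            problems.append(f"{section}.class has unexpected value: {cls}")
        link = fields.get("link", "")
        if link and not link.startswith(("http://", "https://")):
            problems.append(f"{section}.link has no URL scheme: {link}")
    return problems


SAMPLE = """\
system:
    endmodelname: DeepSeek-V3
    releasedate: 2024-12

api:
    class: open
    link: platform.deepseek.com
    notes: Available through various APIs.
"""

if __name__ == "__main__":
    for problem in check_entry(parse_entry(SAMPLE)):
        print(problem)
```

Run against the excerpt in `SAMPLE`, the only item this sketch flags is the scheme-less `api` link; whether that counts as an error is a project decision.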
