---
# Thank you for contributing!
# In filling out this yaml file, please follow the criteria as described here:
# https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io/tree/main/projects#criteria

# You're free to build on this work and reuse the data. It is licensed under CC-BY 4.0, with the
# stipulation that attribution should come in the form of a link to http://opening-up-chatgpt.github.io
# and a citation to the paper in which the initial dataset & criteria were published:

# Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. July 19-21, Eindhoven. doi: 10.1145/3571884.3604316

system:
  name: DeepSeek R1
  link: https://github.com/deepseek-ai/DeepSeek-R1
  type: text
  performanceclass:
  basemodelname: DeepSeek-V3-Base
  endmodelname: DeepSeek-R1
  endmodellicense: DeepSeek License Agreement
  releasedate: 2025-01
  notes:

org:
  name: DeepSeek
  link: https://www.deepseek.com/
  notes: "DeepSeek (深度求索), founded in 2023, is a Chinese company dedicated to making AGI a reality."

# availability:
datasources_basemodel:
  class: closed
  link:
  notes: Pretraining data not specified or documented beyond the claim that it amounts to 14.8T tokens in Chinese and English. Proprietary dataset.

datasources_endmodel:
  class: closed
  link:
  notes: Post-training data not specified or documented; said only to include '1.5M instances'.

weights_basemodel:
  class: open
  link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
  notes:

weights_endmodel:
  class: open
  link: https://huggingface.co/deepseek-ai/DeepSeek-R1
  notes:

trainingcode:
  class: closed
  link: https://github.com/deepseek-ai/DeepSeek-R1
  notes: Repository does not contain training code.

# documentation:
code:
  class: closed
  link: https://github.com/deepseek-ai/DeepSeek-V3/
  notes: No code released, so no code documentation.

architecture:
  class: partial
  link: https://arxiv.org/pdf/2412.19437
  notes: Model architecture described in considerable detail in the base model technical report.

preprint:
  class: open
  link: https://arxiv.org/pdf/2501.12948
  notes: Paper describes the techniques used to construct the model, but says nothing about pre- or post-training data.

paper:
  class: closed
  link:
  notes: No peer-reviewed paper found.

modelcard:
  class: partial
  link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
  notes: Model card on HF provides a basic summary but no information on intended downstream uses, scope, biases, risks, or limitations.

datasheet:
  class: closed
  link:
  notes: No datasheet provided.

# access:
package:
  class: partial
  link:
  notes: No dedicated package is provided by the authors, but the model integrates well with many widely used packages.

api:
  class: open
  link: https://platform.deepseek.com
  notes: Available through various APIs.
  metaprompt: closed
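# The entry above records only that a hosted API exists. As a hedged illustration of what a
# client request to an OpenAI-compatible endpoint like this might look like, here is a minimal
# Python sketch; the model name "deepseek-reasoner", the field names, and the helper itself are
# assumptions based on common chat-completions conventions, not taken from this file:

```python
import json

def build_chat_payload(prompt: str, model: str = "deepseek-reasoner") -> str:
    """Assemble a hypothetical OpenAI-style chat-completions request body."""
    payload = {
        "model": model,  # assumed model identifier, not specified in this record
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_payload("Summarize the DeepSeek-R1 license terms.")
```

# The resulting JSON string would then be POSTed to the platform's chat endpoint with an API key.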

licenses:
  class: partial
  link: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/main/LICENSE-MODEL
  notes: MIT License alongside a separate Model License. The derived model (R1) is stated to be fully licensed under MIT.
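
# A record in this format can be sanity-checked programmatically before submission. Below is a
# minimal stdlib-only Python sketch; the three `class` values are the ones used throughout this
# file, while the helper function and the inlined dict fragment are illustrative assumptions:

```python
# Each openness dimension in a record carries a `class` field that must be
# one of the three values used throughout this file.
ALLOWED_CLASSES = {"open", "partial", "closed"}

def check_record(record: dict) -> list[str]:
    """Return the dimensions whose `class` value is present but invalid."""
    problems = []
    for dimension, fields in record.items():
        if not isinstance(fields, dict) or "class" not in fields:
            continue  # e.g. the `system:` and `org:` blocks carry no class
        if fields["class"] not in ALLOWED_CLASSES:
            problems.append(dimension)
    return problems

# A fragment of the record above, written out as a plain dict:
record = {
    "weights_endmodel": {"class": "open", "link": "https://huggingface.co/deepseek-ai/DeepSeek-R1"},
    "trainingcode": {"class": "closed"},
    "architecture": {"class": "partial"},
}
assert check_record(record) == []
```

# In practice one would load the full YAML file (e.g. with PyYAML) and pass the parsed
# mapping to a checker like this.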