
Commit c0a5df9

Merge pull request #25 from Imageomics/feature/templates
Update template pages
2 parents 12f9e8d + 1fb0782 commit c0a5df9

8 files changed: +596 -10 lines changed

docs/wiki-guide/About-Templates.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
11
# Using Dataset and Model Card Templates
22

3-
We have Imageomics-specific versions of Hugging Face's Dataset and Model Card templates. These include guidance and examples for the various metadata sections, reference information for Hugging Face's particular flavor of markdown, and the Imageomics grant acknowledgment.
3+
We provide Dataset and Model Card templates for both Imageomics and ABC, adapted from Hugging Face's templates. The Imageomics and ABC templates include guidance and examples for the various metadata sections, reference information for Hugging Face's particular flavor of markdown, and the appropriate NSF & NSERC grant acknowledgment.
44

5-
To use the template for a new dataset or model repository on Hugging Face (HF), simply copy and paste the contents of the appropriate template ([Dataset Card](HF_DatasetCard_Template_mkdocs.md) or [Model Card](HF_ModelCard_Template_mkdocs.md)) into your `README.md` file.[^1]
5+
To use a template for a new dataset or model repository on Hugging Face (HF), simply copy and paste the contents of the appropriate template ([Dataset Card](HF_DatasetCard_Template_mkdocs.md) or [Model Card](HF_ModelCard_Template_mkdocs.md)) into your `README.md` file.[^1]
66
Then, follow the descriptions under each section to fill in the appropriate information. This is meant to be an iterative process throughout the life of your project, so do not worry if you cannot answer all parts at the beginning—that's to be expected!
77
[^1]: The templates can also be added to your repository through the website user interface (UI): Navigate to the "Model/Dataset Card" tab on your repo, select "Create Model/Dataset Card", copy and paste the template contents into the `README.md` file, and add your content.
88

@@ -11,5 +11,5 @@ Then, follow the descriptions under each section to fill in the appropriate info
1111
If you have never filled out a dataset card before, or are unsure of how to find the answers to fill in the sections, we ran a [workshop](https://github.com/Imageomics/data-workshop-AH-2024) to help familiarize our members with this process. In particular, the portion where we walked through filling out part of a dataset card as we did exploratory data analysis (EDA) was recorded and is available on the [Imageomics YouTube Channel](https://www.youtube.com/@ImageomicsInstitute/videos). Read the [story of the workshop](https://github.com/Imageomics/data-workshop-AH-2024/#story-of-the-workshop) and clone the [repo](https://github.com/Imageomics/data-workshop-AH-2024) to follow along with the 1 hour and 15 minute lesson!
1212

1313
!!! note "Note"
14-
The Dataset and Model cards have incorporated some of Hugging Face's January 2024 updates (following their [Dataset Card Overhaul](https://github.com/huggingface/huggingface_hub/commit/6dd7ee829bd1b1216663a9993c1943c29b64690a)). It doesn't appear they will be updated more and we do not currently anticipate further large updates on our end as our overall template formats have diverged, but you may nevertheless wish to check HF for extra information or tagging updates ([HF Dataset Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md), [HF Model Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md)).
14+
The Dataset and Model cards have incorporated some of Hugging Face's January 2024 updates (following their [Dataset Card Overhaul](https://github.com/huggingface/huggingface_hub/commit/6dd7ee829bd1b1216663a9993c1943c29b64690a)). It doesn't appear they will be updated more and we do not currently anticipate further large updates on our end as our overall template formats have diverged. Nevertheless, you may wish to check HF for extra information or tagging updates ([HF Dataset Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md), [HF Model Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md)).
1515

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
1+
---
2+
license: cc0-1.0
3+
language:
4+
- en
5+
pretty_name:
6+
task_categories: # ex: image-classification, see key list at https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/pipelines.ts
7+
tags:
8+
- biology
9+
- image
10+
- animals
11+
- CV
12+
size_categories: # ex: n<1K, 1K<n<10K, 10K<n<100K, 100K<n<1M, ...
13+
---
14+
15+
<!--
16+
17+
NOTE: Add more tags (your particular animal, type of model and use-case, etc.).
18+
19+
As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (e.g., your PI, co-authors), select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
20+
For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
21+
See the [ABC Global Center policy for licensing](https://docs.google.com/document/d/1SlITG-r7kdJB6C8f4FCJ9Z7o7ccwldZoSRJKjhRAWVA/edit#heading=h.c1sxg0wsiqru) for more information.
22+
23+
See more options for the above information by clicking "edit dataset card" on your repo.
24+
25+
Fill in as much information as you can at each location that says "More information needed".
26+
-->
27+
28+
<!--
29+
Image with caption (jpg or png):
30+
|![Figure #](https://huggingface.co/datasets/ABC-Center/<data-repo>/resolve/main/<filepath>)|
31+
|:--|
32+
|**Figure #.** [Image of <describe image>](https://huggingface.co/datasets/ABC-Center/<data-repo>/raw/main/<filepath>) <caption description>.|
33+
-->
34+
35+
<!--
36+
Notes on styling:
37+
38+
To render LaTeX in your README, wrap the code in `\\(` and `\\)`. Example: \\(\frac{1}{2}\\)
39+
40+
Escape underscores ("_") with a "\". Example: image\_RGB
41+
-->
42+
43+
# Dataset Card for [dataset pretty_name]
44+
45+
<!-- Provide a quick summary of what the dataset is or can be used for. -->
46+
47+
## Dataset Details
48+
49+
### Dataset Description
50+
51+
- **Curated by:** list curators (authors for _data_ citation, moved up)
52+
- **Language(s) (NLP):** [More Information Needed]
53+
<!-- Provide the basic links for the dataset. These will show up on the sidebar to the right of your dataset card ("Curated by" too). -->
54+
- **Homepage:**
55+
- **Repository:** [related project repo]
56+
- **Paper:**
57+
58+
59+
<!-- Provide a longer summary of what this dataset is. -->
60+
[More Information Needed]
61+
62+
<!--This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1), and further altered to suit ABC Global Center needs.-->
63+
64+
65+
### Supported Tasks and Leaderboards
66+
[More Information Needed]
67+
68+
<!-- Provide benchmarking results -->
69+
70+
71+
## Dataset Structure
72+
73+
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
74+
75+
<!-- Provide format of the dataset, ex:
76+
77+
```
78+
/dataset/
79+
<species_1>/
80+
<img_id 1>.png
81+
<img_id 2>.png
82+
...
83+
<img_id n>.png
84+
<species_2>/
85+
<img_id 1>.png
86+
<img_id 2>.png
87+
...
88+
<img_id n>.png
89+
...
90+
<species_N>/
91+
<img_id 1>.png
92+
<img_id 2>.png
93+
...
94+
<img_id n>.png
95+
metadata.csv
96+
```
97+
98+
-->
99+
100+
### Data Instances
101+
[More Information Needed]
102+
103+
<!--
104+
Describe data files
105+
106+
Ex: All images are named <img_id>.png, each within a folder named for the species. They are 1024 x 1024, and the color has been standardized using <link to color standardization package>.
107+
-->
108+
109+
### Data Fields
110+
[More Information Needed]
111+
<!--
112+
Describe the types of the data files or the columns in a CSV with metadata.
113+
114+
Ex:
115+
**metadata.csv**:
116+
- `img_id`: Unique identifier for the dataset.
117+
- `specimen_id`: ID of specimen in the image, provided by museum data source. There are multiple images of a single specimen.
118+
- `species`: Species of the specimen in the image. There are N different species of <genus> of <animal>.
119+
- `view`: View of the specimen in the image (e.g., `ventral` or `dorsal` OR `top` or `bottom`, etc.; specify options where reasonable).
120+
- `file_name`: Relative path to image from the root of the directory (`<species>/<img_id>.png`); allows for image to be displayed in the dataset viewer alongside its associated metadata.
121+
-->
122+
123+
### Data Splits
124+
[More Information Needed]
125+
<!--
126+
Give your train-test splits for benchmarking; could be as simple as "split is indicated by the `split` column in the metadata file: `train`, `val`, or `test`." Or perhaps this is just the training dataset and other datasets were used for testing (you may indicate which were used).
127+
-->
128+
129+
## Dataset Creation
130+
131+
### Curation Rationale
132+
[More Information Needed]
133+
<!-- Motivation for the creation of this dataset. For instance, what you intended to study and why that required curation of a new dataset (or if it's newly collected data and why the data was collected (intended use)), etc. -->
134+
135+
### Source Data
136+
137+
<!-- This section describes the source data (e.g., news text and headlines, social media posts, translated sentences, ...). As well as an original source it was created from (e.g., sampling from Zenodo records, compiling images from different aggregators, etc.) -->
138+
139+
#### Data Collection and Processing
140+
[More Information Needed]
141+
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, re-sizing of images, tools and libraries used, etc.
142+
This is what _you_ did to it following collection from the original source; it will be overall processing if you collected the data initially.
143+
-->
144+
145+
#### Who are the source data producers?
146+
[More Information Needed]
147+
<!-- This section describes the people or systems who originally created the data.
148+
149+
Ex: This dataset is a collection of images taken of the butterfly collection housed at the Ohio State University Museum of Biological Diversity. The associated labels and metadata are the information provided with the collection from biologists that study butterflies and supplied the specimens to the museum.
150+
-->
151+
152+
153+
### Annotations
154+
<!--
155+
If the dataset contains annotations which are not part of the initial data collection, use this section to describe them.
156+
157+
Ex: We standardized the taxonomic labels provided by the various data sources to conform to a uniform 7-rank Linnean structure. (Then, under annotation process, describe how this was done: Our sources used different names for the same kingdom (both _Animalia_ and _Metazoa_), so we chose one for all (_Animalia_).) -->
158+
159+
#### Annotation process
160+
[More Information Needed]
161+
<!-- This section describes the annotation process such as annotation tools used, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
162+
163+
#### Who are the annotators?
164+
[More Information Needed]
165+
<!-- This section describes the people or systems who created the annotations. -->
166+
167+
### Personal and Sensitive Information
168+
[More Information Needed]
169+
<!--
170+
For instance, if your data includes people or endangered species. -->
171+
172+
173+
## Considerations for Using the Data
174+
[More Information Needed]
175+
<!--
176+
Things to consider while working with the dataset. For instance, maybe there are hybrids and they are labeled in the `hybrid_stat` column, so to get a subset without hybrids, subset to all instances in the metadata file such that `hybrid_stat` is _not_ "hybrid".
177+
-->
178+
179+
### Bias, Risks, and Limitations
180+
[More Information Needed]
181+
<!-- This section is meant to convey both technical and sociotechnical limitations. Could also address misuse, malicious use, and uses that the dataset will not work well for.-->
182+
183+
<!-- For instance, if your data exhibits a long-tailed distribution (and why). -->
184+
185+
### Recommendations
186+
[More Information Needed]
187+
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
188+
189+
## Licensing Information
190+
[More Information Needed]
191+
192+
<!-- See notes at top of file about selecting a license.
193+
If you choose CC0: This dataset is dedicated to the public domain for the benefit of scientific pursuits. We ask that you cite the dataset and journal paper using the below citations if you make use of it in your research.
194+
195+
Be sure to note different licensing of images if they have a different license from the compilation.
196+
ex:
197+
"""
198+
The data (images and text) contain a variety of licensing restrictions mostly within the CC family. Each image and text in this dataset is provided under the least restrictive terms allowed by its licensing requirements as provided to us (i.e., we impose no additional restrictions past those specified by licenses in the license file).
199+
200+
EOL images contain a variety of licenses ranging from [CC0](https://creativecommons.org/publicdomain/zero/1.0/) to [CC BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/).
201+
For license and citation information by image, see our [license file](https://huggingface.co/datasets/imageomics/treeoflife-10m/blob/main/metadata/licenses.csv).
202+
203+
This dataset (the compilation) has been marked as dedicated to the public domain by applying the [CC0 Public Domain Waiver](https://creativecommons.org/publicdomain/zero/1.0/). However, images may be licensed under different terms (as noted above).
204+
"""
205+
-->
206+
207+
## Citation
208+
[More Information Needed]
209+
210+
**BibTeX:**
211+
<!--
212+
If you want to include BibTeX, replace "<>"s with your info.
213+
214+
**Data**
215+
```
216+
@misc{<ref_code>,
217+
author = {<author1 and author2>},
218+
title = {<title>},
219+
year = {<year>},
220+
url = {https://huggingface.co/datasets/ABC-Center/<dataset_name>},
221+
doi = {<doi once generated>},
222+
publisher = {Hugging Face}
223+
}
224+
```
225+
226+
-for an associated paper:
227+
**Paper**
228+
```
229+
@article{<ref_code>,
230+
title = {<title>},
231+
author = {<author1 and author2>},
232+
journal = {<journal_name>},
233+
year = <year>,
234+
url = {<DOI_URL>},
235+
doi = {<DOI>}
236+
}
237+
```
238+
-->
239+
240+
<!---
241+
If the data is modified from another source, add the following.
242+
243+
Please be sure to also cite the original data source(s):
244+
<citation>
245+
-->
246+
247+
248+
## Acknowledgements
249+
250+
This work was supported by the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org/), which is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This dataset draws on research supported by the Social Sciences and Humanities Research Council. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.
251+
252+
Ce travail a été soutenu par le centre de recherche [AI and Biodiversity Change (ABC)](http://abcresearchcenter.org/), financé conjointement par la National Science Foundation des États-Unis ([Financement #2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false)) et par le Conseil de recherches en sciences naturelles et en génie du Canada ([Financement #585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440)). Ce jeu de données repose également en partie sur des travaux de recherche financés par le Conseil de recherches en sciences humaines du Canada. Les opinions, conclusions ou recommandations exprimées dans ce document sont celles de(s) auteur(s) et ne reflètent pas nécessairement celles de la National Science Foundation, du Conseil de recherches en sciences naturelles et en génie du Canada, ou du Conseil de recherches en sciences humaines du Canada.
253+
254+
<!-- You may also want to credit the source of your data, i.e., if you went to a museum or nature preserve to collect it. -->
255+
256+
## Glossary
257+
258+
<!-- [optional] If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
259+
260+
## More Information
261+
262+
<!-- [optional] Any other relevant information that doesn't fit elsewhere. -->
263+
264+
## Dataset Card Authors
265+
266+
[More Information Needed]
267+
268+
## Dataset Card Contact
269+
270+
[More Information Needed--optional]
271+
<!-- Could include who to contact with questions, but this is also what the "Discussions" tab is for. -->
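
The template's comments suggest recording the split in a `split` column and hybrids in a `hybrid_stat` column of `metadata.csv`. A minimal sketch of how a dataset user might act on that guidance follows. The sample rows, the `non-hybrid` value, and the helper names are hypothetical; only the column names and the `hybrid`, `train`, `val`, and `test` values come from the template text.

```python
import csv
import io

# Hypothetical stand-in for a metadata.csv laid out as the template describes.
SAMPLE_METADATA = """img_id,species,hybrid_stat,split,file_name
a1,species_1,non-hybrid,train,species_1/a1.png
a2,species_1,hybrid,train,species_1/a2.png
b1,species_2,non-hybrid,val,species_2/b1.png
b2,species_2,non-hybrid,test,species_2/b2.png
"""


def load_rows(handle):
    """Read every metadata row into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def without_hybrids(rows):
    """Keep only rows whose `hybrid_stat` is not "hybrid"."""
    return [row for row in rows if row["hybrid_stat"] != "hybrid"]


def by_split(rows):
    """Group rows by the value of their `split` column."""
    groups = {}
    for row in rows:
        groups.setdefault(row["split"], []).append(row)
    return groups


rows = load_rows(io.StringIO(SAMPLE_METADATA))
non_hybrid = without_hybrids(rows)
splits = by_split(non_hybrid)
```

For a real dataset, replace the `io.StringIO` handle with `open("metadata.csv", newline="", encoding="utf-8")`. Documenting the exact values each column can take in the Data Fields section is what makes a filter like this safe to write.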

docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ NOTE: Add more tags (your particular animal, type of model and use-case, etc.).
1818
1919
As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (e.g., your PI, co-authors), select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
2020
For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
21-
See the [Imageomics policy for licensing](https://docs.google.com/document/d/1SlITG-r7kdJB6C8f4FCJ9Z7o7ccwldZoSRJKjhRAWVA/edit#heading=h.c1sxg0wsiqru) for more information.
21+
See the [Imageomics policy for licensing](https://imageomics.github.io/Imageomics-guide/wiki-guide/Digital-products-release-licensing-policy/) for more information.
2222
2323
See more options for the above information by clicking "edit dataset card" on your repo.
2424
Lines changed: 16 additions & 2 deletions
@@ -1,7 +1,21 @@
11
# Dataset Card Template
22

3-
Below is the **HF_DatasetCard_Template_Imageomics.md**. You can copy this content and paste it into a new Markdown file to create a new dataset card.
3+
Below are the Dataset Card templates for Imageomics and ABC. You can download or copy the appropriate dataset card content and paste it into a new Markdown file to create a README for your dataset.
44

5-
[Download Template](https://github.com/Imageomics/Imageomics-guide/raw/main/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md)
5+
<details>
6+
<summary>Imageomics</summary>
7+
<br>
8+
<b><a href="https://github.com/Imageomics/Imageomics-guide/blob/main/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md" target="_blank">Download template from GitHub</a></b>
69

710
{{ include_file_as_code("docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md") }}
11+
12+
</details>
13+
14+
<details>
15+
<summary>ABC</summary>
16+
<br>
17+
<b><a href="https://github.com/Imageomics/Imageomics-guide/blob/main/docs/wiki-guide/HF_DatasetCard_Template_ABC.md" target="_blank">Download template from GitHub</a></b>
18+
19+
{{ include_file_as_code("docs/wiki-guide/HF_DatasetCard_Template_ABC.md") }}
20+
21+
</details>
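
The page above relies on an `include_file_as_code(...)` macro whose definition is not part of this commit. As a rough sketch only, assuming the site uses mkdocs-macros-plugin (which calls a `define_env(env)` hook and registers functions via `@env.macro`), such a macro could look like the following. The function body, the `lang` parameter, and the helper name `render_file_as_code` are assumptions, not the repository's actual implementation.

```python
import re
from pathlib import Path


def render_file_as_code(path, lang="markdown"):
    """Return the file's text wrapped in a fenced code block."""
    text = Path(path).read_text(encoding="utf-8")
    # Pick a fence longer than any backtick run in the file, so fenced
    # blocks inside the template do not close the outer fence early.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{lang}\n{text.rstrip()}\n{fence}\n"


def define_env(env):
    """Hook called by mkdocs-macros-plugin to register macros."""

    @env.macro
    def include_file_as_code(path, lang="markdown"):
        # Hypothetical body; the real macro may resolve paths differently.
        return render_file_as_code(path, lang)
```

The variable-length fence matters here because both card templates contain their own triple-backtick blocks (the directory layout and the BibTeX examples).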
