
Commit ae9467a

Author: The TensorFlow Datasets Authors

Automated documentation update.

PiperOrigin-RevId: 688047256

Parent: 645e29f

File tree

8 files changed: +323, -129 lines


docs/catalog/_toc.yaml

Lines changed: 18 additions & 0 deletions
@@ -99,6 +99,9 @@ toc:
     title: yes_no
   title: Audio
 - section:
+  - path: /datasets/catalog/ai2dcaption
+    status: nightly
+    title: ai2dcaption
   - path: /datasets/catalog/ogbg_molpcba
     title: ogbg_molpcba
   title: Biology
@@ -135,6 +138,8 @@ toc:
 - section:
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   - path: /datasets/catalog/webvid
     title: webvid (manual)
   title: Conditional image generation
@@ -169,6 +174,8 @@ toc:
     title: celeb_a_hq (manual)
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   title: Density estimation
 - section:
   - path: /datasets/catalog/universal_dependencies
@@ -247,6 +254,9 @@ toc:
     title: abstract_reasoning (manual)
   - path: /datasets/catalog/aflw2k3d
     title: aflw2k3d
+  - path: /datasets/catalog/ai2dcaption
+    status: nightly
+    title: ai2dcaption
   - path: /datasets/catalog/bccd
     title: bccd
   - path: /datasets/catalog/beans
@@ -479,6 +489,8 @@ toc:
     title: i_naturalist2021
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   - path: /datasets/catalog/imagenet_resized
     title: imagenet_resized
   - path: /datasets/catalog/imagenet_sketch
@@ -541,6 +553,8 @@ toc:
 - section:
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   - path: /datasets/catalog/stanford_dogs
     title: stanford_dogs
   - path: /datasets/catalog/stl10
@@ -549,6 +563,8 @@ toc:
 - section:
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   - path: /datasets/catalog/imagenet_resized
     title: imagenet_resized
   - path: /datasets/catalog/oxford_iiit_pet
@@ -571,6 +587,8 @@ toc:
     title: clevr
   - path: /datasets/catalog/imagenet2012
     title: imagenet2012 (manual)
+  - path: /datasets/catalog/imagenet2012_subset
+    title: imagenet2012_subset (manual)
   - path: /datasets/catalog/oxford_flowers102
     title: oxford_flowers102
   - path: /datasets/catalog/stanford_dogs

docs/catalog/ai2dcaption.md

Lines changed: 155 additions & 0 deletions (new file)
<div itemscope itemtype="http://schema.org/Dataset">
  <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
    <meta itemprop="name" content="TensorFlow Datasets" />
  </div>
  <meta itemprop="name" content="ai2dcaption" />
  <meta itemprop="description" content="This dataset is primarily based off the AI2D Dataset (see [here](&#10; https://prior.allenai.org/projects/diagram-understanding)).&#10;&#10;See [Section 4.1](https://arxiv.org/pdf/2310.12128) of our paper for&#10; the AI2D-Caption dataset annotation process.&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;ai2dcaption&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;&lt;img src=&quot;https://storage.googleapis.com/tfds-data/visualization/fig/ai2dcaption-1.0.0.png&quot; alt=&quot;Visualization&quot; width=&quot;500px&quot;&gt;&#10;&#10;" />
  <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/ai2dcaption" />
  <meta itemprop="sameAs" content="https://huggingface.co/datasets/abhayzala/AI2D-Caption" />
  <meta itemprop="citation" content="@inproceedings{Zala2024DiagrammerGPT,&#10; author = {Abhay Zala and Han Lin and Jaemin Cho and Mohit Bansal},&#10; title = {DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning},&#10; year = {2024},&#10; booktitle = {COLM},&#10;}" />
</div>

# `ai2dcaption`

Note: This dataset was added recently and is only available in our
`tfds-nightly` package
<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>.

* **Description**:

This dataset is primarily based off the AI2D Dataset (see
[here](https://prior.allenai.org/projects/diagram-understanding)).

See [Section 4.1](https://arxiv.org/pdf/2310.12128) of our paper for the
AI2D-Caption dataset annotation process.

* **Homepage**:
    [https://huggingface.co/datasets/abhayzala/AI2D-Caption](https://huggingface.co/datasets/abhayzala/AI2D-Caption)

* **Source code**:
    [`tfds.datasets.ai2dcaption.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/ai2dcaption/ai2dcaption_dataset_builder.py)

* **Versions**:

    * **`1.0.0`** (default): Initial release.

* **Download size**: `Unknown size`

* **Dataset size**: `2.01 GiB`

* **Auto-cached**
    ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
    No

* **Splits**:

Split                             | Examples
:-------------------------------- | -------:
`'auditor_llm_training_examples'` | 30
`'gpt4v'`                         | 4,903
`'llava_15'`                      | 4,902
`'planner_llm_training_examples'` | 30
`'test'`                          | 75

* **Feature structure**:

```python
FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'entities': Sequence({
        'bounds': BBoxFeature(shape=(4,), dtype=float32),
        'cat': ClassLabel(shape=(), dtype=int64, num_classes=10),
        'from': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
        'label': Text(shape=(), dtype=string),
        'to': Text(shape=(), dtype=string),
        'type': ClassLabel(shape=(), dtype=int64, num_classes=5),
    }),
    'image': Image(shape=(None, None, 3), dtype=uint8, description=The image of the diagram.),
    'image_filename': Text(shape=(), dtype=string),
    'layout': ClassLabel(shape=(), dtype=int64, num_classes=7),
    'relationships': Sequence(Text(shape=(), dtype=string)),
    'topic': ClassLabel(shape=(), dtype=int64, num_classes=4),
})
```
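
A minimal loading sketch for this structure (assuming the `tfds-nightly` package noted above; the split and feature names are taken from the tables on this page):

```python
import tensorflow_datasets as tfds

# 'gpt4v' is one of the splits listed above; the others work the same way.
ds = tfds.load('ai2dcaption', split='gpt4v')

for ex in ds.take(1):
  # 'caption' is a Text feature -> scalar tf.string tensor.
  print(ex['caption'].numpy().decode('utf-8'))
  # 'entities' is a Sequence of dicts, so each field is stacked per example:
  # 'bounds' has shape (num_entities, 4).
  print(ex['entities']['bounds'].shape)
  # 'image' is decoded to a uint8 tensor of shape (height, width, 3).
  print(ex['image'].shape)
```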

* **Feature documentation**:

| Feature         | Class          | Shape           | Dtype   | Description                      |
| :-------------- | :------------- | :-------------- | :------ | :------------------------------- |
|                 | FeaturesDict   |                 |         |                                  |
| caption         | Text           |                 | string  |                                  |
| entities        | Sequence       |                 |         |                                  |
| entities/bounds | BBoxFeature    | (4,)            | float32 |                                  |
| entities/cat    | ClassLabel     |                 | int64   |                                  |
| entities/from   | Text           |                 | string  |                                  |
| entities/id     | Text           |                 | string  |                                  |
| entities/label  | Text           |                 | string  |                                  |
| entities/to     | Text           |                 | string  |                                  |
| entities/type   | ClassLabel     |                 | int64   |                                  |
| image           | Image          | (None, None, 3) | uint8   | The image of the diagram.        |
| image_filename  | Text           |                 | string  | Image filename. e.g. "1337.png"  |
| layout          | ClassLabel     |                 | int64   |                                  |
| relationships   | Sequence(Text) | (None,)         | string  |                                  |
| topic           | ClassLabel     |                 | int64   |                                  |
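
The integer-valued `ClassLabel` columns above (`entities/cat`, `entities/type`, `layout`, `topic`) can be mapped back to their string names through the dataset's `DatasetInfo`; a short sketch (the actual class-name lists are defined by the builder and are not reproduced here):

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('ai2dcaption', split='test', with_info=True)

# ClassLabel features expose their vocabulary and int <-> str helpers.
print(info.features['topic'].names)         # 4 topic names, per num_classes=4
print(info.features['layout'].num_classes)  # 7, per the feature structure

for ex in ds.take(1):
  print(info.features['topic'].int2str(int(ex['topic'])))
```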

* **Supervised keys** (See
    [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
    `None`

* **Figure**
    ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):

<img src="https://storage.googleapis.com/tfds-data/visualization/fig/ai2dcaption-1.0.0.png" alt="Visualization" width="500px">
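
The figure above is the kind of grid produced by `tfds.show_examples`; a sketch for generating a similar visualization locally:

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('ai2dcaption', split='gpt4v', with_info=True)
fig = tfds.show_examples(ds, info)  # matplotlib figure of sample diagrams
```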

* **Examples**
    ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):

<!-- mdformat off(HTML should not be auto-formatted) -->

{% framebox %}

<button id="displaydataframe">Display examples...</button>
<div id="dataframecontent" style="overflow-x:auto"></div>
<script>
const url = "https://storage.googleapis.com/tfds-data/visualization/dataframe/ai2dcaption-1.0.0.html";
const dataButton = document.getElementById('displaydataframe');
dataButton.addEventListener('click', async () => {
  // Disable the button after clicking (dataframe loaded only once).
  dataButton.disabled = true;

  const contentPane = document.getElementById('dataframecontent');
  try {
    const response = await fetch(url);
    // Error response codes don't throw an error, so force an error to show
    // the error message.
    if (!response.ok) throw Error(response.statusText);

    const data = await response.text();
    contentPane.innerHTML = data;
  } catch (e) {
    contentPane.innerHTML =
        'Error loading examples. If the error persist, please open '
        + 'a new issue.';
  }
});
</script>

{% endframebox %}

<!-- mdformat on -->

* **Citation**:

```
@inproceedings{Zala2024DiagrammerGPT,
  author = {Abhay Zala and Han Lin and Jaemin Cho and Mohit Bansal},
  title = {DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning},
  year = {2024},
  booktitle = {COLM},
}
```

docs/catalog/dolma.md

Lines changed: 39 additions & 5 deletions
@@ -33,16 +33,17 @@ Research
 
 * **Download size**: `Unknown size`
 
-* **Dataset size**: `Unknown size`
+* **Dataset size**: `9.61 TiB`
 
 * **Auto-cached**
     ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
-    Unknown
+    No
 
 * **Splits**:
 
-Split | Examples
-:---- | -------:
+Split     | Examples
+:-------- | ------------:
+`'train'` | 3,403,336,408
 
 * **Feature structure**:
 
@@ -77,7 +78,40 @@ text | Text | | string |
 
 * **Examples**
     ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):
-    Missing.
+
+<!-- mdformat off(HTML should not be auto-formatted) -->
+
+{% framebox %}
+
+<button id="displaydataframe">Display examples...</button>
+<div id="dataframecontent" style="overflow-x:auto"></div>
+<script>
+const url = "https://storage.googleapis.com/tfds-data/visualization/dataframe/dolma-1.0.0.html";
+const dataButton = document.getElementById('displaydataframe');
+dataButton.addEventListener('click', async () => {
+  // Disable the button after clicking (dataframe loaded only once).
+  dataButton.disabled = true;
+
+  const contentPane = document.getElementById('dataframecontent');
+  try {
+    const response = await fetch(url);
+    // Error response codes don't throw an error, so force an error to show
+    // the error message.
+    if (!response.ok) throw Error(response.statusText);
+
+    const data = await response.text();
+    contentPane.innerHTML = data;
+  } catch (e) {
+    contentPane.innerHTML =
+        'Error loading examples. If the error persist, please open '
+        + 'a new issue.';
+  }
+});
+</script>
+
+{% endframebox %}
+
+<!-- mdformat on -->
 
 * **Citation**:
 
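
Given the sizes this update records for `dolma` (9.61 TiB, roughly 3.4 billion `'train'` examples), a typical load reads only a slice of the split; a sketch using standard TFDS split slicing (the `data_dir` below is a placeholder and assumes the dataset was already prepared there):

```python
import tensorflow_datasets as tfds

# Read a small absolute slice of the ~3.4B-example train split instead of the
# full 9.61 TiB dataset.
ds = tfds.load(
    'dolma',
    split='train[:1000]',
    data_dir='/path/to/tensorflow_datasets',  # placeholder; point at prepared data
)

for ex in ds.take(2):
  # 'text' is the string field shown in the dolma feature table.
  print(ex['text'].numpy()[:200])
```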