Commit ae9d86c

burtenshaw, pre-commit-ci[bot], sdiazlor authored and committed
[DOCS] Add documentation for ImageField (#5448)
# Description

<!-- Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change. -->

Closes #<issue_number>

**Type of change**

<!-- Please delete options that are not relevant. Remember to title the PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Refactor (change restructuring the codebase without changing functionality)
- Improvement (change adding some improvement to an existing functionality)
- Documentation update

**How Has This Been Tested**

<!-- Please add some reference about how your feature has been tested. -->

**Checklist**

<!-- Please go over the list and make sure you've taken everything into account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: sdiazlor <[email protected]>
1 parent 1485053 commit ae9d86c

12 files changed, +315 -380 lines changed
argilla/docs/how_to_guides/dataset.md

Lines changed: 24 additions & 11 deletions
@@ -142,23 +142,36 @@ new_dataset.create()

 ### Fields

-The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla only supports plain text and markdown through the `TextField`, though we plan to introduce additional field types in future updates.
+The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla supports plain text and markdown through the `TextField` and images through the `ImageField`, though we plan to introduce additional field types in future updates.

 !!! note
     The order of the fields in the UI follows the order in which these are added to the fields attribute in the Python SDK.

 > Check the [Field - Python Reference](../reference/argilla/settings/fields.md) to see the field classes in detail.

-```python
-rg.TextField(
-    name="text",
-    title="Text",
-    use_markdown=False,
-    required=True,
-    description="Field description",
-)
-```
-![TextField](../assets/images/how_to_guides/dataset/fields.png)
+=== "Text"
+
+    ```python
+    rg.TextField(
+        name="text",
+        title="Text",
+        use_markdown=False,
+        required=True,
+        description="Field description",
+    )
+    ```
+    ![TextField](../assets/images/how_to_guides/dataset/fields.png)
+
+=== "Image"
+
+    ```python
+    rg.ImageField(
+        name="image",
+        title="Image",
+        required=True,
+        description="Field description",
+    )
+    ```

 ### Questions

argilla/docs/how_to_guides/image_field.md

Lines changed: 0 additions & 56 deletions
This file was deleted.

argilla/docs/how_to_guides/import_export.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -189,6 +189,9 @@ The records alone can be exported from a dataset in Argilla. This is useful if
189189

190190
The records can be exported as a dictionary, a list of dictionaries, or a `Dataset` of the `datasets` package.
191191

192+
!!! note "With images"
193+
If your dataset includes images, the recommended approach for exporting records is to use the `to_datasets` method, which exports the images as rescaled PIL objects. With other methods, the images will be exported using the data URI schema.
194+
192195
=== "To a python dictionary"
193196

194197
Records can be exported from `Dataset.records` as a dictionary. The `to_dict` method can be used to export records as a dictionary. You can specify the orientation of the dictionary output. You can also decide if to flatten or not the dictionary.
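
The note in this hunk says that export methods other than `to_datasets` return images using the data URI schema. As a rough illustration of what that representation looks like, the sketch below round-trips a few bytes through a base64 data URI using only the standard library. The helper names `to_data_uri` and `from_data_uri` are hypothetical and are not part of the Argilla SDK, and the exact string Argilla produces is an assumption here.

```python
import base64
from typing import Tuple


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into its MIME type and raw bytes (hypothetical helper)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Stand-in bytes (the PNG file signature); a real export would carry a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
mime_type, raw = from_data_uri(uri)
```

Decoding by hand like this is only needed when the records were exported as dictionaries; the note recommends `to_datasets` when PIL objects are wanted directly.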

argilla/docs/reference/argilla/datasets/dataset_records.md

Lines changed: 54 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -131,7 +131,7 @@ Records can also be updated using the `log` method with records that contain an
131131
metadata={"department": "toys"},
132132
id="2" # (1)
133133
),
134-
] # (1)
134+
]
135135

136136
dataset.records.log(records)
137137
```
@@ -149,7 +149,7 @@ Records can also be updated using the `log` method with records that contain an
149149
"metadata": {"department": "toys"},
150150
"id": "2" # (1)
151151
},
152-
] # (1)
152+
]
153153

154154
dataset.records.log(data)
155155
```
@@ -175,6 +175,7 @@ Records can also be updated using the `log` method with records that contain an
175175
)
176176

177177
```
178+
178179
1. The `id` field is required to identify the record to be updated. The `id` field must be unique for each record in the dataset. If the `id` field is not provided, the record will be added as a new record.
179180
2. Let's say that your data structure has keys `my_id` instead of `id`. You can use the `mapping` parameter to map the keys in the data structure to the fields in the dataset.
180181

@@ -193,6 +194,57 @@ Records can also be updated using the `log` method with records that contain an
193194
1. In this example, the Hugging Face dataset matches the Argilla dataset schema.
194195
2. The `uuid` key in the Hugging Face dataset corresponds to the `id` field in the Argilla dataset.
195196

197+
### Adding and updating records with images
198+
199+
Argilla datasets can contain image fields. You can add images to a dataset by passing the image to the record object as either a remote URL, a local path to an image file, or a PIL object. The field names must be defined as an `rg.ImageField` in the dataset's `Settings` object to be accepted. Images will be stored in the Argilla database and returned using the data URI schema.
200+
201+
!!! note "As PIL objects"
202+
To retrieve the images as rescaled PIL objects, you can use the `to_datasets` method when exporting the records, as shown in this [how-to guide](../../../how_to_guides/import_export.md).
203+
204+
=== "From a data structure with local file paths"
205+
206+
```python
207+
208+
import os
209+
210+
image_dir = "path/to/images"
211+
212+
data = [
213+
{
214+
"image": os.path.join(image_dir, "image1.jpg"), # (1)
215+
},
216+
{
217+
"image": os.path.join(image_dir, "image2.jpg"),
218+
},
219+
]
220+
221+
dataset.records.log(data)
222+
```
223+
224+
1. The image can be referenced as either a remote URL, a local file path, or a PIL object.
225+
226+
=== "From a Hugging Face dataset"
227+
228+
Hugging Face datasets can be passed directly to the `log` method. The image field must be defined as an `Image` in the dataset's features.
229+
230+
```python
231+
hf_dataset = load_dataset("ylecun/mnist", split="train[:100]")
232+
dataset.records.log(records=hf_dataset)
233+
```
234+
235+
If the image field is not defined as an `Image` in the dataset's features, you can cast the dataset to the correct schema before adding it to the Argilla dataset. This is only necessary if the image field is not defined as an `Image` in the dataset's features, and is not one of the supported image types by Argilla (URL, local path, or PIL object).
236+
237+
```python
238+
hf_dataset = load_dataset("<my_custom_dataset>") # (1)
239+
hf_dataset = hf_dataset.cast(
240+
features=Features({"image": Image(), "label": Value("string")}),
241+
)
242+
dataset.records.log(records=hf_dataset)
243+
```
244+
245+
1. In this example, the Hugging Face dataset matches the Argilla dataset schema but the image field is not defined as an `Image` in the dataset's features.
246+
247+
196248
### Iterating over records in a dataset
197249

198250
`Dataset.records` can be used to iterate over records in a dataset from the server. The records will be fetched in batches from the server::
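
The local-file-path example added in this hunk lists each image by hand. For a larger folder, the same list of dictionaries could be built programmatically. The following is a minimal sketch using only the standard library; `build_image_records` and `IMAGE_SUFFIXES` are hypothetical names, and the field name `image` is assumed to match an `rg.ImageField` defined in the dataset's `Settings`.

```python
from pathlib import Path
from typing import Dict, List

# Assumed set of extensions; adjust to the formats present in your own data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_image_records(image_dir: str, field_name: str = "image") -> List[Dict[str, str]]:
    """Return one dict per image file in image_dir, keyed by the image field name."""
    paths = sorted(
        p
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Each dict mirrors the shape used in the hunk above: {"image": "<local path>"}.
    return [{field_name: str(p)} for p in paths]
```

The resulting list has the same shape as `data` in the diff, so it could presumably be passed on with `dataset.records.log(build_image_records("path/to/images"))`.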

argilla/docs/reference/argilla/records/records.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ dataset.records.log(
2323

2424
1. The Argilla dataset contains a field named `text` matching the key here.
2525

26-
To create records with image fields, pass the the remote url or data uri of the image to records object. The field names must be defined as an `rg.ImageField`in the dataset's `Settings` object to be accepted.
26+
To create records with image fields, pass the image to the record object as either a remote url, local path to an image file, or a PIL object. The field names must be defined as an `rg.ImageField`in the dataset's `Settings` object to be accepted. Images will be stored in the Argilla database and returned as rescaled PIL objects.
2727

2828
```python
2929
dataset.records.log(
@@ -35,8 +35,10 @@ dataset.records.log(
3535
)
3636
```
3737

38-
1. The image can be referenced as either a remote url or a data uri.
38+
1. The image can be referenced as either a remote url, a local file path, or a PIL object.
3939

40+
!!! note
41+
The image will be stored in the Argilla database and can impact the dataset's storage usage. Images should be less than 5mb in size and datasets should contain less than 10,000 images.
4042

4143
### Accessing Record Attributes
4244
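
The note added in this hunk recommends images under 5mb and fewer than 10,000 images per dataset. A pre-flight check along these lines could be run on local files before logging them. This is only a sketch: `split_by_size` and `within_dataset_limit` are hypothetical helpers, "5mb" is read as 5 MiB (an assumption), and nothing here reflects how Argilla itself enforces or measures these limits.

```python
import os
from typing import Iterable, List, Tuple

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5mb" interpreted as 5 MiB
MAX_IMAGES = 10_000  # the note recommends fewer than 10,000 images per dataset


def split_by_size(
    paths: Iterable[str], limit: int = MAX_IMAGE_BYTES
) -> Tuple[List[str], List[str]]:
    """Partition file paths into (under the limit, at or over the limit)."""
    accepted: List[str] = []
    rejected: List[str] = []
    for path in paths:
        if os.path.getsize(path) < limit:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected


def within_dataset_limit(image_count: int, limit: int = MAX_IMAGES) -> bool:
    """Return True when the image count stays below the recommended ceiling."""
    return image_count < limit
```

Rejected files could then be resized or hosted remotely and referenced by URL, since the diff lists remote URLs as an accepted image input.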

argilla/docs/reference/argilla/settings/fields.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,3 +42,4 @@ data = rg.Dataset(
4242

4343

4444
::: src.argilla.settings._field.TextField
45+
::: src.argilla.settings._field.ImageField
