Commit 3014206

committed
Merge branch 'releases/2.0.0-rc1' of github.com:argilla-io/argilla into releases/2.0.0-rc1
2 parents 5d85cf6 + bfa611e commit 3014206

File tree

1 file changed: +19 -19 lines changed

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

Lines changed: 19 additions & 19 deletions
@@ -1,6 +1,6 @@
 # Migrate your legacy datasets to Argilla V2
 
-This guide will help you migrate task-specific datasets to Argilla V2. These do not include the `FeedbackDataset`, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/).
+This guide will help you migrate task-specific datasets to Argilla V2. These do not include the `FeedbackDataset`, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/).
 
 !!! note
     Legacy Datasets include: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
@@ -13,11 +13,11 @@ To follow this guide, you will need to have the following prerequisites:
 - An argilla >=1.29 server instance running. If you don't have one, you can create one by following the [Argilla installation guide](../../getting_started/installation.md).
 - The `argilla` sdk package installed in your environment.
 
-If your current legacy datasets are on a server running an Argilla release after 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in storage layers if you need to access them.
+If your current legacy datasets are on a server running an Argilla release after 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in storage layers if you need to access them.
 
 ## Steps
 
-The guide will take you through three steps:
+The guide will take you through three steps:
 
 1. **Retrieve the legacy dataset** from the Argilla V1 server using the new `argilla` package.
 2. **Define the new dataset** in the Argilla V2 format.
@@ -101,7 +101,7 @@ dataset.create()
 
 ```python
 dataset = client.datasets(name=dataset_name)
-
+
 if dataset.exists():
     dataset.delete()
 ```
@@ -119,16 +119,16 @@ Here are a set of example functions to convert the records for single-label and
         """ This function maps a text classification record dictionary to the new Argilla record."""
         suggestions = []
         responses = []
-
+
         if prediction := data.get("prediction"):
             label, score = prediction[0].values()
             agent = data["prediction_agent"]
             suggestions.append(rg.Suggestion(question_name="label", value=label, score=score, agent=agent))
-
+
         if annotation := data.get("annotation"):
             user_id = users_by_name.get(data["annotation_agent"], current_user).id
             responses.append(rg.Response(question_name="label", value=annotation, user_id=user_id))
-
+
         vectors = (data.get("vectors") or {})
         return rg.Record(
             id=data["id"],
@@ -149,16 +149,16 @@ Here are a set of example functions to convert the records for single-label and
         """ This function maps a text classification record dictionary to the new Argilla record."""
         suggestions = []
         responses = []
-
+
         if prediction := data.get("prediction"):
             labels, scores = zip(*[(pred["label"], pred["score"]) for pred in prediction])
             agent = data["prediction_agent"]
             suggestions.append(rg.Suggestion(question_name="labels", value=labels, score=scores, agent=agent))
-
+
         if annotation := data.get("annotation"):
             user_id = users_by_name.get(data["annotation_agent"], current_user).id
             responses.append(rg.Response(question_name="label", value=annotation, user_id=user_id))
-
+
         vectors = data.get("vectors") or {}
         return rg.Record(
             id=data["id"],
@@ -171,24 +171,24 @@ Here are a set of example functions to convert the records for single-label and
             responses=responses,
         )
     ```
-
+
 === "For token classification"
 
     ```python
     def map_to_record_for_span(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:
         """ This function maps a token classification record dictionary to the new Argilla record."""
         suggestions = []
         responses = []
-
+
         if prediction := data.get("prediction"):
             scores = [span["score"] for span in prediction]
             agent = data["prediction_agent"]
             suggestions.append(rg.Suggestion(question_name="spans", value=prediction, score=scores, agent=agent))
-
+
         if annotation := data.get("annotation"):
             user_id = users_by_name.get(data["annotation_agent"], current_user).id
             responses.append(rg.Response(question_name="spans", value=annotation, user_id=user_id))
-
+
         vectors = data.get("vectors") or {}
         return rg.Record(
             id=data["id"],
@@ -202,27 +202,27 @@ Here are a set of example functions to convert the records for single-label and
             responses=responses,
         )
     ```
-
+
 === "For Text generation"
 
     ```python
     def map_to_record_for_text_generation(data: dict, users_by_name: dict, current_user: rg.User) -> rg.Record:
         """ This function maps a text2text record dictionary to the new Argilla record."""
         suggestions = []
         responses = []
-
+
         if prediction := data.get("prediction"):
             first = prediction[0]
             agent = data["prediction_agent"]
             suggestions.append(
                 rg.Suggestion(question_name="text_generation", value=first["text"], score=first["score"], agent=agent)
             )
-
+
         if annotation := data.get("annotation"):
             # From data[annotation]
             user_id = users_by_name.get(data["annotation_agent"], current_user).id
             responses.append(rg.Response(question_name="text_generation", value=annotation, user_id=user_id))
-
+
         vectors = (data.get("vectors") or {})
         return rg.Record(
             id=data["id"],
@@ -240,7 +240,7 @@ Here are a set of example functions to convert the records for single-label and
 The functions above depend on the `users_by_name` dictionary and the `current_user` object to assign responses to users, so the existing users need to be loaded first. You can retrieve the users and the current user from the Argilla V2 server as follows:
 
 ```python
-# For
+# For
 users_by_name = {user.username: user for user in client.users}
 current_user = client.me
 ```
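
To make the user lookup and the prediction unpacking in the mapping functions concrete, here is a minimal, self-contained sketch that runs without an Argilla server. `FakeUser`, `resolve_user_id`, and `split_predictions` are hypothetical names introduced for illustration only; `FakeUser` stands in for `rg.User`, and the sample records are invented and mirror only the `annotation_agent` and `prediction` keys used in the diff, not the full legacy schema.

```python
from dataclasses import dataclass


@dataclass
class FakeUser:
    """Hypothetical stand-in for rg.User with only the attributes used here."""

    id: str
    username: str


def resolve_user_id(data: dict, users_by_name: dict, current_user: FakeUser) -> str:
    """Return the annotating user's id, falling back to the current user
    when the legacy `annotation_agent` is not a known username."""
    return users_by_name.get(data["annotation_agent"], current_user).id


def split_predictions(prediction: list) -> tuple:
    """Split a list of {"label", "score"} dicts into two parallel tuples,
    the same way the multi-label mapping function does with zip(*...)."""
    labels, scores = zip(*[(pred["label"], pred["score"]) for pred in prediction])
    return labels, scores


# Invented sample data, for illustration only.
users = [FakeUser(id="u-1", username="alice"), FakeUser(id="u-2", username="bob")]
users_by_name = {user.username: user for user in users}
current_user = users[0]

known = {"annotation_agent": "bob"}
unknown = {"annotation_agent": "someone-not-on-the-server"}

known_id = resolve_user_id(known, users_by_name, current_user)
fallback_id = resolve_user_id(unknown, users_by_name, current_user)
print(known_id, fallback_id)

labels, scores = split_predictions(
    [{"label": "positive", "score": 0.9}, {"label": "news", "score": 0.4}]
)
print(labels, scores)
```

One consequence of this fallback is that annotations made by users who do not exist on the target server end up attributed to the current user. If attribution matters, it may be worth creating the missing users before running the migration.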
