Commit 78575e6

davidberenstein1957, frascuchon, and pre-commit-ci[bot] authored
docs: 5411 docs update migrating to 20 flow 2 (#5430)
# Description

Extended user and workspace merge v2.

Closes #5411

**Type of change**

- Documentation update

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent c03edfb commit 78575e6

3 files changed

+90
-21
lines changed

argilla/docs/how_to_guides/index.md

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ These guides provide step-by-step instructions for common scenarios, including d
9494

9595
---
9696

97-
Learn how to migrate your legacy datasets from Argilla 1.x to 2.x.
97+
Learn how to migrate users, workspaces and datasets from Argilla V1 to V2.
9898

9999
[:octicons-arrow-right-24: How-to guide](migrate_from_legacy_datasets.md)
100100

argilla/docs/how_to_guides/migrate_from_legacy_datasets.md

Lines changed: 88 additions & 19 deletions
@@ -1,21 +1,101 @@
1-
# Migrate your legacy datasets to Argilla V2
1+
# Migrate users, workspaces and datasets to Argilla 2.x
22

3-
This guide will help you migrate task specific datasets to Argilla V2. These do not include the `FeedbackDataset` which is just an interim naming convention for the latest extensible dataset. Task specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of SDK this migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/).
3+
This guide will help you migrate task-specific datasets to Argilla V2. These do not include the `FeedbackDataset`, which is just an interim naming convention for the latest extensible dataset. Task-specific datasets are datasets that are used for a specific task, such as text classification, token classification, etc. If you would like to learn about the backstory of this SDK migration, please refer to the [SDK migration blog post](https://argilla.io/blog/introducing-argilla-new-sdk/). Additionally, we will provide guidance on how to maintain your `User`s and `Workspace`s within the new Argilla V2 format.
44

55
!!! note
6-
Legacy Datasets include: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
6+
Legacy datasets include: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
7+
8+
`FeedbackDataset`s do not need to be migrated, as they are already in the Argilla V2 format. However, since the 2.x version includes changes to the search index structure, you should reindex the datasets by enabling the Docker environment variable `REINDEX_DATASET` (this step is executed automatically if you're running Argilla in an HF Space). See the [server configuration docs](../reference/argilla-server/configuration.md#docker-images-only) for more details.
79

8-
`FeedbackDataset`'s do not need to be migrated as they are already in the Argilla V2 format.
910

1011
To follow this guide, you will need to have the following prerequisites:
1112

1213
- An argilla 1.* server instance running with legacy datasets.
1314
- An argilla >=1.29 server instance running. If you don't have one, you can create one by following this [Argilla guide](../getting_started/quickstart.md).
1415
- The `argilla` sdk package installed in your environment.
1516

17+
!!! warning
18+
This guide will recreate all `User`s and `Workspace`s on a new server. Hence, they will be created with new passwords and IDs. If you want to keep the same passwords and IDs, you can copy the datasets to a temporary V2 instance, then upgrade your current instance to V2.0, and copy the datasets back to your original instance afterwards.
19+
1620
If your current legacy datasets are on a server running an Argilla release after 1.29, you could choose to recreate your legacy datasets as new datasets on the same server. You could then upgrade the server to Argilla 2.0 and carry on working there. Your legacy datasets will not be visible on the new server, but they will remain in the storage layers if you need to access them.
1721

18-
## Steps
22+
To follow the migration steps, you will need to install the new `argilla` package. This includes a new `v1` module that allows you to connect to the Argilla V1 server.
23+
24+
```bash
25+
pip install "argilla>=2.0.0"
26+
```
27+
28+
## Migrate Users and Workspaces
29+
30+
The guide will take you through two steps:
31+
32+
1. **Retrieve the old users and workspaces** from the Argilla V1 server using the new `argilla` package.
33+
2. **Recreate the users and workspaces** on the Argilla V2 server, based on `name` as the unique identifier.
34+
35+
### Step 1: Retrieve the old users and workspaces
36+
37+
You can use the `v1` module to connect to the Argilla V1 server.
38+
39+
```python
40+
import argilla.v1 as rg_v1
41+
42+
# Initialize the API with an Argilla server less than 2.0
43+
api_url = "<your-url>"
44+
api_key = "<your-api-key>"
45+
rg_v1.init(api_url, api_key)
46+
```
47+
48+
Next, load the `User` and `Workspace` objects from the Argilla V1 server:
49+
50+
```python
51+
users_v1 = rg_v1.User.list()
52+
workspaces_v1 = rg_v1.Workspace.list()
53+
```
54+
55+
### Step 2: Recreate the users and workspaces
56+
57+
To recreate the users and workspaces on the Argilla V2 server, you can use the `argilla` package.
58+
59+
First, instantiate the `Argilla` class to connect to the Argilla V2 server:
60+
61+
```python
62+
import argilla as rg
63+
64+
client = rg.Argilla()
65+
```
66+
67+
Next, recreate the users and workspaces on the Argilla V2 server:
68+
69+
```python
70+
for workspace in workspaces_v1:
71+
    rg.Workspace(
72+
        name=workspace.name
73+
    ).create()
74+
```
75+
76+
```python
77+
for user in users_v1:
78+
    user_v2 = rg.User(
79+
        username=user.username,
80+
        first_name=user.first_name,
81+
        last_name=user.last_name,
82+
        role=user.role,
83+
        password="<your_chosen_password>"  # (1)
84+
    ).create()
85+
    if user.role == "owner":
86+
        continue
87+
88+
    for workspace_name in user.workspaces:
89+
        if workspace_name != user.name:
90+
            workspace = client.workspaces(name=workspace_name)
91+
            user_v2.add_to_workspace(workspace)
92+
```
93+
94+
1. You need to choose a new password for the user. To do this programmatically, you can use the `uuid` package to generate a random password. Take care to keep track of the passwords you choose, since you will not be able to retrieve them later.
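
As an illustration of the annotation above, here is a minimal sketch of generating one random password per user with the standard-library `uuid` module and keeping them in a dictionary so they are not lost. The helper name `generate_passwords` and the sample usernames are hypothetical and not part of the Argilla SDK; in the migration, the usernames would come from `users_v1`.

```python
import uuid


def generate_passwords(usernames):
    """Return a {username: password} mapping with one random password per user."""
    # uuid4().hex yields 32 random hexadecimal characters.
    return {username: uuid.uuid4().hex for username in usernames}


# Hypothetical usernames; in the migration they would be read from `users_v1`.
passwords = generate_passwords(["alice", "bob"])
```

Each value could then be passed as the `password` argument when recreating the matching user. Store the mapping somewhere safe, since the passwords cannot be retrieved from the server later.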
95+
96+
Now you have successfully migrated your users and workspaces to Argilla V2 and can continue with the next steps.
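
Before moving on, it may be worth checking that every name from the V1 server now exists on the V2 server. The following is only a sketch on plain lists of names: the helper `find_missing` and the sample names are hypothetical, and collecting the names from each server is left out.

```python
def find_missing(v1_names, v2_names):
    """Return the V1 names that are absent from the V2 server, sorted."""
    return sorted(set(v1_names) - set(v2_names))


# Hypothetical workspace names standing in for values read from each server.
v1_workspace_names = ["annotators", "research", "admin"]
v2_workspace_names = ["research", "admin"]

missing = find_missing(v1_workspace_names, v2_workspace_names)
print(missing)  # ['annotators']
```

The same comparison applies to usernames, since both users and workspaces are matched by name in this guide.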
97+
98+
## Migrate datasets
1999

20100
The guide will take you through three steps:
21101

@@ -25,12 +105,7 @@ The guide will take you through three steps:
25105

26106
### Step 1: Retrieve the legacy dataset
27107

28-
Connect to the Argilla V1 server via the new `argilla` package. First, you should install an extra dependency:
29-
```bash
30-
pip install "argilla[legacy]"
31-
```
32-
33-
Now, you can use the `v1` module to connect to the Argilla V1 server.
108+
You can use the `v1` module to connect to the Argilla V1 server.
34109

35110
```python
36111
import argilla.v1 as rg_v1
@@ -88,9 +163,7 @@ Next, define the new dataset settings:
88163
```
89164

90165
1. The default field in `DatasetForTextClassification` is `text`, but make sure you provide all fields included in `record.inputs`.
91-
92166
2. Make sure you provide all relevant metadata fields available in the dataset.
93-
94167
3. Make sure you provide all relevant vectors available in the dataset.
95168

96169
=== "For multi-label classification"
@@ -113,9 +186,7 @@ Next, define the new dataset settings:
113186
```
114187

115188
1. The default field in `DatasetForTextClassification` is `text`, but we should provide all fields included in `record.inputs`.
116-
117189
2. Make sure you provide all relevant metadata fields available in the dataset.
118-
119190
3. Make sure you provide all relevant vectors available in the dataset.
120191

121192
=== "For token classification"
@@ -138,7 +209,6 @@ Next, define the new dataset settings:
138209
```
139210

140211
1. Make sure you provide all relevant metadata fields available in the dataset.
141-
142212
2. Make sure you provide all relevant vectors available in the dataset.
143213

144214
=== "For text generation"
@@ -161,21 +231,20 @@ Next, define the new dataset settings:
161231
```
162232

163233
1. We should provide all relevant metadata fields available in the dataset.
164-
165234
2. We should provide all relevant vectors available in the dataset.
166235

167236
Finally, create the new dataset on the Argilla V2 server:
168237

169238
```python
170-
dataset = rg.Dataset(name=dataset_name, settings=settings)
239+
dataset = rg.Dataset(name=dataset_name, workspace=workspace, settings=settings)
171240
dataset.create()
172241
```
173242

174243
!!! note
175244
If a dataset with the same name already exists, the `create` method will raise an exception. You can check if the dataset exists and delete it before creating a new one.
176245

177246
```python
178-
dataset = client.datasets(name=dataset_name)
247+
dataset = client.datasets(name=dataset_name, workspace=workspace)
179248

180249
if dataset is not None:
181250
    dataset.delete()

argilla/mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ nav:
171171
- Import and export datasets: how_to_guides/import_export.md
172172
- Advanced:
173173
- Use Markdown to format rich content: how_to_guides/use_markdown_to_format_rich_content.md
174-
- Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
174+
- Migrate users, workspaces and datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md
175175
- Tutorials:
176176
- tutorials/index.md
177177
- Text classification: tutorials/text_classification.ipynb
