# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->
Closes #<issue_number>
**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update
**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->
**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->
- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
---------
Co-authored-by: Leire Aguirre <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Damián Pumar <[email protected]>
Co-authored-by: Francisco Aranda <[email protected]>
Co-authored-by: burtenshaw <[email protected]>
File: argilla/docs/getting_started/quickstart.md (52 additions, 53 deletions)
Original file line number
Diff line number
Diff line change
@@ -64,7 +64,7 @@ Argilla is a free, open-source, self-hosted tool. This means you need to deploy
64
64
65
65
If you want to **run Argilla locally on your machine or a server**, or tune the server configuration, choose this option. To use this option, [check this guide](how-to-deploy-argilla-with-docker.md).
66
66
67
-
## Sign in into the Argilla UI
67
+
## Sign in to the Argilla UI
68
68
69
69
If everything went well, you should see the Argilla sign in page that looks like this:
70
70
@@ -82,21 +82,47 @@ In the sign in page:
82
82
!!! info "Unauthorized error"
83
83
Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button again will solve this issue.
84
84
85
-
Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.
85
+
Congrats! Your Argilla server is ready to start your first project.
86
86
87
-
## Install the Python SDK
87
+
## Create your first dataset
88
88
89
-
To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:
89
+
The quickest way to start exploring the tool and create your first dataset is by importing an existing one from the Hugging Face Hub.
90
90
91
-
```console
92
-
pip install argilla
93
-
```
91
+
To do this, log in to the Argilla UI and, on the Home page, click "Import from Hub". You can choose one of the sample datasets or paste a repo id in the input. A repo id looks something like `stanfordnlp/imdb`.
94
92
95
-
## Create your first dataset
93
+
Argilla will automatically interpret the columns in the dataset to map them to Fields and Questions.
94
+
95
+
**Fields** include the data that you want feedback on, like text, chats, or images. If you want to exclude any of the Fields that Argilla identified for you, simply select the "No mapping" option.
96
+
97
+
**Questions** are the feedback you want to collect, like labels, ratings, rankings, or text. If Argilla identified questions in your dataset that you don't want, you can eliminate them. You can also add questions of your own.
98
+
99
+

100
+
101
+
Note that you can still modify some elements of the dataset configuration after the dataset has been created, e.g., the titles of fields and questions. You do this from the Dataset Settings page. Check all the settings you can modify in the [Update a dataset](../how_to_guides/dataset.md#update-a-dataset) section.
102
+
103
+
When you're happy with the result, you'll need to give a name to your dataset, select a workspace and choose a split, if applicable. Then, Argilla will start importing the dataset in the background. Now you're all set up to start annotating!
104
+
105
+
!!! info "Importing large datasets"
106
+
Argilla will only import the first 10k rows of a dataset. If your dataset is larger, you can import the rest of the records at any point using the Python SDK.
107
+
108
+
To do that, open your dataset and copy the code snippet provided under "Import data". Now, open a Jupyter or Google Colab notebook and install argilla:
109
+
110
+
```python
111
+
!pip install argilla
112
+
```
113
+
Then, paste and run your code snippet. This will import the remaining records to your dataset.
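
The snippet copied from the UI is the authoritative version. Purely as an illustration of what importing "the rest of the records" can look like, here is a minimal sketch. It assumes the first 10,000 rows were already imported in order, that the `datasets` library is installed, and that `client` is a connected `rg.Argilla` instance. The helper names `remaining_split` and `import_remaining_rows` are hypothetical and not part of Argilla.

```python
def remaining_split(split: str, already_imported: int = 10_000) -> str:
    """Build a Hugging Face `datasets` split slice that skips rows already imported."""
    if already_imported < 0:
        raise ValueError("already_imported must be non-negative")
    return f"{split}[{already_imported}:]"


def import_remaining_rows(client, repo_id, dataset_name, split="train", mapping=None):
    """Hypothetical helper: log the rows after the first 10k to an existing dataset.

    Assumes `client` is an `rg.Argilla` instance and that the source columns
    match the dataset fields, or that `mapping` translates them.
    """
    # Imported lazily so the pure helper above works without `datasets` installed.
    from datasets import load_dataset

    rows = load_dataset(repo_id, split=remaining_split(split)).to_list()
    dataset = client.datasets(name=dataset_name)
    dataset.records.log(records=rows, mapping=mapping)
    return len(rows)
```

If the UI snippet differs from this sketch, prefer the UI snippet, since it is generated for your specific dataset.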
114
+
115
+
## Install and connect the Python SDK
96
116
97
-
For getting started with Argilla and its SDK, we recommend to use Jupyter Notebook or Google Colab.
117
+
To get started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab. You will need the SDK to manage users, workspaces and datasets in Argilla.
98
118
99
-
To start interacting with your Argilla server, you need to create a instantiate a client with an API key and API URL:
119
+
In your notebook, you can install the Argilla SDK with pip as follows:
120
+
121
+
```python
122
+
!pip install argilla
123
+
```
124
+
125
+
To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:
100
126
101
127
- The `<api_key>` is in the `My Settings` page of your Argilla Space but make sure you are logged in with the `owner` account you used to create the Space.
102
128
@@ -112,65 +138,38 @@ client = rg.Argilla(
112
138
```
113
139
114
140
!!! info "You can't find your API URL"
115
-
If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options:
116
-
1. Click on the three points menu at the top of the Space, select "Embed this Space", and open the direct URL.
117
-
2. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
141
+
If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, you have several options:
142
+
143
+
1. In the Home page of Argilla, click on "Import from the SDK". You will find your API URL and key in the code snippet provided.
144
+
2. Click on the three-dot menu at the top of the Space, select "Embed this Space", and open the direct URL.
145
+
3. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
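
The URL pattern in option 3 can be sketched as a small helper. The function name `space_api_url` is hypothetical and only fills in the pattern quoted above. The Hub may normalize some characters in the real hostname, so treat the result as a starting point and confirm it against the direct URL from option 2.

```python
def space_api_url(owner: str, space_name: str) -> str:
    """Fill in the pattern https://[your-owner-name]-[your_space_name].hf.space."""
    owner = owner.strip()
    space_name = space_name.strip()
    if not owner or not space_name:
        raise ValueError("both owner and space_name are required")
    return f"https://{owner}-{space_name}.hf.space"


# Example with made-up names: owner "my-org", Space "argilla".
print(space_api_url("my-org", "argilla"))
```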
118
146
119
-
To create a dataset with a simple text classification task, first, you need to **define the dataset settings**.
147
+
To check that everything is running correctly, you can call `me`. This should return your user information:
120
148
121
149
```python
122
-
settings = rg.Settings(
123
-
guidelines="Classify the reviews as positive or negative.",
124
-
fields=[
125
-
rg.TextField(
126
-
name="review",
127
-
title="Text from the review",
128
-
use_markdown=False,
129
-
),
130
-
],
131
-
questions=[
132
-
rg.LabelQuestion(
133
-
name="my_label",
134
-
title="In which category does this article fit?",
135
-
labels=["positive", "negative"],
136
-
)
137
-
],
138
-
)
150
+
client.me
139
151
```
140
152
141
-
Now you can **create the dataset with these settings**. Publish the dataset to make it available in the UI and add the records.
153
+
From here, you can manage all of your assets in Argilla, including updating the dataset we created earlier and adding advanced information, such as vectors, metadata or suggestions. To learn how to do this, check our [how-to guides](../how_to_guides/index.md).
142
154
143
-
!!! info "About workspaces"
144
-
Workspaces in Argilla group datasets and user access rights. The `workspace` parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace `argilla`.
155
+
## Export your dataset to the Hub
145
156
146
-
By default, **this workspace will be visible to users joining with the Sign in with Hugging Face button**. You can create other workspaces and decide to grant access to users either with the SDK or the [changing the OAuth configuration](how-to-configure-argilla-on-huggingface.md).
157
+
Once you've spent some time annotating your dataset in Argilla, you can upload it back to the Hugging Face Hub to share with others or version control it.
147
158
148
-
```python
149
-
dataset = rg.Dataset(
150
-
name=f"my_first_dataset",
151
-
settings=settings,
152
-
client=client,
153
-
#workspace="argilla"
154
-
)
155
-
dataset.create()
156
-
```
157
-
158
-
Now you can **add records to your dataset**. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The `mapping` parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.
159
+
To do that, first follow the steps in the previous section to connect to your Argilla server using the SDK. Then, you can load your dataset and export it to the Hub like this:
159
160
160
161
```python
161
-
from datasets import load_dataset
162
-
163
-
data = load_dataset("imdb", split="train[:100]").to_list()