Commit e8d1d22

nataliaElv, leiyre, jfcalvo, frascuchon, and pre-commit-ci[bot] authored
Import from hub docs (#5631)
# Description
<!-- Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change. -->

Closes #<issue_number>

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the PR according to the type of change -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Refactor (change restructuring the codebase without changing functionality)
- Improvement (change adding some improvement to an existing functionality)
- Documentation update

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested. -->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: Leire Aguirre <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: José Francisco Calvo <[email protected]>
Co-authored-by: Paco Aranda <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Damián Pumar <[email protected]>
Co-authored-by: Francisco Aranda <[email protected]>
Co-authored-by: burtenshaw <[email protected]>
1 parent 2247283 commit e8d1d22

2 files changed: +52 -53 lines changed

argilla/docs/getting_started/quickstart.md

Lines changed: 52 additions & 53 deletions
@@ -64,7 +64,7 @@ Argilla is a free, open-source, self-hosted tool. This means you need to deploy

 If you want to **run Argilla locally on your machine or a server**, or tune the server configuration, choose this option. To use this option, [check this guide](how-to-deploy-argilla-with-docker.md).

-## Sign in into the Argilla UI
+## Sign in to the Argilla UI

 If everything went well, you should see the Argilla sign in page that looks like this:

@@ -82,21 +82,47 @@ In the sign in page:
 !!! info "Unauthorized error"
 Sometimes, after authorizing you'll see an unauthorized error, and get redirected to the sign in page. Typically, clicking the Sign in button again will solve this issue.

-Congrats! Your Argilla server is ready to start your first project using the Python SDK. You now have full rights to create datasets. Follow the instructions in the home page, or keep reading this guide if you want a more detailed explanation.
+Congrats! Your Argilla server is ready to start your first project.

-## Install the Python SDK
+## Create your first dataset

-To manage workspaces and datasets in Argilla, you need to use the Argilla Python SDK. You can install it with pip as follows:
+The quickest way to start exploring the tool and create your first dataset is by importing an existing one from the Hugging Face Hub.

-```console
-pip install argilla
-```
+To do this, log in to the Argilla UI and, on the Home page, click on "Import from Hub". You can choose one of the sample datasets or paste a repo id in the input. A repo id looks something like `stanfordnlp/imdb`.

-## Create your first dataset
+Argilla will automatically interpret the columns in the dataset to map them to Fields and Questions.
+
+**Fields** include the data that you want feedback on, like text, chats, or images. If you want to exclude any of the Fields that Argilla identified for you, simply select the "No mapping" option.
+
+**Questions** are the feedback you want to collect, like labels, ratings, rankings, or text. If Argilla identified questions in your dataset that you don't want, you can eliminate them. You can also add questions of your own.
+
+![Screenshot of the dataset configuration page](../assets/images/getting_started/dataset_configurator.png)
+
+Note that you will be able to modify some elements of the dataset configuration, e.g., the titles of fields and questions, from the Dataset Settings page after the dataset has been created. Check all the settings you can modify in the [Update a dataset](../how_to_guides/dataset.md#update-a-dataset) section.
+
+When you're happy with the result, you'll need to give a name to your dataset, select a workspace and choose a split, if applicable. Then, Argilla will start importing the dataset in the background. Now you're all set up to start annotating!
+
+!!! info "Importing long datasets"
+Argilla will only import the first 10k rows of a dataset. If your dataset is larger, you can import the rest of the records at any point using the Python SDK.
+
+To do that, open your dataset and copy the code snippet provided under "Import data". Now, open a Jupyter or Google Colab notebook and install argilla:
+
+```python
+!pip install argilla
+```
+Then, paste and run your code snippet. This will import the remaining records to your dataset.
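
The snippet shown under "Import data" in the UI is the one to run. Purely to illustrate what "the rest of the records" means, here is a minimal sketch that builds a split expression skipping the rows already imported. It assumes the 10k-row cap stated above and the Hugging Face `datasets` split-slicing syntax (for example `train[10000:]`); `remaining_split` and `IMPORT_LIMIT` are hypothetical names, not part of Argilla or `datasets`.

```python
# Hypothetical helper for illustration; not part of Argilla or `datasets`.
IMPORT_LIMIT = 10_000  # rows imported through the UI, per the note above


def remaining_split(split: str, already_imported: int = IMPORT_LIMIT) -> str:
    """Return a split expression that skips the rows already imported."""
    if already_imported < 0:
        raise ValueError("already_imported must be non-negative")
    return f"{split}[{already_imported}:]"


print(remaining_split("train"))  # train[10000:]
```

An expression like this could be passed as the `split` argument when loading the source dataset with `datasets.load_dataset`, so that only the rows beyond the cap are fetched before they are logged to Argilla.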
+
+## Install and connect the Python SDK

-For getting started with Argilla and its SDK, we recommend to use Jupyter Notebook or Google Colab.
+For getting started with Argilla and its SDK, we recommend using Jupyter Notebook or Google Colab. You will need the SDK to manage users, workspaces and datasets in Argilla.

-To start interacting with your Argilla server, you need to create a instantiate a client with an API key and API URL:
+In your notebook, you can install the Argilla SDK with pip as follows:
+
+```python
+!pip install argilla
+```
+
+To start interacting with your Argilla server, you need to instantiate a client with an API key and API URL:

 - The `<api_key>` is in the `My Settings` page of your Argilla Space but make sure you are logged in with the `owner` account you used to create the Space.

@@ -112,65 +138,38 @@ client = rg.Argilla(
 ```

 !!! info "You can't find your API URL"
-If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, there are two options:
-1. Click on the three points menu at the top of the Space, select "Embed this Space", and open the direct URL.
-2. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
+If you're using Spaces, sometimes the Argilla UI is embedded into the Hub UI so the URL of the browser won't match the API URL. In these scenarios, you have several options:
+
+1. In the Home page of Argilla, click on "Import from the SDK". You will find your API URL and key in the code snippet provided.
+2. Click on the three-dot menu at the top of the Space, select "Embed this Space", and open the direct URL.
+3. Use this pattern: `https://[your-owner-name]-[your_space_name].hf.space`.
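
The URL pattern in the last option can be sketched as a small helper. This is an illustration only: `space_api_url` is a hypothetical name, the owner and Space names are placeholders, and the function fills in the documented pattern literally without any normalization the Hub might apply to names.

```python
def space_api_url(owner: str, space_name: str) -> str:
    """Fill in the pattern https://[your-owner-name]-[your_space_name].hf.space."""
    if not owner or not space_name:
        raise ValueError("owner and space_name are required")
    return f"https://{owner}-{space_name}.hf.space"


# Placeholder values for illustration only.
print(space_api_url("my-org", "my-argilla"))  # https://my-org-my-argilla.hf.space
```

If the resulting URL does not resolve, the other two options are the safer route, since they read the URL directly from the Space or from the snippet in the Argilla UI.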

-To create a dataset with a simple text classification task, first, you need to **define the dataset settings**.
+To check that everything is running correctly, you can call `me`. This should return your user information:

 ```python
-settings = rg.Settings(
-    guidelines="Classify the reviews as positive or negative.",
-    fields=[
-        rg.TextField(
-            name="review",
-            title="Text from the review",
-            use_markdown=False,
-        ),
-    ],
-    questions=[
-        rg.LabelQuestion(
-            name="my_label",
-            title="In which category does this article fit?",
-            labels=["positive", "negative"],
-        )
-    ],
-)
+client.me
 ```

-Now you can **create the dataset with these settings**. Publish the dataset to make it available in the UI and add the records.
+From here, you can manage all of your assets in Argilla, including updating the dataset we created earlier and adding advanced information, such as vectors, metadata or suggestions. To learn how to do this, check our [how-to guides](../how_to_guides/index.md).

-!!! info "About workspaces"
-Workspaces in Argilla group datasets and user access rights. The `workspace` parameter is optional in this case. If you don't specify it, the dataset will be created in the default workspace `argilla`.
+## Export your dataset to the Hub

-By default, **this workspace will be visible to users joining with the Sign in with Hugging Face button**. You can create other workspaces and decide to grant access to users either with the SDK or the [changing the OAuth configuration](how-to-configure-argilla-on-huggingface.md).
+Once you've spent some time annotating your dataset in Argilla, you can upload it back to the Hugging Face Hub to share with others or version control it.

-```python
-dataset = rg.Dataset(
-    name=f"my_first_dataset",
-    settings=settings,
-    client=client,
-    #workspace="argilla"
-)
-dataset.create()
-```
-
-Now you can **add records to your dataset**. We will use the IMDB dataset from the Hugging Face Datasets library as an example. The `mapping` parameter indicates which keys/columns in the source dataset correspond to the Argilla dataset fields.
+To do that, first follow the steps in the previous section to connect to your Argilla server using the SDK. Then, you can load your dataset and export it to the Hub like this:

 ```python
-from datasets import load_dataset
-
-data = load_dataset("imdb", split="train[:100]").to_list()
+dataset = client.datasets(name="my_dataset")

-dataset.records.log(records=data, mapping={"text": "review"})
+dataset.to_hub(repo_id="<my_org>/<my_dataset>")
 ```

-🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.
+For more info on exporting datasets to the Hub, read our guide on [exporting datasets](../how_to_guides/import_export.md#export-to-hub).

 ## Next steps

-- To learn how to create your datasets, workspace, and manage users, check the [how-to guides](../how_to_guides/index.md).
+- To learn how to create your own datasets, workspaces, and manage users, check the [how-to guides](../how_to_guides/index.md).

 - To learn Argilla with hands-on examples, check the [Tutorials section](../tutorials/index.md).

-- To further configure your Argilla Space, check the [Hugging Face Spaces settings guide](how-to-configure-argilla-on-huggingface.md).
+- To further configure your Argilla Space, check the [Hugging Face Spaces settings guide](how-to-configure-argilla-on-huggingface.md).
