Commit ed1a77f

dvsrepo and julien-c authored
Adds Argilla Docker Space docs (#585)
* docs: add argilla docker spaces docs
* docs: small review
* docs: overall review
* docs: more fixes
* docs: add training code snippet
* add community support link
* Also link from sidebar
* Add link to tuto from /examples

Co-authored-by: Julien Chaumond <[email protected]>
1 parent 8d740cc commit ed1a77f

3 files changed

+134
-0
lines changed

docs/hub/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -138,6 +138,8 @@
138138
title: Your first Docker Spaces
139139
- local: spaces-sdks-docker-examples
140140
title: Example Docker Spaces
141+
- local: spaces-sdks-docker-argilla
142+
title: Argilla on Spaces
141143
- local: spaces-embed
142144
title: Embed your Space
143145
- local: spaces-config-reference
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
# Argilla Docker Spaces
2+
3+
**Argilla** is an open-source data labelling and curation tool for highly efficient human-in-the-loop and MLOps workflows. Argilla is composed of (1) a webapp for data exploration, labelling, and curation, and (2) a Python library for building data annotation and monitoring workflows. Argilla integrates with the Hugging Face stack (`datasets`, `transformers`, `hub`, and `setfit`), and it can now also be deployed using the Hub's Docker Spaces.
4+
5+
Visit the [Argilla documentation](https://docs.argilla.io) to learn about its features and check out the [Deep Dive Guides](https://docs.argilla.io/en/latest/guides/guides.html) and [Tutorials](https://docs.argilla.io/en/latest/tutorials/tutorials.html).
6+
7+
If you want to try the Argilla UI without installing anything, check out this [live demo](https://huggingface.co/spaces/argilla/live-demo). The demo is powered by Hugging Face Docker Spaces, the same technology you'll learn about in this guide. To log in to the UI, use the username `huggingface` and the password `1234`.
8+
9+
In the next sections, you'll learn to deploy your own Argilla app and use it for data labelling workflows right from the Hub.
10+
11+
## Your first Argilla Docker Space
12+
13+
In this section, you'll learn to deploy an Argilla Docker Space and use it for data annotation and training a sentiment classifier with [SetFit](https://github.com/huggingface/setfit/), a few-shot learning library.
14+
15+
You can find the final app at [this example Space](https://huggingface.co/spaces/dvilasuero/argilla-setfit) and the step-by-step tutorial in this [notebook](https://colab.research.google.com/drive/1GeBBuRw8CIZ6SYql5Vdx4Q2Vv74eFa1I?usp=sharing).
16+
17+
### Duplicate the Argilla Docker Template and create your Space
18+
19+
The easiest way to get started is by [duplicating the Argilla Docker Template](https://huggingface.co/spaces/argilla/template-space-docker?duplicate=true). You need to define the **Owner** (your personal account or an organization you are part of), a **Space name**, and the **Visibility**, which we recommend setting to Public if you want to interact with the Argilla app from outside the Hub. Once you are all set, click "Duplicate Space".
20+
21+
Note: You'll see a mention of the need to set up environment variables (`API_KEY`) by adding a secret to your Space. The next section covers this.
22+
23+
### Setting up secret environment variables
24+
25+
The Space template provides a way to set up two optional settings:
26+
27+
- `API_KEY`: As mentioned earlier, Argilla provides a Python library to interact with the app (read and write data, log model predictions, etc.). If you don't set this variable, the library and your app will use the default API key. If you want to secure your Space for reading and writing data, we recommend setting this variable. The API key can be any string of your choice, and you can use an online generator if you like.
28+
29+
- `PASSWORD`: This setting lets you set a custom password for logging in to the app. The default password is `1234` and the default user is `argilla`. The value of the `PASSWORD` secret must be a hashed password, which you can generate by following [this guide](https://docs.argilla.io/en/latest/getting_started/installation/user_management.html#override-default-password).
30+
31+
To set up these secrets, go to the Settings tab of your newly created Space, and make sure to store the values somewhere safe on your local machine for later use. For testing purposes, you can skip this step entirely or set only one of the two variables. Any variable you leave unset keeps its default value from the [basic Argilla setup](https://docs.argilla.io/en/latest/getting_started/installation/installation.html).
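
As an alternative to an online generator, an API key can be produced locally. The following is a minimal sketch using Python's standard `secrets` module; the helper name `generate_api_key` and the 32-byte length are illustrative choices, not something the template requires.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random, URL-safe string that can serve as an API key."""
    # token_urlsafe draws from the operating system's secure random source
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Copy the printed value into the API_KEY secret of your Space
    print(generate_api_key())
```

Store the generated value somewhere safe, since you will need the same string when connecting from the Python library.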
32+
33+
### Create your first dataset
34+
35+
Once your Argilla Space is running, you can start interacting with it using the Direct URL you'll find in the "Embed this Space" option (top right). Let's say it's https://dvilasuero-argilla-setfit.hf.space. This URL gives you access to a full-screen, stable Argilla app, and also serves as the endpoint for the Argilla Python library. You can also access the app directly using the main URL of the Space, for example: https://huggingface.co/spaces/dvilasuero/argilla-setfit. Let's see how to create our first dataset for labelling.
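
The two example URLs above suggest how the Direct URL relates to the Space's owner and name. The sketch below encodes that pattern; it is an assumption inferred from this single example (the helper `space_direct_url` is hypothetical), so treat the "Embed this Space" option as the authoritative source for your own Space.

```python
import re


def space_direct_url(owner: str, space_name: str) -> str:
    """Guess the Direct URL of a Space from its owner and name."""
    # Assumption: the subdomain is "<owner>-<space name>", lowercased, with
    # any character other than letters, digits, and hyphens replaced by "-"
    subdomain = re.sub(r"[^a-z0-9-]", "-", f"{owner}-{space_name}".lower())
    return f"https://{subdomain}.hf.space"


print(space_direct_url("dvilasuero", "argilla-setfit"))
# prints: https://dvilasuero-argilla-setfit.hf.space
```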
36+
37+
First, install `datasets` and `argilla` with pip on Colab or your local machine:
38+
39+
```bash
40+
pip install datasets argilla
41+
```
42+
43+
Then, you can read the example dataset using the `datasets` library (this dataset is just a CSV file uploaded to the Hub using the drag and drop feature).
44+
45+
```python
46+
from datasets import load_dataset
47+
48+
dataset = load_dataset("dvilasuero/banking_app", split="train").shuffle()
49+
```
50+
51+
Now you can create your first dataset by logging it into Argilla using your endpoint URL and (optionally) `API_KEY`:
52+
53+
```python
54+
import argilla as rg
55+
56+
# connect to your app endpoint
57+
rg.init(api_url="https://dvilasuero-argilla-setfit.hf.space", api_key="YOUR_SECRET_API_KEY")
58+
59+
# transform dataset into Argilla's format and log it
60+
rg.log(rg.read_datasets(dataset, task="TextClassification"), name="bankingapp_sentiment")
61+
```
62+
63+
If everything went well, you now have a dataset available from the Argilla UI to start browsing and labelling. In the code above, we've used one of the many integrations with Hugging Face libraries, which let you [read hundreds of datasets](https://docs.argilla.io/en/latest/guides/features/datasets.html#Importing-a-Dataset) available on the Hub.
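
The connection code above hardcodes the endpoint and the API key. If you share notebooks or scripts, you may prefer to keep the key out of the source. A minimal sketch, assuming you export the values yourself under the variable names `ARGILLA_API_URL` and `ARGILLA_API_KEY` (names chosen here for illustration, as is the helper `argilla_connection_settings`):

```python
import os


def argilla_connection_settings(
    default_url: str = "https://dvilasuero-argilla-setfit.hf.space",
) -> dict:
    """Build keyword arguments for rg.init from environment variables."""
    settings = {"api_url": os.environ.get("ARGILLA_API_URL", default_url)}
    api_key = os.environ.get("ARGILLA_API_KEY")
    if api_key:
        # Only pass api_key when one is set, so the default key applies otherwise
        settings["api_key"] = api_key
    return settings


# Usage (requires the argilla package):
# import argilla as rg
# rg.init(**argilla_connection_settings())
```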
64+
65+
### Data labelling and model training
66+
67+
At this point, you can label your data directly in your Argilla Space and read the labelled data back to train your model of choice. This [Colab notebook](https://colab.research.google.com/drive/1GeBBuRw8CIZ6SYql5Vdx4Q2Vv74eFa1I?usp=sharing) contains the full step-by-step tutorial, but let's see how to retrieve data from our interactive annotation session, and the code needed to train a SetFit model.
68+
69+
```python
70+
# this will read our current dataset and turn it into a clean dataset for training
71+
dataset = rg.load("bankingapp_sentiment").prepare_for_training()
72+
```
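
To give an intuition for what "a clean dataset for training" means here, the following plain-Python sketch mimics the idea on a few made-up records: keep only records that have an annotation and encode label names as integer ids. This is a conceptual illustration under that assumption, not Argilla's implementation; the function `to_training_rows` and the sample records are hypothetical.

```python
def to_training_rows(records):
    """Keep annotated records and encode their labels as integer ids."""
    annotated = [r for r in records if r.get("annotation") is not None]
    # Sort label names so the id assignment is deterministic
    label_names = sorted({r["annotation"] for r in annotated})
    label_to_id = {name: i for i, name in enumerate(label_names)}
    rows = [
        {"text": r["text"], "label": label_to_id[r["annotation"]]}
        for r in annotated
    ]
    return rows, label_names


records = [
    {"text": "The app keeps crashing", "annotation": "negative"},
    {"text": "Transfers are instant now", "annotation": "positive"},
    {"text": "How do I reset my PIN?", "annotation": None},  # not labelled yet
]
rows, label_names = to_training_rows(records)
print(label_names)  # ['negative', 'positive']
print(len(rows))    # 2
```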
73+
74+
You can also get the full dataset and push it to the Hub for reproducibility and versioning:
75+
76+
```python
77+
# save full argilla dataset for reproducibility
78+
rg.load("bankingapp_sentiment").to_datasets().push_to_hub("bankingapp_sentiment")
79+
```
80+
81+
Finally, this is how you can train a SetFit model using data from your Argilla Space:
82+
83+
```python
84+
from sentence_transformers.losses import CosineSimilarityLoss
85+
86+
from setfit import SetFitModel, SetFitTrainer
87+
88+
# Create train test split
89+
dataset = dataset.train_test_split()
90+
91+
# Load SetFit model from Hub
92+
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
93+
94+
# Create trainer
95+
trainer = SetFitTrainer(
96+
model=model,
97+
train_dataset=dataset["train"],
98+
eval_dataset=dataset["test"],
99+
loss_class=CosineSimilarityLoss,
100+
batch_size=8,
101+
num_iterations=20,
102+
)
103+
104+
# Train and evaluate
105+
trainer.train()
106+
metrics = trainer.evaluate()
107+
```
108+
109+
## Advanced settings
110+
111+
This section provides details about what's done under the hood for setting up the Template Docker Space. You can use this guide to set up your own Docker Space.
112+
113+
### Dockerfile
114+
115+
The [`Dockerfile`](https://huggingface.co/spaces/argilla/template-space-docker/blob/main/Dockerfile) is based on the Elasticsearch image, since Elasticsearch is the database used by Argilla apps. The build process performs the following high-level steps:
116+
117+
1. Install Python, pip, and the dependencies defined in the `requirements.txt` file.
118+
2. Read secrets from the Space settings and make them available. You can find out more about how to use secrets in Docker Spaces in [this guide](https://huggingface.co/docs/hub/spaces-sdks-docker#secret-management).
119+
3. Launch Elasticsearch, set up the `API_KEY` and `PASSWORD` if available, and launch the Argilla service. This is done in the `start.sh` script described below.
120+
121+
### start.sh
122+
123+
This script launches Elasticsearch, uses the `waitforit.sh` utility to make sure Elasticsearch is up and running before the Argilla service starts, sets up the environment variables `API_KEY` and `PASSWORD` if available, and runs `python -m argilla`, which serves the webapp and the API endpoint for reading and writing data.
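
The "wait until the database is reachable" step can be sketched in Python as follows. This is an analogue of what a wait-for-it style utility does, not the contents of `waitforit.sh`; the function name `wait_for_port`, the timeout values, and the port 9200 in the usage comment (Elasticsearch's default HTTP port) are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the service is accepting clients
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: pause briefly, then retry until the deadline
            time.sleep(interval)
    return False


# Usage: block until Elasticsearch accepts connections before starting the app
# if wait_for_port("localhost", 9200):
#     ...launch the Argilla service...
```

Polling with a deadline avoids both starting the service too early and hanging forever if the database never comes up.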
124+
125+
### Demo Space: Setting up workspaces and users
126+
127+
If you are looking for a more advanced configuration of Argilla, which involves setting up several users and workspaces, you can check out the [Argilla Demo Docker Space codebase](https://huggingface.co/spaces/argilla/live-demo/tree/main).
128+
129+
## Feedback and support
130+
131+
If you have suggestions for improvement or need specific support, please join the [Argilla Slack community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) or reach out on [Argilla's GitHub repository](https://github.com/argilla-io/argilla).

docs/hub/spaces-sdks-docker-examples.md

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ We gathered some example demos in the [Docker Templates](https://huggingface.co/
88
* HTTP endpoint in Go with query parameters https://huggingface.co/spaces/XciD/test-docker-go?q=Adrien
99
* Shiny app written in Python https://huggingface.co/spaces/elonmuskceo/shiny-orbit-simulation
1010
* Genie.jl app in Julia https://huggingface.co/spaces/nooji/GenieOnHuggingFaceSpaces
11+
* Argilla app for data labelling and curation: https://huggingface.co/spaces/argilla/live-demo and [write-up about hosting Argilla on Spaces](./spaces-sdks-docker-argilla) by [@dvilasuero](https://huggingface.co/dvilasuero) 🎉
