Commit ca49158

Merge branch 'main' into develop

2 parents: 41c08c0 + 730b0d2

File tree

11 files changed: +276 −71 lines

docs/_source/_common/tabs/question_settings.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -24,7 +24,8 @@ rg.MultiLabelQuestion(
     description="Select all that apply",
     labels={"hate": "Hate Speech", "sexual": "Sexual content", "violent": "Violent content", "pii": "Personal information", "untruthful": "Untruthful info", "not_english": "Not English", "inappropriate": "Inappropriate content"}, # or ["hate", "sexual", "violent", "pii", "untruthful", "not_english", "inappropriate"]
     required=False,
-    visible_labels=4
+    visible_labels=4,
+    labels_order="natural"
 )
 ```
 
````
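The `visible_labels=4` argument in the snippet above controls how many labels the UI shows before collapsing the rest. As a pure-Python sketch of that collapse behavior (the helper name and return shape are hypothetical, not Argilla API):

```python
def visible_subset(labels, visible_labels=4):
    """Return the labels shown up front and the count collapsed behind 'show more'."""
    shown = list(labels)[:visible_labels]
    hidden = max(0, len(labels) - visible_labels)
    return shown, hidden

# With the seven labels from the snippet above and visible_labels=4,
# three labels stay collapsed until the annotator expands the list.
labels = ["hate", "sexual", "violent", "pii", "untruthful", "not_english", "inappropriate"]
shown, hidden = visible_subset(labels, visible_labels=4)
```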

(binary file changed, 100 KB; preview not captured)

docs/_source/getting_started/installation/configurations/server_configuration.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -36,11 +36,13 @@ NGINX and Traefik have been tested and are known to work with Argilla:
 Since the Argilla Server is built on FastAPI, you can launch it using `uvicorn`:
 
 ```bash
-uvicorn argilla:app
+uvicorn argilla_server:app --port 6900
 ```
 
 :::{note}
 For more details about FastAPI and uvicorn, see [here](https://fastapi.tiangolo.com/deployment/manually/#run-a-server-manually-uvicorn).
+
+You can also visit the uvicorn official documentation [here](https://www.uvicorn.org/#usage).
 :::
 
 ## Environment variables
````
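For a local deployment, the corrected command can be combined with uvicorn's standard serving flags. A sketch (the flags are standard uvicorn options, not introduced by this commit; adjust host, port, and worker count for your setup):

```bash
# Launch the Argilla server module with uvicorn, binding to all interfaces
# on the default Argilla port with two worker processes
uvicorn argilla_server:app --host 0.0.0.0 --port 6900 --workers 2
```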

docs/_source/getting_started/installation/configurations/user_management.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -182,7 +182,7 @@ import argilla as rg
 
 rg.init(api_url="<ARGILLA_API_URL>", api_key="<ARGILLA_API_KEY>")
 
-user = rg.User.from_id("new-user")
+user = rg.User.from_id("00000000-0000-0000-0000-000000000000")
 ```
 
 ### Assign a `User` to a `Workspace`
````
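The fix above matters because `User.from_id` takes a UUID, not a username string like `"new-user"`. A stdlib-only sketch of the distinction (the validator function is hypothetical, not part of Argilla):

```python
from uuid import UUID

def looks_like_user_id(value: str) -> bool:
    """True if the string parses as a UUID, i.e. the shape from_id expects."""
    try:
        UUID(value)
        return True
    except ValueError:
        return False

# A username such as "new-user" is not a valid ID; look users up by name instead.
assert looks_like_user_id("00000000-0000-0000-0000-000000000000")
assert not looks_like_user_id("new-user")
```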

docs/_source/getting_started/installation/configurations/workspace_management.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -163,8 +163,8 @@ workspace = rg.Workspace.from_name("new-workspace")
 users = workspace.users
 for user in users:
     ...
-workspace.add_user("<USER_ID>")
-workspace.delete_user("<USER_ID>")
+workspace.add_user(user.id)
+workspace.delete_user(user.id)
 ```
 :::
 
````
docs/_source/getting_started/installation/deployments/huggingface-spaces.md

Lines changed: 43 additions & 59 deletions

````diff
@@ -55,83 +55,67 @@ Once Argilla is running, you can use the UI with the Direct URL. This URL gives
 
 ### Create your first dataset
 
-If everything goes well, you are ready to use the Argilla Python client from an IDE such as Colab, Jupyter, or VS Code.
-
-If you want a quick step-by-step example, keep reading. If you want an end-to-end tutorial, go to this [tutorial and use Colab or Jupyter](https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-setfit-fewshot.html).
-
-First, we need to pip install `datasets` and `argilla` on Colab or your local machine:
+To create your first dataset, you need to pip install `argilla` on Colab or your local machine:
 
 ```bash
-pip install datasets argilla
-```
-
-Then, you can read the example dataset using the `datasets` library. This dataset is a CSV file uploaded to the Hub using the drag-and-drop feature.
-
-```python
-from datasets import load_dataset
-
-dataset = load_dataset("dvilasuero/banking_app", split="train").shuffle()
+pip install argilla
 ```
 
-You can create your first dataset by logging it into Argilla using your endpoint URL:
+Then, you have to connect to your Argilla HF Space. Get the `api_url` as mentioned before and copy the `api_key` from "My settings" (UI):
 
 ```python
 import argilla as rg
 
-# if you connect to your public app endpoint (uses default API key)
-rg.init(api_url="[your_space_url]", api_key="admin.apikey")
-
-# if you connect to your private app endpoint (uses default API key)
-rg.init(api_url="[your_space_url]", api_key="admin.apikey", extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"})
-
-# transform dataset into Argilla's format and log it
-rg.log(rg.read_datasets(dataset, task="TextClassification"), name="bankingapp_sentiment")
+# If you connect to your public HF Space
+rg.init(
+    api_url="[your_space_url]",
+    api_key="admin.apikey" # this is the default API key, don't change it if you didn't set up one during the Space creation
+)
+
+# If you connect to your private HF Space
+rg.init(
+    api_url="[your_space_url]",
+    api_key="admin.apikey", # this is the default API key, don't change it if you didn't set up one during the Space creation
+    extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
+)
 ```
-
-Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. In the code above, we've used one of the many integrations with Hugging Face libraries, which let you read hundreds of datasets available on the Hub.
-
-### Data labeling and model training
-
-At this point, you can label your data directly using your Argilla Space and read the training data to train your model of choice.
+Now, create a dataset for text classification. We'll use a task template, check the [docs](../../../practical_guides/create_update_dataset/create_dataset.md) to create a custom dataset. Indicate the workspace where the dataset will be created. You can check them in "My settings" (UI).
 
 ```python
-# this will read our current dataset and turn it into a clean dataset for training
-dataset = rg.load("bankingapp_sentiment").prepare_for_training()
+dataset = rg.FeedbackDataset.for_text_classification(
+    labels=["sadness", "joy"],
+    multi_label=False,
+    use_markdown=True,
+    guidelines=None,
+    metadata_properties=None,
+    vectors_settings=None,
+)
+# Create the dataset to be visualized in the UI (uses default workspace)
+dataset.push_to_argilla(name="my-first-dataset", workspace="admin")
 ```
-
-You can also get the full dataset and push it to the Hub for reproducibility and versioning:
+To add the records, create a list with the records you want to add. Match the fields with the ones specified before. You can also use pandas or `load_dataset` to read an existing dataset and create records from it.
 
 ```python
-# save full argilla dataset for reproducibility
-rg.load("bankingapp_sentiment").to_datasets().push_to_hub("bankingapp_sentiment")
+records = [
+    rg.FeedbackRecord(
+        fields={
+            "text": "I am so happy today",
+        },
+    ),
+    rg.FeedbackRecord(
+        fields={
+            "text": "I feel sad today",
+        },
+    )
+]
+dataset.add_records(records)
 ```
 
-Finally, this is how you can train a SetFit model using data from your Argilla Space:
+Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. Once annotated, you can also easily push it back to the Hub.
 
 ```python
-from sentence_transformers.losses import CosineSimilarityLoss
-
-from setfit import SetFitModel, SetFitTrainer
-
-# Create train test split
-dataset = dataset.train_test_split()
-
-# Load SetFit model from Hub
-model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
-
-# Create trainer
-trainer = SetFitTrainer(
-    model=model,
-    train_dataset=dataset["train"],
-    eval_dataset=dataset["test"],
-    loss_class=CosineSimilarityLoss,
-    batch_size=8,
-    num_iterations=20,
-)
-
-# Train and evaluate
-trainer.train()
-metrics = trainer.evaluate()
+dataset = rg.FeedbackDataset.from_argilla("my-first-dataset", workspace="admin")
+dataset.push_to_huggingface("my-repo/my-first-dataset")
 ```
 
 As a next step, you can check the [Argilla Tutorials](https://docs.argilla.io/en/latest/tutorials/tutorials.html) section. All the tutorials can be run using Colab or local Jupyter Notebooks, so you can start building datasets with Argilla and Spaces!
````
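The new walkthrough notes that records can also be built from an existing dataset via pandas or `load_dataset`. A sketch of that pattern with plain dicts standing in for the Argilla record class (the helper is hypothetical; with a live client you would wrap each payload in `rg.FeedbackRecord(fields=...)`):

```python
# Rows as they might come from DataFrame.to_dict("records") or a HF dataset
rows = [
    {"text": "I am so happy today", "source": "demo"},
    {"text": "I feel sad today", "source": "demo"},
]

def to_record_payloads(rows, field_names=("text",)):
    """Keep only the fields the dataset defines, mirroring FeedbackRecord(fields=...)."""
    return [{"fields": {name: row[name] for name in field_names}} for row in rows]

payloads = to_record_payloads(rows)
# Each payload maps 1:1 onto rg.FeedbackRecord(fields={"text": ...})
```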

docs/_source/index.rst

Lines changed: 1 addition & 2 deletions

```diff
@@ -66,5 +66,4 @@
    Github <https://github.com/argilla-io/argilla>
    community/developer_docs
    community/contributing
-   community/migration-rubrix.md
-
+   community/migration-rubrix.md
```

docs/_source/practical_guides/create_update_dataset/create_dataset.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -95,6 +95,7 @@ The following arguments apply to specific question types:
 - `field`: A `SpanQuestion` is always attached to a specific field. Here you should pass a string with the name of the field where the labels of the `SpanQuestion` should be used.
 - `allow_overlapping`: In a `SpanQuestion`, this value specifies whether overlapped spans are allowed or not. It is set to `False` by default. Set to `True` to allow overlapping spans.
 - `visible_labels` (optional): In `LabelQuestion`, `MultiLabelQuestion` and `SpanQuestion` this is the number of labels that will be visible at first sight in the UI. By default, the UI will show 20 labels and collapse the rest. Set your preferred number to change this limit or set `visible_labels=None` to show all options.
+- `labels_order` (optional): In `MultiLabelQuestion`, this determines the order in which labels are displayed in the UI. Set it to `natural` to show labels in the order they were defined, or `suggestion` to prioritize labels associated with suggestions. If scores are available, labels will be ordered by descending score. Defaults to `natural`.
 - `use_markdown` (optional): In `TextQuestion` define whether the field should render markdown text. Defaults to `False`. If you set it to `True`, you will be able to use all the Markdown features for text formatting, as well as embed multimedia content and PDFs. To delve further into the details, please refer to this [tutorial](/tutorials_and_integrations/tutorials/feedback/making-most-of-markdown.ipynb).
 
 ```{note}
````
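The two `labels_order` modes described in the added line can be sketched in plain Python (the function and its tie-breaking rules are assumptions based on the description above, not Argilla's implementation):

```python
def order_labels(labels, labels_order="natural", suggestion_scores=None):
    """Order labels as the UI would: definition order, or suggested-first.

    "natural" keeps the order labels were defined in; "suggestion" puts labels
    that carry suggestions first, sorted by descending score, then the rest.
    """
    if labels_order == "natural" or not suggestion_scores:
        return list(labels)
    suggested = sorted(
        (label for label in labels if label in suggestion_scores),
        key=lambda label: -suggestion_scores[label],
    )
    remaining = [label for label in labels if label not in suggestion_scores]
    return suggested + remaining

# "suggestion" mode with scores: highest-scored suggested labels come first
order = order_labels(["hate", "sexual", "pii"], "suggestion", {"pii": 0.9, "hate": 0.4})
```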

docs/_source/tutorials_and_integrations/integrations/integrations.md

Lines changed: 6 additions & 0 deletions

`````diff
@@ -30,6 +30,11 @@ Add text descriptives to your metadata to simplify the data annotation and filtering process.
 
 Add semantic representations to your records using vector embeddings to simplify the data annotation and search process.
 ```
+```{grid-item-card} llama-index: Build LLM applications with LlamaIndex.
+:link: llama_index.html
+
+Build LLM applications with LlamaIndex and automatically log and monitor the predictions with Argilla.
+```
 ````
 
 ```{toctree}
@@ -40,4 +45,5 @@ process_documents_with_unstructured
 monitor_endpoints with_fastapi
 add_text_descriptives_as_metadata
 add_sentence_transformers_embeddings_as_vectors
+llama_index
 ```
`````

0 commit comments