Commit ca49158

Merge branch 'main' into develop

2 parents: 41c08c0 + 730b0d2

File tree

11 files changed: +276 −71 lines

docs/_source/_common/tabs/question_settings.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -24,7 +24,8 @@ rg.MultiLabelQuestion(
     description="Select all that apply",
     labels={"hate": "Hate Speech", "sexual": "Sexual content", "violent": "Violent content", "pii": "Personal information", "untruthful": "Untruthful info", "not_english": "Not English", "inappropriate": "Inappropriate content"}, # or ["hate", "sexual", "violent", "pii", "untruthful", "not_english", "inappropriate"]
     required=False,
-    visible_labels=4
+    visible_labels=4,
+    labels_order="natural"
 )
 ```
 
````
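The `visible_labels=4` argument in the snippet above controls how many labels the UI shows before collapsing the rest. As a pure-Python sketch of that collapse behavior (the helper name and return shape are hypothetical, not Argilla API):

```python
def visible_subset(labels, visible_labels=4):
    """Return the labels shown up front and the count collapsed behind 'show more'."""
    shown = list(labels)[:visible_labels]
    hidden = max(0, len(labels) - visible_labels)
    return shown, hidden

# With the seven labels from the snippet above and visible_labels=4,
# three labels stay collapsed until the annotator expands the list.
labels = ["hate", "sexual", "violent", "pii", "untruthful", "not_english", "inappropriate"]
shown, hidden = visible_subset(labels, visible_labels=4)
```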

(binary file changed, 100 KB; preview not captured)

docs/_source/getting_started/installation/configurations/server_configuration.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -36,11 +36,13 @@ NGINX and Traefik have been tested and are known to work with Argilla:
 Since the Argilla Server is built on FastAPI, you can launch it using `uvicorn`:
 
 ```bash
-uvicorn argilla:app
+uvicorn argilla_server:app --port 6900
 ```
 
 :::{note}
 For more details about FastAPI and uvicorn, see [here](https://fastapi.tiangolo.com/deployment/manually/#run-a-server-manually-uvicorn).
+
+You can also visit the uvicorn official documentation [here](https://www.uvicorn.org/#usage).
 :::
 
 ## Environment variables
````
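For a local deployment, the corrected command can be combined with uvicorn's standard serving flags. A sketch (the flags are standard uvicorn options, not introduced by this commit; adjust host, port, and worker count for your setup):

```bash
# Launch the Argilla server module with uvicorn, binding to all interfaces
# on the default Argilla port with two worker processes
uvicorn argilla_server:app --host 0.0.0.0 --port 6900 --workers 2
```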

docs/_source/getting_started/installation/configurations/user_management.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -182,7 +182,7 @@ import argilla as rg
 
 rg.init(api_url="<ARGILLA_API_URL>", api_key="<ARGILLA_API_KEY>")
 
-user = rg.User.from_id("new-user")
+user = rg.User.from_id("00000000-0000-0000-0000-000000000000")
 ```
 
 ### Assign a `User` to a `Workspace`
````
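The fix above matters because `User.from_id` takes a UUID, not a username string like `"new-user"`. A stdlib-only sketch of the distinction (the validator function is hypothetical, not part of Argilla):

```python
from uuid import UUID

def looks_like_user_id(value: str) -> bool:
    """True if the string parses as a UUID, i.e. the shape from_id expects."""
    try:
        UUID(value)
        return True
    except ValueError:
        return False

# A username such as "new-user" is not a valid ID; look users up by name instead.
assert looks_like_user_id("00000000-0000-0000-0000-000000000000")
assert not looks_like_user_id("new-user")
```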

docs/_source/getting_started/installation/configurations/workspace_management.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -163,8 +163,8 @@ workspace = rg.Workspace.from_name("new-workspace")
 users = workspace.users
 for user in users:
     ...
-workspace.add_user("<USER_ID>")
-workspace.delete_user("<USER_ID>")
+workspace.add_user(user.id)
+workspace.delete_user(user.id)
 ```
 :::
 
````
docs/_source/getting_started/installation/deployments/huggingface-spaces.md

Lines changed: 43 additions & 59 deletions

````diff
@@ -55,83 +55,67 @@ Once Argilla is running, you can use the UI with the Direct URL. This URL gives
 
 ### Create your first dataset
 
-If everything goes well, you are ready to use the Argilla Python client from an IDE such as Colab, Jupyter, or VS Code.
-
-If you want a quick step-by-step example, keep reading. If you want an end-to-end tutorial, go to this [tutorial and use Colab or Jupyter](https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-setfit-fewshot.html).
-
-First, we need to pip install `datasets` and `argilla` on Colab or your local machine:
+To create your first dataset, you need to pip install `argilla` on Colab or your local machine:
 
 ```bash
-pip install datasets argilla
-```
-
-Then, you can read the example dataset using the `datasets` library. This dataset is a CSV file uploaded to the Hub using the drag-and-drop feature.
-
-```python
-from datasets import load_dataset
-
-dataset = load_dataset("dvilasuero/banking_app", split="train").shuffle()
+pip install argilla
 ```
 
-You can create your first dataset by logging it into Argilla using your endpoint URL:
+Then, you have to connect to your Argilla HF Space. Get the `api_url` as mentioned before and copy the `api_key` from "My settings" (UI):
 
 ```python
 import argilla as rg
 
-# if you connect to your public app endpoint (uses default API key)
-rg.init(api_url="[your_space_url]", api_key="admin.apikey")
-
-# if you connect to your private app endpoint (uses default API key)
-rg.init(api_url="[your_space_url]", api_key="admin.apikey", extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"})
-
-# transform dataset into Argilla's format and log it
-rg.log(rg.read_datasets(dataset, task="TextClassification"), name="bankingapp_sentiment")
+# If you connect to your public HF Space
+rg.init(
+    api_url="[your_space_url]",
+    api_key="admin.apikey" # this is the default API key, don't change it if you didn't set up one during the Space creation
+)
+
+# If you connect to your private HF Space
+rg.init(
+    api_url="[your_space_url]",
+    api_key="admin.apikey", # this is the default API key, don't change it if you didn't set up one during the Space creation
+    extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
+)
 ```
-
-Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. In the code above, we've used one of the many integrations with Hugging Face libraries, which let you read hundreds of datasets available on the Hub.
-
-### Data labeling and model training
-
-At this point, you can label your data directly using your Argilla Space and read the training data to train your model of choice.
+Now, create a dataset for text classification. We'll use a task template, check the [docs](../../../practical_guides/create_update_dataset/create_dataset.md) to create a custom dataset. Indicate the workspace where the dataset will be created. You can check them in "My settings" (UI).
 
 ```python
-# this will read our current dataset and turn it into a clean dataset for training
-dataset = rg.load("bankingapp_sentiment").prepare_for_training()
+dataset = rg.FeedbackDataset.for_text_classification(
+    labels=["sadness", "joy"],
+    multi_label=False,
+    use_markdown=True,
+    guidelines=None,
+    metadata_properties=None,
+    vectors_settings=None,
+)
+# Create the dataset to be visualized in the UI (uses default workspace)
+dataset.push_to_argilla(name="my-first-dataset", workspace="admin")
 ```
-
-You can also get the full dataset and push it to the Hub for reproducibility and versioning:
+To add the records, create a list with the records you want to add. Match the fields with the ones specified before. You can also use pandas or `load_dataset` to read an existing dataset and create records from it.
 
 ```python
-# save full argilla dataset for reproducibility
-rg.load("bankingapp_sentiment").to_datasets().push_to_hub("bankingapp_sentiment")
+records = [
+    rg.FeedbackRecord(
+        fields={
+            "text": "I am so happy today",
+        },
+    ),
+    rg.FeedbackRecord(
+        fields={
+            "text": "I feel sad today",
+        },
+    )
+]
+dataset.add_records(records)
 ```
 
-Finally, this is how you can train a SetFit model using data from your Argilla Space:
+Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. Once annotated, you can also easily push it back to the Hub.
 
 ```python
-from sentence_transformers.losses import CosineSimilarityLoss
-
-from setfit import SetFitModel, SetFitTrainer
-
-# Create train test split
-dataset = dataset.train_test_split()
-
-# Load SetFit model from Hub
-model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
-
-# Create trainer
-trainer = SetFitTrainer(
-    model=model,
-    train_dataset=dataset["train"],
-    eval_dataset=dataset["test"],
-    loss_class=CosineSimilarityLoss,
-    batch_size=8,
-    num_iterations=20,
-)
-
-# Train and evaluate
-trainer.train()
-metrics = trainer.evaluate()
+dataset = rg.FeedbackDataset.from_argilla("my-first-dataset", workspace="admin")
+dataset.push_to_huggingface("my-repo/my-first-dataset")
 ```
 
 As a next step, you can check the [Argilla Tutorials](https://docs.argilla.io/en/latest/tutorials/tutorials.html) section. All the tutorials can be run using Colab or local Jupyter Notebooks, so you can start building datasets with Argilla and Spaces!
````
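The new walkthrough notes that records can also be built from an existing dataset via pandas or `load_dataset`. A sketch of that pattern with plain dicts standing in for the Argilla record class (the helper is hypothetical; with a live client you would wrap each payload in `rg.FeedbackRecord(fields=...)`):

```python
# Rows as they might come from DataFrame.to_dict("records") or a HF dataset
rows = [
    {"text": "I am so happy today", "source": "demo"},
    {"text": "I feel sad today", "source": "demo"},
]

def to_record_payloads(rows, field_names=("text",)):
    """Keep only the fields the dataset defines, mirroring FeedbackRecord(fields=...)."""
    return [{"fields": {name: row[name] for name in field_names}} for row in rows]

payloads = to_record_payloads(rows)
# Each payload maps 1:1 onto rg.FeedbackRecord(fields={"text": ...})
```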

docs/_source/index.rst

Lines changed: 1 addition & 2 deletions

```diff
@@ -66,5 +66,4 @@
    Github <https://github.com/argilla-io/argilla>
    community/developer_docs
    community/contributing
-   community/migration-rubrix.md
-
+   community/migration-rubrix.md
```

docs/_source/practical_guides/create_update_dataset/create_dataset.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -95,6 +95,7 @@ The following arguments apply to specific question types:
 - `field`: A `SpanQuestion` is always attached to a specific field. Here you should pass a string with the name of the field where the labels of the `SpanQuestion` should be used.
 - `allow_overlapping`: In a `SpanQuestion`, this value specifies whether overlapped spans are allowed or not. It is set to `False` by default. Set to `True` to allow overlapping spans.
 - `visible_labels` (optional): In `LabelQuestion`, `MultiLabelQuestion` and `SpanQuestion` this is the number of labels that will be visible at first sight in the UI. By default, the UI will show 20 labels and collapse the rest. Set your preferred number to change this limit or set `visible_labels=None` to show all options.
+- `labels_order` (optional): In `MultiLabelQuestion`, this determines the order in which labels are displayed in the UI. Set it to `natural` to show labels in the order they were defined, or `suggestion` to prioritize labels associated with suggestions. If scores are available, labels will be ordered by descending score. Defaults to `natural`.
 - `use_markdown` (optional): In `TextQuestion` define whether the field should render markdown text. Defaults to `False`. If you set it to `True`, you will be able to use all the Markdown features for text formatting, as well as embed multimedia content and PDFs. To delve further into the details, please refer to this [tutorial](/tutorials_and_integrations/tutorials/feedback/making-most-of-markdown.ipynb).
 
 ```{note}
````
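The two `labels_order` modes described in the added line can be sketched in plain Python (the function and its tie-breaking rules are assumptions based on the description above, not Argilla's implementation):

```python
def order_labels(labels, labels_order="natural", suggestion_scores=None):
    """Order labels as the UI would: definition order, or suggested-first.

    "natural" keeps the order labels were defined in; "suggestion" puts labels
    that carry suggestions first, sorted by descending score, then the rest.
    """
    if labels_order == "natural" or not suggestion_scores:
        return list(labels)
    suggested = sorted(
        (label for label in labels if label in suggestion_scores),
        key=lambda label: -suggestion_scores[label],
    )
    remaining = [label for label in labels if label not in suggestion_scores]
    return suggested + remaining

# "suggestion" mode with scores: highest-scored suggested labels come first
order = order_labels(["hate", "sexual", "pii"], "suggestion", {"pii": 0.9, "hate": 0.4})
```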

docs/_source/tutorials_and_integrations/integrations/integrations.md

Lines changed: 6 additions & 0 deletions

`````diff
@@ -30,6 +30,11 @@ Add text descriptives to your metadata to simplify the data annotation and filtering process.
 
 Add semantic representations to your records using vector embeddings to simplify the data annotation and search process.
 ```
+```{grid-item-card} llama-index: Build LLM applications with LlamaIndex.
+:link: llama_index.html
+
+Build LLM applications with LlamaIndex and automatically log and monitor the predictions with Argilla.
+```
 ````
 
 ```{toctree}
@@ -40,4 +45,5 @@ process_documents_with_unstructured
 monitor_endpoints with_fastapi
 add_text_descriptives_as_metadata
 add_sentence_transformers_embeddings_as_vectors
+llama_index
 ```
`````

0 commit comments