docs/_source/getting_started/installation/deployments/huggingface-spaces.md
Lines changed: 43 additions & 59 deletions
@@ -55,83 +55,67 @@ Once Argilla is running, you can use the UI with the Direct URL. This URL gives
### Create your first dataset
-If everything goes well, you are ready to use the Argilla Python client from an IDE such as Colab, Jupyter, or VS Code.
-
-If you want a quick step-by-step example, keep reading. If you want an end-to-end tutorial, go to this [tutorial and use Colab or Jupyter](https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-setfit-fewshot.html).
-
-First, we need to pip install `datasets` and `argilla` on Colab or your local machine:
+To create your first dataset, you need to pip install `argilla` on Colab or your local machine:
```bash
-pip install datasets argilla
```
-
-Then, you can read the example dataset using the `datasets` library. This dataset is a CSV file uploaded to the Hub using the drag-and-drop feature.
Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. In the code above, we've used one of the many integrations with Hugging Face libraries, which let you read hundreds of datasets available on the Hub.
-
-### Data labeling and model training
-
-At this point, you can label your data directly using your Argilla Space and read the training data to train your model of choice.
+Now, create a dataset for text classification. We'll use a task template, check the [docs](../../../practical_guides/create_update_dataset/create_dataset.md) to create a custom dataset. Indicate the workspace where the dataset will be created. You can check them in "My settings" (UI).
```python
-# this will read our current dataset and turn it into a clean dataset for training
You can also get the full dataset and push it to the Hub for reproducibility and versioning:
+To add the records, create a list with the records you want to add. Match the fields with the ones specified before. You can also use pandas or `load_dataset` to read an existing dataset and create records from it.
Finally, this is how you can train a SetFit model using data from your Argilla Space:
+Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. Once annotated, you can also easily push it back to the Hub.
```python
-from sentence_transformers.losses import CosineSimilarityLoss
-
-from setfit import SetFitModel, SetFitTrainer
-
-# Create train test split
-dataset = dataset.train_test_split()
-
-# Load SetFit model from Hub
-model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
As a next step, you can check the [Argilla Tutorials](https://docs.argilla.io/en/latest/tutorials/tutorials.html) section. All the tutorials can be run using Colab or local Jupyter Notebooks, so you can start building datasets with Argilla and Spaces!
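
For readers following the added lines of this diff, the flow they describe (create a text classification dataset from a task template, pick a workspace, then add records whose fields match the template) could look roughly like the sketch below. It assumes the Argilla 1.x `FeedbackDataset` API; the label set, the dataset name, and the helpers `build_record_fields` and `upload_first_dataset` are hypothetical and are not part of this change. The upload helper needs a running Argilla Space and valid credentials, so only the pure-Python part is executed here.

```python
from typing import Dict, Iterable, List


def build_record_fields(texts: Iterable[str]) -> List[Dict[str, str]]:
    """Turn raw strings into field dicts keyed by `text`.

    `text` is the field name assumed for the text classification template;
    empty or whitespace-only strings are skipped.
    """
    return [{"text": t.strip()} for t in texts if t and t.strip()]


def upload_first_dataset(api_url: str, api_key: str, workspace: str, texts: Iterable[str]):
    """Hypothetical helper: create the dataset, push it, then add records."""
    # Imported lazily so the helper above works without Argilla installed.
    import argilla as rg

    rg.init(api_url=api_url, api_key=api_key)

    # Task template for text classification (labels are illustrative).
    dataset = rg.FeedbackDataset.for_text_classification(
        labels=["joy", "sadness"],
        multi_label=False,
    )

    # The workspace must already exist; the diff says to check "My settings" in the UI.
    remote = dataset.push_to_argilla(name="my-first-dataset", workspace=workspace)

    # Record fields must match the fields defined by the template.
    records = [rg.FeedbackRecord(fields=f) for f in build_record_fields(texts)]
    remote.add_records(records)
    return remote


fields = build_record_fields(["I am so happy today", "   ", "I feel sad today"])
print(fields)
```

Keeping the field-building step separate from the upload makes it easy to check that every record carries exactly the fields the template expects before anything is sent to the Space.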
docs/_source/practical_guides/create_update_dataset/create_dataset.md
Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ The following arguments apply to specific question types:
- `field`: A `SpanQuestion` is always attached to a specific field. Here you should pass a string with the name of the field where the labels of the `SpanQuestion` should be used.
- `allow_overlapping`: In a `SpanQuestion`, this value specifies whether overlapped spans are allowed or not. It is set to `False` by default. Set to `True` to allow overlapping spans.
- `visible_labels` (optional): In `LabelQuestion`, `MultiLabelQuestion` and `SpanQuestion` this is the number of labels that will be visible at first sight in the UI. By default, the UI will show 20 labels and collapse the rest. Set your preferred number to change this limit or set `visible_labels=None` to show all options.
+- `labels_order` (optional): In `MultiLabelQuestion`, this determines the order in which labels are displayed in the UI. Set it to `natural` to show labels in the order they were defined, or `suggestion` to prioritize labels associated with suggestions. If scores are available, labels will be ordered by descending score. Defaults to `natural`.
- `use_markdown` (optional): In `TextQuestion` define whether the field should render markdown text. Defaults to `False`. If you set it to `True`, you will be able to use all the Markdown features for text formatting, as well as embed multimedia content and PDFs. To delve further into the details, please refer to this [tutorial](/tutorials_and_integrations/tutorials/feedback/making-most-of-markdown.ipynb).
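
To make the `labels_order` description concrete, here is a small sketch of the ordering it describes. The function `order_labels` is a hypothetical, pure-Python illustration of the documented behaviour (`natural` keeps the defined order, `suggestion` puts suggested labels first, by descending score when scores exist); it is not Argilla's UI code. `build_question` shows how the argument might be passed, assuming an Argilla 1.x release that already includes this change; the question name and labels are made up.

```python
from typing import Dict, List, Optional


def order_labels(
    labels: List[str],
    labels_order: str = "natural",
    suggestions: Optional[Dict[str, Optional[float]]] = None,
) -> List[str]:
    """Illustrative ordering only; `suggestions` maps label -> score (or None)."""
    if labels_order == "natural":
        # Keep the order in which the labels were defined.
        return list(labels)
    if labels_order != "suggestion":
        raise ValueError(f"unknown labels_order: {labels_order!r}")

    suggestions = suggestions or {}
    suggested = [label for label in labels if label in suggestions]
    others = [label for label in labels if label not in suggestions]

    def sort_key(label: str) -> float:
        score = suggestions[label]
        # Higher scores first; suggested labels without a score go last.
        return -score if score is not None else float("inf")

    # sorted() is stable, so ties keep the defined order.
    return sorted(suggested, key=sort_key) + others


def build_question():
    """Hypothetical usage; requires an Argilla version that has `labels_order`."""
    import argilla as rg  # lazy import so the sketch runs without Argilla

    return rg.MultiLabelQuestion(
        name="topics",
        labels=["sports", "politics", "science", "music"],
        labels_order="suggestion",
    )


labels = ["sports", "politics", "science", "music"]
natural = order_labels(labels)
prioritized = order_labels(labels, "suggestion", {"science": 0.2, "politics": 0.9})
print(natural)
print(prioritized)
```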