# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->
Closes #<issue_number>
**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)
- Improvement (change adding some improvement to an existing
functionality)
- Documentation update
**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->
**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->
- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
---------
Co-authored-by: Sara Han <[email protected]>
**docs/_source/conceptual_guides/data_model.md** (2 additions & 2 deletions)

@@ -133,7 +133,7 @@ record = rg.TextClassificationRecord(
 ##### Token classification
-Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain specialty. There are some popular ones like NER or POS-tagging.
+Tasks of the kind of token classification are NLP tasks aimed at dividing the input text into words, or syllables, and assigning certain values to them. Think about giving each word in a sentence its grammatical category or highlight which parts of a medical report belong to a certain speciality. There are some popular ones like NER or POS-tagging.
 ```python
 record = rg.TokenClassificationRecord(
@@ -190,4 +190,4 @@ You can see our supported tasks at {ref}`tasks`.
 ### Settings
-For now, only a set of predefined labels (labels schema) is configurable. Still, other settings like annotators, and metadata schema, are planned to be supported as part of dataset settings.
+For now, only a set of predefined labels (labels schema) is configurable. Still, other settings like annotators, and metadata schema, are planned to be supported as part of dataset settings.
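The `record = rg.TokenClassificationRecord(` line that the first hunk above cuts off belongs to a longer example on the original page. As a rough sketch of what such a record looks like with the Argilla v1 Python client (the text, labels and character offsets below are made up for illustration, not taken from the docs page):

```python
import argilla as rg

# Illustrative only: each entity span is given as (label, char_start, char_end).
record = rg.TokenClassificationRecord(
    text="Argilla is built in Madrid",
    tokens=["Argilla", "is", "built", "in", "Madrid"],
    prediction=[("ORG", 0, 7), ("LOC", 20, 26)],  # model suggestions
    annotation=[("ORG", 0, 7), ("LOC", 20, 26)],  # human-validated spans
)
```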
**docs/_source/getting_started/argilla.md** (1 addition & 1 deletion)

@@ -138,7 +138,7 @@ Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, oft
 <summary>What is Argilla currently working on?</summary>
 <p>
-We are continuously working on improving Argilla's features and usability, focusing now concentrating on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.
+We are continuously working on improving Argilla's features and usability, focusing now on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.
**docs/_source/getting_started/installation/deployments/cloud_providers.md** (1 addition & 1 deletion)

@@ -157,7 +157,7 @@ gcloud auth login
 ### 2. Build and deploy the container
-We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resouces.
+We will use the `gcloud run deploy` command to deploy the Argilla container directly from the Docker Hub. We can point the cloud run url to the container's default port (6900) and define relevant compute resources.
**docs/_source/practical_guides/annotate_dataset.md** (4 additions & 4 deletions)

@@ -90,7 +90,7 @@ You can track your progress and the number of `Pending`, `Draft`, `Submitted` an
 In Argilla's Feedback Task datasets, you can annotate and process records in two ways:
-- **Focus view**: you can only see, respond and perfom actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
+- **Focus view**: you can only see, respond and perform actions on one record at a time. This is better for records that need to be examined closely and individually before responding.
 - **Bulk view**: you can see multiple records in a list so you can respond and perform actions on more than one record at a time. This is useful for actions that can be taken on many records that have similar characteristics e.g., apply the same label to the results of a similarity search, discard all records in a specific language or save/submit records with a suggestion score over a safe threshold.
 ```{hint}
@@ -105,7 +105,7 @@ If you have a Span question in your dataset, you can always answer other questio
 In the queue of **Pending** records, you can change from _Focus_ to _Bulk_ view. Once in the _Bulk view_, you can expand or collapse records --i.e. see the full length of all records in the page or set a fixed height-- and select the number of records you want to see per page.
-To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection clicking on the _Cancel_ button.
+To select or unselect all records in the page, click on the checkbox above the record list. To select or unselect specific records, click on the checkbox inside the individual record card. When you use filters inside the bulk view and the results are higher than the records visible in the page but lower than 1000, you will also have the option to select all of the results after you click on the checkbox. You can cancel this selection by clicking on the _Cancel_ button.
 Once records are selected, choose the responses that apply to all selected records (if any) and do the desired action: _Discard_, _Save as draft_ or even _Submit_. Note that you can only submit the records if all required questions have been answered.
@@ -169,7 +169,7 @@ Not all filters listed below are available for all tasks.
 ##### Predictions filter
-This filter allows you to filter records with respect of their predictions:
+This filter allows you to filter records with respect to their predictions:
 - **Predicted as**: filter records by their predicted labels.
 - **Predicted ok**: filter records whose predictions do, or do not, match the annotations.
@@ -291,4 +291,4 @@ If you struggle to increase the overall coverage, try to filter for the records
 #### Manage rules
 Here you will see a list of your saved rules.
-You can edit a rule by clicking on its name, or delete it by clicking on the trash icon.
+You can edit a rule by clicking on its name, or delete it by clicking on the trash icon.
**docs/_source/practical_guides/collect_responses.md** (2 additions & 2 deletions)

@@ -183,7 +183,7 @@ We plan on adding more support for other metrics so feel free to reach out on ou
 #### Model Metrics
-In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified reponses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.
+In contrast to agreement metrics, where we compare the responses of annotators with each other, it is a good practice to evaluate the suggestions of models against the annotators as ground truths. As `FeedbackDataset` already offers the possibility to add `suggestions` to the responses, we can compare these initial predictions against the verified responses. This will give us two important insights: how reliable the responses of a given annotator are, and how good the suggestions we are giving to the annotators are. This way, we can take action to improve the quality of the responses by making changes to the guidelines or the structure, and the suggestions given to the annotators by changing or updating the model we use. Note that each question type has a different set of metrics available.
 Here is an example use of the `compute` function to calculate the metrics for a `FeedbackDataset`:
@@ -495,4 +495,4 @@ f1(name="sst2").visualize()
 # now compute metrics for negation ( -> negative precision and positive recall go down)
 f1(name="sst2", query="n't OR not").visualize()
 ```
-![F1 metrics](/_static/images/guides/metrics/negation_f1.png)
+![F1 metrics](/_static/images/guides/metrics/negation_f1.png)
 local_dataset = remote_dataset.pull(max_records=100) # get first 100 records
 ```
-If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performace. If you would like to pull the vectors in your records, you will need to specify it like so:
+If your dataset includes vectors, by default these will **not** get pulled with the rest of the dataset in order to improve performance. If you would like to pull the vectors in your records, you will need to specify it like so:
 ::::{tab-set}
@@ -204,4 +204,4 @@ df = dataset_rg.to_pandas()
 df.to_csv("my_dataset.csv") # Save as CSV
 df.to_json("my_dataset.json") # Save as JSON
 df.to_parquet("my_dataset.parquet") # Save as Parquet
-For datasets that where annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):
+For datasets that were annotated with numerical values we could also pass the label strategy we want to use (let's assume we have another question in the dataset named "other-question" that contains values that come from rated answers):
 ```python
 task = TrainingTask.for_sentence_similarity(
@@ -1547,4 +1547,4 @@ Options:
 --update-config-kwargs TEXT update_config() kwargs to be passed as a dictionary. [default: {}]
**docs/_source/tutorials_and_integrations/integrations/add_sentence_transformers_embeddings_as_vectors.ipynb** (1 addition & 1 deletion)

@@ -23,7 +23,7 @@
 "\n",
 "The basic idea is to use a pre-trained model to generate a vector representation for each relevant `TextFields` within the records. These vectors are then indexed within our databse and can then used to search based the similarity between texts. This should be useful for searching similar records based on the semantic meaning of the text.\n",
 "\n",
-"To get the these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."
+"To get these vectors and config, we will use the `SentenceTransformersExtractor` based on the [sentence-transformers](https://www.sbert.net/index.html) library. The default model we use for this is the [TaylorAI/bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2), which offers a nice trade-off between speed and accuracy, but you can use any model from the [sentence-transformers](https://www.sbert.net/index.html) library or from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers)."