`docs/01-walkthrough.md` (32 additions, 38 deletions)

# Tutorial 1: Library Walkthrough

In this tutorial, we do a walkthrough of the main concepts and parameters in TransformerRanker. This should be the first tutorial you do.

Generally, finding the best LM for a specific task involves the following four steps:

1. [Loading Datasets](#step-1-loading-datasets): Each task has a dataset. Load it from the Datasets library.
2. [Preparing Language Models](#step-2-preparing-language-models): TransformerRanker requires a list of language models to rank. You provide this list.
3. [Ranking Language Models](#step-3-ranking-language-models): Once the dataset and LM options are provided, you can now execute the ranking.
4. [Interpreting Results](#step-4-interpreting-the-results): When ranking is complete, you can select the best-suited model(s) for the dataset.

The goal of this tutorial is to understand these four steps.
## Example Task

We use the example task of text classification over the classic TREC dataset. Our goal is to find the best-suited LM for TREC. The full code:

```python
from datasets import load_dataset
from transformer_ranker import TransformerRanker, prepare_popular_models

# load the TREC dataset from the hub
dataset = load_dataset('trec')

# use a predefined list of popular base-sized language models
language_models = prepare_popular_models('base')

# initialize the ranker, downsampling the dataset to 20%
ranker = TransformerRanker(dataset, dataset_downsample=0.2)

# rank the models and print their transferability scores
results = ranker.run(language_models, batch_size=64)
print(results)
```

## Step 1. Loading Datasets

Use the Hugging Face Datasets library to load datasets from their [text classification](https://huggingface.co/datasets?task_categories=task_categories:text-classification&sort=trending) section. You load a dataset by passing its string identifier.

Here is how to load TREC:

```python
from datasets import load_dataset

dataset = load_dataset('trec')
print(dataset)
```

Inspect the dataset by printing it:

```bash
DatasetDict({
    train: Dataset({
        features: ['text', 'coarse_label', 'fine_label'],
        num_rows: 5452
    })
    test: Dataset({
        features: ['text', 'coarse_label', 'fine_label'],
        num_rows: 500
    })
})
```
Key things to note:

- __Dataset size__: TREC has ~6,000 texts. Use this to set a `dataset_downsample` ratio.
- __Text and label fields__: Some datasets are messy. Ensure texts and labels are non-empty. Note that some datasets may have multiple label fields (e.g., coarse and fine-grained classes).
## Step 2. Preparing Language Models
Next, prepare a list of language models to rank.
You can choose any models from the [model hub](https://huggingface.co/models).
84
-
If unsure where to start, use our predefined list of popular models:
77
+
If unsure where to start, use our predefined list of models:
```python
from transformer_ranker import prepare_popular_models

# predefined list of 17 popular base-sized models
language_models = prepare_popular_models('base')
```

## Step 3. Ranking Language Models

Ranking is generally fast, but runtime depends on dataset size, text length, and the size of selected models.

For example, on TREC:

- ~2.3 min to rank 17 base models on 20% of the dataset (1,190 texts)
- ~4.8 min to rank the same models on the full dataset (5,952 texts)

Tested on a Colab Notebook (Tesla T4 GPU).
## Step 4. Interpreting the Results

Once the ranking is complete, the final list of LM names and their **transferability scores** will be shown.
Higher transferability means better suitability for the dataset.
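
As a toy illustration of reading such results (the score values below are invented, not real ranking output), selecting the best-suited model is just a sort over the name-to-score mapping:

```python
# hypothetical transferability scores (illustrative values only)
scores = {
    "microsoft/deberta-v3-base": 4.42,
    "bert-base-uncased": 4.01,
    "distilbert-base-uncased": 3.87,
}

# higher score = better suited; sort in descending order
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
best_model, best_score = ranking[0]
print(best_model)  # microsoft/deberta-v3-base
```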

Here, the top-ranked model is _'deberta-v3-base'_. This should be the LM to use for the selected downstream dataset. However, we recommend fine-tuning other highly ranked models for comparison.

To fine-tune the top-ranked model, use any framework of your choice (e.g. <a href="https://flairnlp.github.io/">Flair</a> or Transformers — we opt for the first one ;p).
## Summary

This tutorial showed how to use TransformerRanker in four steps. We loaded a text classification dataset, prepared a list of LM names, ranked them based on transferability scores, and interpreted the results.
In the next tutorial, we give examples for a variety of NLP tasks.

`docs/03-advanced.md` (23 additions, 21 deletions)

# Tutorial 3: Advanced

In this advanced tutorial, we go over how to change transferability metrics using the `estimator` parameter, load custom datasets, and run TransformerRanker with non-default settings. We also show a special case: finding the best-performing layer in a single language model.
## Loading Custom Datasets

Not all datasets are available in the Hugging Face Datasets library. If you have a custom dataset stored in local text files, you can load it using the following snippet:

Specifying train/dev/test splits is optional—TransformerRanker merges and downsamples datasets automatically. Once loaded, do the LM ranking as in previous tutorials.

To load .json or .csv files, take a look at the [guide](https://huggingface.co/docs/datasets/v1.7.0/loading_datasets.html#from-local-files) of the Datasets library.
## Transferability Metrics

The transferability metric can be changed by setting the `estimator` parameter in the `.run()` method. To switch the metric to LogME, pass `estimator='logme'`, e.g. `ranker.run(language_models, estimator='logme')`.

__Transferability Explanation:__ Transferability metrics estimate how well a model is likely to perform on a new dataset without requiring fine-tuning. For a pre-trained language model, this means evaluating how well its embeddings capture the structure of the target dataset.

The following metrics are supported:

- **`hscore`** *(default)*: Fast and generally the best choice for most datasets. Suited for classification tasks ([H-Score code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/hscore.py)).
- **`logme`**: Suitable for both classification and regression tasks ([LogME code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/logme.py)).
- **`nearestneighbors`**: Slowest and least accurate, but easy to interpret ([k-NN code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/nearesneighbors.py)).

For a better understanding of each metric, see our code and comments, or refer to the original papers.
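
To build intuition for the nearest-neighbors estimator, here is a toy leave-one-out 1-NN score in plain Python (not the library's implementation; the embeddings and labels are made up):

```python
# for each embedding, check whether its nearest other embedding shares its
# label; the fraction of matches is a crude transferability estimate
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
labels = [0, 0, 1, 1]

def loo_knn_score(embeddings, labels):
    correct = 0
    for i, emb in enumerate(embeddings):
        # nearest neighbor among all other points (squared Euclidean distance)
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(emb, embeddings[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(embeddings)

print(loo_knn_score(embeddings, labels))  # 1.0: well-separated classes score high
```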
## Layer Aggregation

To improve existing transferability estimation approaches, we propose to average all hidden layers in LMs. This can be controlled by changing the `layer_aggregator` parameter. To score each model by its single best-performing layer instead, pass `layer_aggregator='bestlayer'` to `.run()`.
## Example: Inspecting Layer Transferability in a Single LM

You can also inspect layer-wise transferability scores for a single large model. Here's how to rank the layers of DeBERTa-v2-xxlarge (1.5B) on CoNLL2003:

```python
from datasets import load_dataset
from transformer_ranker import TransformerRanker

# load CoNLL2003 and downsample to 20%
dataset = load_dataset('conll2003')
ranker = TransformerRanker(dataset, dataset_downsample=0.2)

# rank the layers of a single large model with the best-layer aggregator
results = ranker.run(['microsoft/deberta-v2-xxlarge'], layer_aggregator='bestlayer')
print(results)
```

Useful for inspecting layer-wise transferability for a downstream dataset.

## Summary

In this tutorial, we explored advanced features of TransformerRanker: how to load custom datasets, switch transferability metrics with the `estimator` parameter, and identify the best-suited layer using the `layer_aggregator` parameter. These settings can be adjusted based on the task or to compare different transferability metrics.