
Commit 6bf1de1

committed
Rework tutorials
1 parent 5f9fb20 commit 6bf1de1

File tree

3 files changed: +63 −59 lines changed

docs/01-walkthrough.md

Lines changed: 32 additions & 38 deletions
@@ -1,22 +1,20 @@
 # Tutorial 1: Library Walkthrough
 
-In this tutorial, we do a walkthrough of the main concepts and parameters in TransformerRanker.
-This should be the first tutorial you do.
+In this tutorial, we do a walkthrough of the main concepts and parameters in TransformerRanker.
 
-Generally, finding the best LM for a specific task involves the following four steps:
+Generally, finding the best LM for a specific task involves the following steps:
 
 1. [Loading Datasets](#step-1-loading-datasets): Each task has a dataset. Load it from the Datasets library.
 2. [Preparing Language Models](#step-2-preparing-language-models): TransformerRanker requires a list of language models to rank.
 You provide this list.
-3. [Ranking Language Models](#step-3-ranking-language-models): Once the dataset and LM options are provided, you can now execute the ranking.
-4. [Interpreting Results](#step-4-interpreting-the-results): When ranking is complete, you can select the best-suited model(s).
+3. [Ranking Language Models](#step-3-ranking-language-models): Once the dataset and LM options are provided, you can now execute the ranking.
+4. [Interpreting Results](#step-4-interpreting-the-results): When ranking is complete, you can select the best-suited model(s) for the dataset.
 
 The goal of this tutorial is to understand these four steps.
 
 ## Example Task
 
-For this tutorial, we use the example task of text classification over the classic TREC dataset. Our goal is
-to find the best-suited language model. The full code for ranking LMs on TREC is:
+We use the example task of text classification over the classic TREC dataset. Our goal is to find the best-suited LM for TREC. The full code:
 
 ```python
 from datasets import load_dataset
@@ -40,14 +38,9 @@ print(results)
 
 ## Step 1. Loading Datasets
 
-Use the Hugging Face Datasets library to load datasets from their [text classification](https://huggingface.co/datasets?task_categories=task_categories:text-classification&sort=trending) section. You load a dataset by passing its string identifier.
+Use the Hugging Face Datasets library to load datasets from their [text classification](https://huggingface.co/datasets?task_categories=task_categories:text-classification&sort=trending) section. You load a dataset by passing its string identifier.
 
-In this example, we use the TREC dataset, which categorizes questions based on the type of information they seek. It comes with coarse and fine-grained categoaries:
-
-- **Coarse-grained:** descriptions (DESC), entities (ENTY), abbreviations (ABBR), humans (HUM), locations (LOC), and numeric values (NUM). For example, the question _"What is a Devo hat?"_ is categorized under descriptions (DESC).
-- **Fine-grained:** Divides broad categories into 50 subclasses, with the same question having a label DESC:def (definition).
-
-Here's how to laod TREC:
+Here is how to load TREC:
 
 ```python
 from datasets import load_dataset
@@ -58,7 +51,7 @@ dataset = load_dataset('trec')
 print(dataset)
 ```
 
-Inspect the dataset structure on the [dataset page](https://huggingface.co/datasets/trec) or by printing it:
+Inspect the dataset by printing it:
 
 ```bash
 DatasetDict({
@@ -74,14 +67,14 @@ DatasetDict({
 ```
 
 Key things to note:
-- __Dataset size__: Check the number of texts (around 6,000). Use this to set an appropriate `dataset_downsample` ratio for ranking.
-- __Text and label columns__: Ensure the dataset has texts and labels. Some datasets can be messy.
+- __Dataset size__: TREC has ~6,000 texts. Use this to set a `dataset_downsample` ratio.
+- __Text and label fields__: Some datasets are messy. Ensure texts and labels are non-empty. Note that some datasets may have multiple label fields (e.g., coarse and fine-grained classes).
 
 ## Step 2. Preparing Language Models
 
 Next, prepare a list of language models to rank.
 You can choose any models from the [model hub](https://huggingface.co/models).
-If unsure where to start, use our predefined list of popular models:
+If unsure where to start, use our predefined list of models:
 
 ```python
 from transformer_ranker import prepare_popular_models
@@ -93,21 +86,20 @@ language_models = prepare_popular_models('base')
 print(language_models[:5])
 ```
 
-The `language_models` list contains identifiers for each model:
+The `language_models` list contains string identifiers for each model:
 
 ```console
 ['distilbert-base-cased', 'typeform/distilroberta-base-v2', 'bert-base-cased', 'SpanBERT/spanbert-base-cased', 'roberta-base']
 ```
 
-Feel free to create your own list of models.
-We suggest exploring models that vary in pretraining tasks (masked language modeling, replaced token detection or sentence-transformers)
-and those trained with different data (multilingual, domain-specific models).
+Feel free to create your own list of model names.
+We recommend including models that were pre-trained on different tasks and datasets.
 
 ## Step 3. Ranking Language Models
 
 You have now selected a task with its dataset (TREC) and a list of LMs to rank.
 
-In most cases, you can use our ranker with default parameters. Often, it is more efficient to downsample the data a bit to speed up ranking:
+In most cases, you can use our ranker with default parameters. Often, it is more efficient to downsample the dataset to speed up ranking:
 
 ```python
 from transformer_ranker import TransformerRanker
@@ -120,7 +112,7 @@ results = ranker.run(language_models, batch_size=64)
 print(results)
 ```
 
-In this example, we downsampled the data to 20% and are running the ranker with a batch size of 64. You can modify these
+Here we downsampled the data to 20% and are running the ranker with a batch size of 64. You can modify these
 two parameters:
 - `dataset_downsample`: Set it to 1. to estimate over the full dataset. Or lower than 0.2 to make an estimation even faster.
 We found that downsampling to 20% often does not hurt estimation performance.
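As a quick sanity check on the `dataset_downsample` parameter, the arithmetic can be sketched in plain Python. This is only the arithmetic, not the library's actual sampling code; the 5,952 figure is the full TREC size mentioned later in this tutorial:

```python
# Rough effect of dataset_downsample on how many texts the ranker embeds.
full_size = 5952  # full TREC dataset
for ratio in (1.0, 0.2, 0.1):
    print(f"dataset_downsample={ratio}: ~{int(full_size * ratio)} texts")
# ratio 0.2 keeps ~1190 texts, the downsampled size used in this tutorial
```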
@@ -180,9 +172,9 @@ print(results)
 
 ### Running the Ranker
 
-The ranker prints logs to help you understand what happens as it runs.
-It iterates over each model and (1) embeds texts, (2) scores embeddings using an estimator.
-Logs show which model is currently being assessed.
+When running the ranker, each LM is processed individually:
+TransformerRanker embeds the texts with the LM and scores them using a transferability metric.
+The log shows which LM is currently being assessed:
 
 ```bash
 transformer_ranker:Text and label columns: 'text', 'coarse_label'
@@ -195,20 +187,21 @@ Computing Embeddings: 100%|██████████| 19/19 [00:00<00:00,
 Transferability Score: 70%|███████   | 1/1 [00:00<00:00, 9.15it/s]
 ```
 
-Running time varies based on dataset size and selected language models. Here are two examples:
+Ranking is generally fast, but runtime depends on dataset size, text length, and the size of selected models.
+For example, on TREC:
 
-- The **downsampled TREC** dataset (1,190 instances) takes about 2.3 minutes to process 17 base-sized models: 1.2 minutes for downloading and 1.1 minutes for embedding and scoring.
-- The full TREC dataset (5,952 instances) takes about 4.8 minutes: 1.2 minutes for downloads and 3.6 minutes for embedding and scoring.
+- ~2.3 min to rank 17 base models on 20% of the dataset (1,190 texts)
+- ~4.8 min to rank the same models on the full dataset (5,952 texts)
 
-We used Colab Notebook with a Tesla T4 GPU. Note that TREC has short texts (10 words on average) and embedding longer texts will take more time.
+Tested on a Colab Notebook (Tesla T4 GPU).
 
 ## Step 4. Interpreting the Results
 
-Doing `print(results)` displays the ranked language models from Step 2, along with their **transferability scores**.
-A **higher score** means the model is better suited for your dataset.
-Here’s the output after ranking 17 language models on TREC:
+Once the ranking is complete, the final list of LM names and their **transferability scores** will be shown.
+Higher transferability means better suitability for the dataset.
+The final output of the TREC example is:
 
-```bash
+```console
 Rank 1. microsoft/deberta-v3-base: 4.0172
 Rank 2. google/electra-base-discriminator: 4.0068
 Rank 3. microsoft/mdeberta-v3-base: 4.0028
@@ -228,14 +221,15 @@ Rank 16. sentence-transformers/all-MiniLM-L12-v2: 3.4271
 Rank 17. google/electra-small-discriminator: 2.9615
 ```
 
-The top-ranked model _'deberta-v3-base'_ is a strong candidate for fine-tuning. We recommend fine-tuning other highly ranked models for comparison.
+Here, the top-ranked model is _'deberta-v3-base'_.
+This should be the LM to use for the selected downstream dataset.
+However, we recommend fine-tuning other highly ranked models for comparison.
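If you want to pick the winner programmatically, treating the output as plain (model, score) pairs is enough. A minimal sketch, with scores copied from the TREC ranking above (this is not the library's results API, just plain Python):

```python
# Sort (model, transferability score) pairs and pick the best candidate.
scores = {
    "microsoft/deberta-v3-base": 4.0172,
    "google/electra-base-discriminator": 4.0068,
    "microsoft/mdeberta-v3-base": 4.0028,
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranked[0]
print(best_model)  # microsoft/deberta-v3-base
```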
 
 To fine-tune the top-ranked model, use any framework of your choice (e.g.
 <a href="https://flairnlp.github.io/">Flair</a> or Transformers — we opt for the first one ;p).
 
 ## Summary
 
-This tutorial shows the four steps for selecting the best-suited LM for an NLP task.
-We (1) loaded a text classification dataset, (2) prepared a list of language model names, and (3) ranked them based on transferability scores.
+This tutorial showed how to use TransformerRanker in four steps. We loaded a text classification dataset, prepared a list of LM names, and ranked them based on transferability scores.
 
 In the next tutorial, we give examples for a variety of NLP tasks.

docs/02-examples.md

Lines changed: 8 additions & 0 deletions
@@ -299,3 +299,11 @@ DatasetDict({
 
 </details>
 
+## Summary
+
+This tutorial showed how to use TransformerRanker for NER, PoS, and Text Pair tasks.
+To use it for different tasks, you typically only need to set the `label_column` or `text_pair_column` when initializing the ranker with the dataset.
+The `run` method remains unchanged.
+
+In the next tutorial, we show advanced functionality, such as changing the transferability metric for model ranking.
+
docs/03-advanced.md

Lines changed: 23 additions & 21 deletions
@@ -1,12 +1,10 @@
 # Tutorial 3: Advanced
 
-Previous tutorials showed how to rank LMs using default parameters and datasets from the hub.
-This tutorial covers how to load custom datasets and use two optional parameters in the ranker: `estimator` and `layer_aggregator`.
+In this advanced tutorial, we go over how to change transferability metrics using the `estimator` parameter, load custom datasets, and run TransformerRanker with non-default settings. We also show a special case: finding the best-performing layer in a single language model.
 
 ## Loading Custom Datasets
 
-TransformerRanker uses `load_dataset()` from the 🤗 Datasets library.
-To load local text files instead of datasets from the hub, do:
+Not all datasets are available in the Hugging Face Datasets library. If you have a custom dataset stored in local text files, you can load it using the following snippet:
 
 ```python
 from datasets import load_dataset
@@ -32,33 +30,39 @@ ranker = TransformerRanker(dataset=dataset, dataset_downsample=0.2)
 results = ranker.run(models=language_models, batch_size=32)
 ```
 
-Train/dev/test splits are optional—TransformerRanker merges and downsamples datasets automatically.
-Once loaded, initialize the ranker with your dataset as shown in previous tutorials.
-For `.csv` or `.json` formats, see the complete load_dataset() [guide](https://huggingface.co/docs/datasets/v1.7.0/loading_datasets.html#from-local-files).
+Specifying train/dev/test splits is optional—TransformerRanker merges and downsamples datasets automatically.
+Once loaded, do the LM ranking as in previous tutorials.
+
+To load `.json` or `.csv` files, take a look at the [guide](https://huggingface.co/docs/datasets/v1.7.0/loading_datasets.html#from-local-files) of the Datasets library.
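As a sketch of the file layout that a CSV-based classification dataset typically needs, here is a tiny file written with only the standard library; the column names and file path are illustrative, not required by the library:

```python
import csv
import os
import tempfile

# Write a tiny text-classification dataset with a header row.
path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    writer.writerow(["What is a Devo hat?", "DESC"])
    writer.writerow(["Who wrote Hamlet?", "HUM"])

# With the Datasets library installed, such a file could then be loaded via:
# dataset = load_dataset("csv", data_files={"train": path})

with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['text', 'label']
```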
 
 ## Transferability Metrics
 
-Change the transferability metric by setting the `estimator` parameter in the `.run()` method. To change to LogME, do:
+The transferability metric can be changed by setting the `estimator` parameter in the `.run()` method. To change the metric to LogME, do:
 
 ```python
 results = ranker.run(language_models, estimator="logme")
 ```
 
-__Transferability Explanation:__ transferability metrics estimate how suitable a model is for a new task — without requiring fine-tuning.
-For a pre-trained LM this means assessing how well its embeddings align with a new dataset.
-
-Here are the supported metrics:
+__Transferability Explanation:__ Transferability metrics estimate how well a model is likely to perform on a new dataset without requiring fine-tuning. For a pre-trained language model, this means evaluating how well its embeddings capture the structure of the target dataset.
+
+The following metrics are supported:
 
-- `hscore` (default): Fast and generally the best choice for most datasets. Suited for classification tasks [H-Score code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/hscore.py).
-- `logme`: Suitable for both classification and regression tasks [LogME code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/logme.py).
-- `nearestneighbors`: Slowest and least accurate, but easy to interpret [k-NN code](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/nearesneighbors.py).
+- **`hscore`** *(default)*: Fast and generally the best choice for most datasets. Suited for classification tasks.
+  [View source](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/hscore.py)
+- **`logme`**: Suitable for both classification and regression tasks.
+  [View source](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/logme.py)
+- **`nearestneighbors`**: Slowest and least accurate, but easy to interpret.
+  [View source](https://github.com/flairNLP/transformer-ranker/blob/main/transformer_ranker/estimators/nearesneighbors.py)
 
-For a better understanding of each metric, take a look at original papers or our code and comments.
+For a better understanding of each metric, see our code and comments, or refer to the original papers.
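To build intuition for the `nearestneighbors` idea, here is a toy leave-one-out k-NN score on made-up 2-D "embeddings". This is a sketch of the concept only, not the library's implementation, and the points and labels are invented for the example:

```python
from collections import Counter

# Toy leave-one-out nearest-neighbor transferability: embeddings that
# place same-label points close together get a higher score.
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
labels = ["DESC", "DESC", "NUM", "NUM"]

def knn_transferability(points, labels, k=1):
    correct = 0
    for i, p in enumerate(points):
        # Find the k nearest other points (leave-one-out).
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )[:k]
        predicted = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)

print(knn_transferability(embeddings, labels))  # 1.0: neighbors share labels
```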
 
 ## Layer Aggregation
 
-By default, TransformerRanker averages all hidden layers. But some datasets may work better with other strategies.
-Use `layer_aggregator` to control which layer(s) are used for embeddings. To use the best performing layer, do:
+To improve existing transferability estimation approaches, we propose to average all hidden layers in LMs. This can be controlled by changing the `layer_aggregator` parameter.
+
+To use the best-performing layer, do:
 
 ```python
 results = ranker.run(language_models, layer_aggregator="bestlayer")
@@ -123,7 +127,7 @@ Compare this ranking with the one in the main [README](https://github.com/flairN
 ## Example: Inspecting Layer Transferability in a Single LM
 
 You can also inspect layer-wise transferability scores for a single large model.
-Here’s how to rank the layers of DeBERTa-v2-xxlarge (1.5B) on CoNLL2003:
+Here’s how to rank layers of DeBERTa-v2-xxlarge (1.5B) on CoNLL2003:
 
 
 ```python
@@ -154,6 +158,4 @@ Useful for inspecting layer-wise transferability for a downstream dataset.
 
 ## Summary
 
-Here, we demonstrated how to load a custom dataset not hosted on the Hugging Face Hub.
-We then introduced two optional parameters for TransformerRanker: `estimator` and `layer_aggregator`,
-which can be adjusted based on the task or to compare transferability metrics.
+In this tutorial, we explored advanced features of TransformerRanker: how to load custom datasets, switch transferability metrics with the `estimator` parameter, and identify the best-suited layer using the `layer_aggregator` parameter. These settings can be adjusted based on the task or to compare different transferability metrics.

0 commit comments
