1 | 1 | { |
2 | 2 | "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "8ea3c629-ffd7-48f6-9ff1-96a43d120c9f", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Reproduction of the TrAdaBoost experiments" |
| 9 | + ] |
| 10 | + }, |
| 11 | + { |
| 12 | + "cell_type": "markdown", |
| 13 | + "id": "3aedc081-d6af-45dc-a9e1-bcd71e83f90b", |
| 14 | + "metadata": {}, |
| 15 | + "source": [ |
| 16 | + "<div class=\"btn btn-notebook\" role=\"button\">\n", |
| 17 | + " <img src=\"../_static/images/github_logo_32px.png\"> [View on GitHub](https://github.com/adapt-python/notebooks/blob/d0364973c642ea4880756cef4e9f2ee8bb5e8495/Two_moons.ipynb)\n", |
| 18 | + "</div>" |
| 19 | + ] |
| 20 | + }, |
3 | 21 | { |
4 | 22 | "cell_type": "markdown", |
5 | 23 | "id": "a22504a0-5ff7-498e-bc35-a6c101926204", |
6 | 24 | "metadata": {}, |
7 | 25 | "source": [ |
8 | | - "# Reproduction of the TrAdaBoost experiments\n", |
9 | | - "\n", |
10 | 26 | "The purpose of this example is to reproduce the results obtained in the paper [Boosting for Transfer Learning (2007)](https://cse.hkust.edu.hk/~qyang/Docs/2007/tradaboost.pdf). In this work, the authors developed a transfer algorithm called TrAdaBoost, dedicated to [supervised domain adaptation](https://adapt-python.github.io/adapt/map.html). You can find more details about this algorithm [here](https://adapt-python.github.io/adapt/generated/adapt.instance_based.TrAdaBoost.html). The goal of this algorithm is to combine a source dataset with many labeled instances and a target dataset with few labels in order to learn a good model on the target domain.\n", |
11 | 27 | "\n", |
12 | 28 | "We try to reproduce the following two experiments:\n", |
| 29 | + "\n", |
13 | 30 | "- Mushrooms\n", |
14 | 31 | "- 20newsgroups\n", |
15 | 32 | "\n" |
|
314 | 331 | "metadata": {}, |
315 | 332 | "source": [ |
316 | 333 | "<div class=\"alert alert-block alert-info\">\n", |
317 | | - "<b>Note:</b> When looking at the number of instances in each category of the *stalk-shape* attribute, it seems that the authors inversed the source data set with the target one in the text above. Indeed, when looking at Table 1 in the paper, the number of source instances should be 4608 which corresponds to the <b>tapering</b> class and not the <b>enlarging</b> one.</div>\n", |
318 | | - "\n" |
| 334 | + "**Note:** When looking at the number of instances in each category of the *stalk-shape* attribute, it seems that the authors swapped the source and target data sets in the text above. Indeed, according to Table 1 in the paper, the number of source instances should be 4608, which corresponds to the **tapering** class and not the **enlarging** one.</div>" |
319 | 335 | ] |
320 | 336 | }, |
321 | 337 | { |
|
552 | 568 | "id": "babd1cce-c39e-4516-9a10-9d7e9f00f190", |
553 | 569 | "metadata": {}, |
554 | 570 | "source": [ |
555 | | - "## 20 NewsGroup experiments" |
| 571 | + "## 20 NewsGroup" |
556 | 572 | ] |
557 | 573 | }, |
558 | 574 | { |
|
641 | 657 | "We conduct the three proposed experiments \"rec vs talk\", \"rec vs sci\" and \"sci vs talk\". We set the number of TrAdaBoost estimators to 10 instead of 100, as we found that 100 estimators give poor results for TrAdaBoost." |
642 | 658 | ] |
643 | 659 | }, |
644 | | - { |
645 | | - "cell_type": "code", |
646 | | - "execution_count": 26, |
647 | | - "id": "1d47ec68-f638-42aa-9c11-c37caa61fe14", |
648 | | - "metadata": {}, |
649 | | - "outputs": [], |
650 | | - "source": [ |
651 | | - "# source_sci = ['sci.crypt', 'sci.electronics']\n", |
652 | | - "# target_sci = ['sci.med', 'sci.space']" |
653 | | - ] |
654 | | - }, |
655 | 660 | { |
656 | 661 | "cell_type": "markdown", |
657 | 662 | "id": "054fa3be-c83e-4c64-a3ff-58c51ea397fe", |
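The TrAdaBoost procedure described in the notebook's introduction (reweighting a large labeled source set against a small labeled target set) can be illustrated with a minimal from-scratch sketch in Python, the notebook's own language. It follows the weight updates of Dai et al. (2007) for labels in {0, 1}, but it is not the adapt implementation: `fit_stump` and `tradaboost` are hypothetical names, the weighted decision stump is an assumed base learner, and clipping the target error is an added safeguard that the paper does not specify.

```python
import math
import numpy as np


def fit_stump(X, y, w):
    """Hypothetical weighted decision stump for labels in {0, 1}.

    Picks the (feature, threshold, polarity) with the lowest weighted 0-1 error.
    """
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            above = (X[:, j] > thr).astype(int)
            for pol in (1, -1):
                pred = above if pol == 1 else 1 - above
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, thr, pol)
    _, j, thr, pol = best

    def predict(Z):
        above = (Z[:, j] > thr).astype(int)
        return above if pol == 1 else 1 - above

    return predict


def tradaboost(Xs, ys, Xt, yt, n_estimators=10):
    """Sketch of TrAdaBoost (Dai et al., 2007) for binary labels in {0, 1}."""
    n = len(Xs)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(len(X))  # uniform initial weights
    # Fixed factor used to down-weight misclassified source instances.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_estimators))
    learners, betas = [], []
    for _ in range(n_estimators):
        h = fit_stump(X, y, w / w.sum())
        miss = np.abs(h(X) - y)  # 1 where misclassified, else 0
        # Weighted error measured on the target instances only.
        eps = np.sum(w[n:] * miss[n:]) / np.sum(w[n:])
        # Assumption: clip eps so beta_t stays in (0, 1); the paper requires eps < 1/2.
        eps = min(max(eps, 1e-10), 0.499)
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta ** miss[:n]       # shrink misclassified source weights
        w[n:] *= beta_t ** (-miss[n:])  # grow misclassified target weights
        w /= w.sum()                    # rescale for numerical stability only
        learners.append(h)
        betas.append(beta_t)

    # Only learners from iterations ceil(N/2)..N (1-based) vote.
    start = math.ceil(n_estimators / 2) - 1
    coef = -np.log(np.array(betas[start:]))  # positive because beta_t < 1

    def predict(Z):
        votes = np.array([h(Z) for h in learners[start:]])
        # Log form of: prod beta_t^(-h_t(x)) >= prod beta_t^(-1/2)
        return (coef @ votes >= 0.5 * coef.sum()).astype(int)

    return predict
```

With `n_estimators=10`, as in the 20 NewsGroup experiments, only the learners from iterations 5 to 10 take part in the final vote. Note also that the source down-weighting factor `beta` depends on the number of estimators, so changing `n_estimators` changes both the length of the ensemble and how quickly misclassified source instances lose weight.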
|