Linkers evaluation

Marco Fossati edited this page May 7, 2019 · 12 revisions

Setting

  • run: April 11 2019 on soweego-1 VPS instance;
  • output folder: /srv/dev/20190411;
  • head commit: 1505429997b878568a9e24185dc3afa7ad4720eb;
  • command: python -m soweego linker evaluate ${Algorithm} ${Dataset} ${Entity};
  • evaluation technique: stratified 5-fold cross validation over training/test splits;
  • mean performance scores over the folds.
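As a rough illustration of the evaluation technique above, the sketch below builds stratified folds and averages a per-fold score. It uses only the Python standard library. The names `stratified_folds` and `cross_validate`, the fixed seed, and the choice of the population standard deviation are assumptions made for illustration, not soweego's actual implementation.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class to the folds in round-robin order.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(labels, score_fold, k=5, seed=0):
    """Return (mean, std) of score_fold(train, test) over the k folds."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        scores.append(score_fold(train, test))
    # Assumption: population standard deviation; the page does not say
    # which variant the reported "std" values use.
    return mean(scores), pstdev(scores)


# Toy usage: 10 positive and 40 negative samples.
labels = [1] * 10 + [0] * 40


def share_of_positives(train, test):
    """Stand-in for a real metric such as precision, recall, or F-score."""
    return sum(labels[i] for i in test) / len(test)


average, deviation = cross_validate(labels, share_of_positives)
```

In a real run, `score_fold` would train a classifier on `train` and score it on `test`. scikit-learn's `StratifiedKFold` provides the same kind of split.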

Algorithm parameters

  • Naïve Bayes (NB):
    • binarize = 0.1;
    • alpha = 0.0001;
  • liblinear SVM (LSVM): default parameters as per scikit-learn's LinearSVC;
  • libsvm SVM (SVM):
    • kernel = linear;
    • other parameters as per scikit-learn's SVC defaults;
  • single-layer perceptron (SLP):
    • layer = fully connected (Dense);
    • activation = sigmoid;
    • optimizer = stochastic gradient descent;
    • loss = binary cross-entropy;
    • training batch size = 1,024;
    • training epochs = 100.
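To make the SLP settings above concrete, here is a minimal standard-library sketch of a single fully connected unit with sigmoid activation, trained by mini-batch stochastic gradient descent on binary cross-entropy. It is not the project's code: the names `train_slp` and `predict_proba` are hypothetical, and the learning rate of 0.01 is an assumption, since the page does not state one. The batch size and epoch count follow the list above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Confidence score of the positive class for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_slp(X, y, epochs=100, batch_size=1024, learning_rate=0.01, seed=0):
    """Fit one dense sigmoid unit with mini-batch SGD on binary cross-entropy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                # For sigmoid + binary cross-entropy, the gradient with
                # respect to the pre-activation is (prediction - label).
                error = predict_proba(weights, bias, X[i]) - y[i]
                for j, value in enumerate(X[i]):
                    grad_w[j] += error * value
                grad_b += error
            weights = [w - learning_rate * g / len(batch)
                       for w, g in zip(weights, grad_w)]
            bias -= learning_rate * grad_b / len(batch)
    return weights, bias


# Toy usage: two similarity features per candidate pair (made-up values).
X = [[0.9, 0.8], [0.1, 0.2], [1.0, 0.7], [0.0, 0.1]]
y = [1, 0, 1, 0]
weights, bias = train_slp(X, y)
match_score = predict_proba(weights, bias, [0.9, 0.8])
non_match_score = predict_proba(weights, bias, [0.1, 0.2])
```

With so few updates the scores stay close to 0.5, but the matching pair already scores higher than the non-matching one.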

Performance

Algorithm Dataset Entity Precision (std) Recall (std) F-score (std)
NB Discogs Band .789 (.0031) .941 (.0004) .859 (.002)
LSVM Discogs Band .785 (.0058) .946 (.0029) .858 (.0034)
SVM Discogs Band .777 (.003) .963 (.0016) .86 (.0024)
SLP Discogs Band .776 (.0041) .956 (.0012) .857 (.0029)
NB Discogs Musician .836 (.0018) .958 (.0012) .893 (.0013)
SVM Discogs Musician .814 (.0015) .986 (.0003) .892 (.001)
SLP Discogs Musician .815 (.002) .985 (.0006) .892 (.0012)
NB IMDb Actor TODO TODO TODO
SVM IMDb Actor TODO TODO TODO
SLP IMDb Actor TODO TODO TODO
NB IMDb Director .897 (.00195) .971 (.0012) .932 (.001)
SVM IMDb Director .919 (.0031) .942 (.0019) .93 (.002)
SLP IMDb Director .867 (.0115) .953 (.0043) .908 (.0056)
NB IMDb Musician .891 (.0042) .96 (.0022) .924 (.0026)
SVM IMDb Musician .917 (.0043) .937 (.0034) .927 (.003)
SLP IMDb Musician .922 (.005) .914 (.0092) .918 (.0055)
NB IMDb Producer .871 (.0023) .97 (.0037) .918 (.0011)
SVM IMDb Producer .92 (.005) .938 (.0038) .929 (.0026)
SLP IMDb Producer .862 (.0609) .914 (.0648) .883 (.0185)
NB IMDb Writer .91 (.003) .961 (.0022) .935 (.0022)
SVM IMDb Writer .936 (.0029) .948 (.0025) .942 (.0026)
SLP IMDb Writer .903 (.0154) .955 (.0147) .928 (.0047)
NB MusicBrainz Band .822 (.00169) .985 (.0008) .896 (.001)
SVM MusicBrainz Band .943 (.0019) .888 (.0027) .914 (.0016)
SLP MusicBrainz Band .93 (.0265) .885 (.0103) .907 (.0082)
NB MusicBrainz Musician .955 (.0009) .936 (.0011) .946 (.00068)
SVM MusicBrainz Musician .941 (.0011) .962 (.001) .952 (.0004)
SLP MusicBrainz Musician .943 (.0018) .956 (.0019) .949 (.0007)
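
The F-score column can be sanity-checked against the other two, assuming it is the usual harmonic mean of precision and recall (F1); the page does not name the variant. Because the table reports means over folds, recomputing from the mean precision and recall only approximates the reported mean F-score. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NB on Discogs bands: mean precision .789, mean recall .941.
approximate_f = f_score(0.789, 0.941)  # about .858, close to the reported .859
```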

Confidence

The following plots show the distribution of confidence scores and the total number of predictions that each algorithm yields on each target classification set.

Note that liblinear SVM (LSVM) is omitted, since it does not output probability scores.

Axes:

  • x = # predictions;
  • y = confidence score.

Discogs band

NB, SVM, SLP.

Discogs musician

NB, SVM, SLP.

IMDb director

NB, SVM, SLP.

IMDb musician

NB, SVM, SLP.

IMDb producer

NB, SVM, SLP.

IMDb writer

NB, SVM, SLP.

MusicBrainz band

NB, SVM, SLP.

MusicBrainz musician

NB, SVM, SLP.

Comparison

See the plots above for a rough idea of the number of confident predictions.

Threshold values:

  • # predictions: confidence score >= 0.0000000001, i.e., almost all matches;
  • # confident: confidence score >= 0.8.
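
The two counts can be derived from a list of confidence scores as sketched below. This is a minimal illustration: the function name `count_predictions` and the sample scores are made up, and only the two threshold values come from this page.

```python
def count_predictions(scores, all_threshold=1e-10, confident_threshold=0.8):
    """Return (# predictions, # confident) for a list of confidence scores."""
    total = sum(1 for score in scores if score >= all_threshold)
    confident = sum(1 for score in scores if score >= confident_threshold)
    return total, confident


# Made-up scores: two fall below the lower threshold, two reach 0.8.
sample_scores = [0.0, 1e-12, 0.3, 0.8, 0.95]
total, confident = count_predictions(sample_scores)
```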

Discogs band

WD items: 50,316

Measure NB LSVM SVM SLP
Precision .789 .785 .777 .776
Recall .941 .946 .963 .957
F-score .859 .858 .86 .857
# predictions 820 51 94,430 91,295
# confident 219 N.A. 1,660 5,355

Discogs musician

WD items: 199,180

Measure NB LSVM SVM SLP
Precision .836 .814 .815 .815
Recall .958 .986 .985 .985
F-score .893 .892 .892 .892
# predictions 3,872 200 533,301 517,450
# confident 1,101 N.A. 98,172 58,437

IMDb director

WD items: 9,249

Measure NB LSVM SVM SLP
Precision .897 .919 .908 .867
Recall .971 .942 .958 .953
F-score .932 .93 .932 .908
# predictions 192 10 17,557 17,187
# confident 60 N.A. 1,616 553

IMDb musician

WD items: 217,139

Measure NB LSVM SVM SLP
Precision .891 .917 .908 .922
Recall .96 .937 .942 .914
F-score .924 .927 .924 .918
# predictions 4,806 218 406,674 398,346
# confident 1,341 N.A. 21,462 7,244

IMDb producer

WD items: 2,251

Measure NB LSVM SVM SLP
Precision .871 .92 .923 .862
Recall .97 .938 .926 .914
F-score .918 .929 .925 .883
# predictions 56 3 5,249 5,116
# confident 15 N.A. 507 180

IMDb writer

WD items: 16,446

Measure NB LSVM SVM SLP
Precision .91 .936 .932 .903
Recall .961 .948 .954 .955
F-score .935 .942 .943 .928
# predictions 428 17 45,122 44,338
# confident 138 N.A. 2,934 1,548

MusicBrainz band

WD items: 32,658

Measure NB LSVM SVM SLP
Precision .822 .943 .939 .93
Recall .985 .888 .893 .885
F-score .896 .914 .915 .907
# predictions 265 33 39,618 38,012
# confident 46 N.A. 1,475 501

MusicBrainz musician

WD items: 153,725

Measure NB LSVM SVM SLP
Precision .955 .941 .95 .943
Recall .936 .962 .938 .956
F-score .946 .952 .944 .949
# predictions 2,833 154 280,029 260,530
# confident 1,212 N.A. 7,496 7,339

Experiments

Single-layer perceptron optimizers

https://github.com/Wikidata/soweego/issues/285

Setting

  • run: May 3 2019 on soweego-2 VPS instance;
  • output folder: /srv/dev/20190503/;
  • head commit: d0d390e622f2782a49a1bd0ebfc64478ed34aa0c;
  • command: python -m soweego linker evaluate slp ${Dataset} ${Entity} optimizer=${Optimizer};
  • evaluation technique: stratified 5-fold cross validation over training/test splits;
  • mean performance scores over the folds.
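
The runs for this experiment can be enumerated as sketched below, filling the command template above for every target and optimizer. The optimizer names are those in the result tables; the lowercase dataset and entity spellings are an assumption about what the CLI accepts, and `evaluation_commands` is a hypothetical helper. The sketch only builds the command strings and does not execute them.

```python
from itertools import product

# Optimizer names as they appear in the result tables.
OPTIMIZERS = ["sgd", "rmsprop", "nadam", "adamax", "adam", "adagrad", "adadelta"]

# Assumption: the CLI accepts these lowercase dataset and entity names.
TARGETS = [
    ("discogs", "band"),
    ("discogs", "musician"),
    ("imdb", "director"),
    ("imdb", "musician"),
    ("imdb", "producer"),
    ("imdb", "writer"),
    ("musicbrainz", "band"),
    ("musicbrainz", "musician"),
]


def evaluation_commands(targets=TARGETS, optimizers=OPTIMIZERS):
    """Build one evaluation command line per (dataset, entity, optimizer)."""
    return [
        f"python -m soweego linker evaluate slp {dataset} {entity} optimizer={optimizer}"
        for (dataset, entity), optimizer in product(targets, optimizers)
    ]


commands = evaluation_commands()
```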

Discogs band

Optimizer Precision Recall F-score
sgd .782 .945 .856
rmsprop .801 .930 .860
nadam .805 .925 .861
adamax .795 .938 .861
adam .800 .929 .860
adagrad .802 .927 .859
adadelta .799 .934 .861

Discogs musician

Optimizer Precision Recall F-score
sgd .815 .985 .892
rmsprop .816 .985 .893
nadam .816 .986 .893
adamax .817 .985 .893
adam .816 .985 .893
adagrad .816 .986 .893
adadelta .815 .986 .892

IMDb director

Optimizer Precision Recall F-score
sgd .918 .954 .936
rmsprop .895 .954 .923
nadam .908 .954 .930
adamax .907 .955 .930
adam .909 .953 .931
adagrad .867 .950 .907
adadelta .902 .954 .927

IMDb musician

Optimizer Precision Recall F-score
sgd .912 .927 .920
rmsprop .913 .929 .921
nadam .913 .929 .921
adamax .913 .928 .921
adam .913 .928 .921
adagrad .873 .860 .866
adadelta .913 .928 .921

IMDb producer

Optimizer Precision Recall F-score
sgd .917 .942 .929
rmsprop .916 .938 .927
nadam .916 .938 .927
adamax .916 .940 .928
adam .916 .938 .927
adagrad .852 .684 .756
adadelta .916 .939 .928

IMDb writer

Optimizer Precision Recall F-score
sgd .929 .943 .936
rmsprop .927 .940 .934
nadam .930 .940 .935
adamax .930 .941 .935
adam .930 .940 .935
adagrad .872 .923 .896
adadelta .931 .941 .936

MusicBrainz band

Optimizer Precision Recall F-score
sgd .952 .869 .909
rmsprop .949 .875 .911
nadam .949 .877 .911
adamax .952 .871 .910
adam .951 .875 .911
adagrad .932 .886 .909
adadelta .952 .874 .911

MusicBrainz musician

Optimizer Precision Recall F-score
sgd .942 .957 .949
rmsprop .941 .958 .949
nadam .941 .958 .949
adamax .941 .958 .949
adam .941 .958 .949
adagrad .946 .953 .950
adadelta .941 .958 .950
