Difference in evaluation metric while evaluating NER + RELATIONAL model #9808
-
By "very little", are you referring to the roughly 85% F-scores being printed for the best threshold cutoff? It looks to me like your model is in fact properly trained (just compare it to the baseline). If there is a discrepancy with the numbers reported during training, could you paste that output log as well?
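For reference, a "best threshold cutoff" F-score is typically found by sweeping candidate cutoffs over the scored predictions and keeping the one that maximizes F1 against the gold labels. A minimal sketch of that idea, with entirely illustrative data (not spaCy's internal implementation):

```python
# Sketch: pick the threshold that maximizes F1 over scored predictions.
# The scores/gold lists below are made-up illustrative data.

def f1_at_threshold(scores, gold, threshold):
    """F1 of the binary decisions obtained by cutting scores at threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(not p and g for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

scores = [0.95, 0.80, 0.60, 0.40, 0.20]   # model confidences
gold   = [True, True, False, True, False]  # gold labels

# Best (F1, threshold) over a small grid of candidate cutoffs.
best = max((f1_at_threshold(scores, gold, t), t)
           for t in (0.1, 0.3, 0.5, 0.7, 0.9))
print(best)  # -> (0.8571428571428571, 0.3)
```

If this per-threshold number looks very different from what the training loop reported, the log comparison asked for above is the way to track down where they diverge.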
-
I think this question has been asked and answered for you several times before. In short: either improve the training dataset, or write a custom rule-based component to remove predictions that are nonsensical.
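One way such a rule-based post-processing step could look: filter the predicted relation triples against a hand-written table of plausible entity-type pairs and a confidence floor. Everything here (the `ALLOWED_PAIRS` table, the tuple layout, the function name) is an illustrative assumption, not a spaCy API:

```python
# Hypothetical post-processing filter for relation predictions.
# A relation is kept only if its score clears the floor AND its
# (head_type, tail_type) pair is licensed for that relation label.

# Hand-written plausibility rules: which labels make sense for which
# entity-type pairs. Purely illustrative.
ALLOWED_PAIRS = {
    ("PERSON", "ORG"): {"works_for"},
    ("ORG", "GPE"): {"based_in"},
}

def filter_relations(relations, threshold=0.5):
    """Drop low-confidence or type-implausible (head, tail, label, score) tuples."""
    kept = []
    for head_type, tail_type, label, score in relations:
        if score < threshold:
            continue  # too uncertain
        if label not in ALLOWED_PAIRS.get((head_type, tail_type), set()):
            continue  # nonsensical for these entity types
        kept.append((head_type, tail_type, label, score))
    return kept

preds = [
    ("PERSON", "ORG", "works_for", 0.92),  # plausible, confident -> kept
    ("ORG", "GPE", "works_for", 0.88),     # wrong type pair -> dropped
    ("PERSON", "ORG", "works_for", 0.30),  # below threshold -> dropped
]
print(filter_relations(preds))  # -> [('PERSON', 'ORG', 'works_for', 0.92)]
```

In a real pipeline this logic would run after the relation extractor, e.g. wrapped in a custom spaCy pipeline component; the rules themselves are where domain knowledge about your dataset goes.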