Load Biobert pre-trained weights into Bert model with Pytorch bert hugging face run_classifier.py code #6

@sheetalsh456

These are the steps I followed to try to get BioBERT working with the existing Hugging Face BERT PyTorch code.

  1. I downloaded the pre-trained weights 'biobert_pubmed_pmc.tar.gz' from the Releases page.

  2. I ran this command to convert the TensorFlow checkpoint to a PyTorch model:

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file 'biobert.model' in the specified path.

  3. As mentioned in this link, I compressed the 'biobert.model' created above and 'biobert/pubmed_pmc_470k/bert_config.json' together into 'biobert_model.tar.gz'.

  4. I then ran Hugging Face's run_classifier.py with the following command, using the tar.gz created above:

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case
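
For reference, the packaging in step 3 can be sketched with Python's standard tarfile module. This is a minimal sketch only: the helper name make_model_archive and the source-to-member-name mapping are hypothetical, introduced for illustration, and which member names the loader expects inside the archive is not confirmed here.

```python
import tarfile


def make_model_archive(members, archive_path):
    """Write a gzip-compressed tar archive.

    members maps each source file path to the name it should have
    inside the archive (hypothetical convention for this sketch).
    Returns the sorted list of member names actually stored.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for source, arcname in members.items():
            # arcname keeps directory prefixes out of the archive
            tar.add(str(source), arcname=arcname)

    # Re-open the archive to confirm what was written
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

With the paths from the steps above, the call would map 'biobert/pubmed_pmc_470k/Pytorch/biobert.model' and 'biobert/pubmed_pmc_470k/bert_config.json' to flat member names and write 'biobert_model.tar.gz'. Listing the members afterwards makes it easy to compare them against whatever file names the loading code looks for.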

I get the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

in the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
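
One observation that may help narrow this down: a gzip stream starts with the two bytes 0x1f 0x8b, so a byte 0x8b at position 1 is what one would see if the .tar.gz itself were being read as UTF-8 text (for example, as a vocabulary file). Whether that is what the tokenizer is doing here is an assumption, not something confirmed. A minimal standard-library sketch, using a made-up two-token payload, reproduces the same message:

```python
import gzip

# Hypothetical stand-in for any gzip-compressed file, such as a .tar.gz
payload = gzip.compress(b"[PAD]\n[UNK]\n")

# Every gzip stream begins with the magic bytes 0x1f 0x8b
magic = payload[:2]

# Decoding the compressed bytes as text fails on the second magic byte
try:
    payload.decode("utf-8")
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(magic)
print(message)
```

If that is the cause, the argument passed to the tokenizer would need to resolve to a plain-text vocabulary rather than to the compressed archive.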

Am I doing something wrong?

I just want to run the run_classifier.py code provided by Hugging Face with the BioBERT pre-trained weights, in the same way that we run BERT with it. Is there a way to do this?
