I want to fine-tune the model on a dataset with over 1,000 categories to recognize fine-grained classes. However, when I include all of the categories in label_list, the input exceeds BERT's maximum sequence length of 512 tokens during processing.
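To make the problem concrete, here is a rough back-of-the-envelope sketch of why the full label list cannot fit in a single BERT input. The per-label token counts are assumptions for illustration, not measurements from my tokenizer:

```python
# Rough arithmetic showing why 1,000+ category names overflow one BERT input.
# All per-label numbers below are assumed averages, not measured values.
MAX_LEN = 512            # BERT's maximum sequence length
num_labels = 1000        # my dataset has over 1,000 fine-grained categories
tokens_per_label = 3     # assumed average subword tokens per category name
sep_tokens = 1           # assumed one separator token between labels

label_prompt_tokens = num_labels * (tokens_per_label + sep_tokens)
print(label_prompt_tokens)            # 4000
print(label_prompt_tokens > MAX_LEN)  # True -- the label list alone overflows
```

Even with optimistic assumptions, the label list alone is several times the 512-token budget, before any of the actual input text is added.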
How should I address this problem? Should I switch to a BERT variant with a longer maximum sequence length, or are there other methods that can support this many categories? Thank you for your assistance!