This is the repository for "The Devil's in the Details: the Detailedness of Classes Influences Personal Information Detection and Identification", a NoDaLiDa/Baltic-HLT 2025 submission. Since we cannot share the data due to privacy concerns, we only share the code (see details in the submission).
The token classification with Transformers is based on this example. Download that code into a folder called `bert` within the `experiments` folder.
The following code was taken from a previous paper working with the same data:

- `bert/run_iob.sh`
- `data/preproc.sh`
- `analyze_output.py` - with our changes
- `reannotate_iob.py` - with our (major) changes
- `analyze_output.sh` is used to bulk-calculate evaluation measures.
- `compare_predictions.py` is used to combine the predictions of all models into one for future analyses.
- `generate_splits.sh` is used to reannotate and split the data for fine-tuning.
- `visualize.py` is responsible for generating plots.
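A possible end-to-end order for the scripts above can be sketched as follows. This is only an inferred sketch: the actual file locations and any command-line arguments are assumptions, not documented behavior, and each step is skipped if the file is not present.

```shell
# Hypothetical pipeline order inferred from the file descriptions above.
# Paths are assumptions; adjust them to the actual repository layout.
run_pipeline() {
  for step in generate_splits.sh bert/run_iob.sh analyze_output.sh \
              compare_predictions.py visualize.py; do
    if [ ! -e "$step" ]; then
      # Skip missing steps so the sketch degrades gracefully.
      echo "skipping $step (not present)"
      continue
    fi
    case "$step" in
      *.py) python3 "$step" ;;  # Python helpers
      *)    sh "$step" ;;       # shell scripts
    esac
  done
}

run_pipeline
```

The order reflects the descriptions: the data is split and reannotated first, the model is fine-tuned, and the outputs are then evaluated, merged, and plotted.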