This is the repository for "The Devil's in the Details: the Detailedness of Classes Influences Personal Information Detection and Identification", a NoDaLiDa/Baltic-HLT 2025 submission. Since we cannot share the data due to privacy concerns, we only share the code (see details in the submission).
The token classification with Transformers is based on this example. Download that code into a folder called `bert` within the `experiments` folder.
The following code was taken from a previous paper working with the same data:

- `bert/run_iob.sh`
- `data/preproc.sh`
- `analyze_output.py` - with our changes
- `reannotate_iob.py` - with our (major) changes
- `analyze_output.sh` is used to bulk-calculate evaluation measures.
- `compare_predictions.py` is used to combine the predictions of all models into one for future analyses.
- `generate_splits.sh` is used to reannotate and split the data for fine-tuning.
- `visualize.py` is responsible for generating plots.
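A possible end-to-end order for the scripts above can be sketched as follows. This is only an inferred sketch: the actual file locations and any command-line arguments are assumptions, not documented behavior, and each step is skipped if the file is not present.

```shell
# Hypothetical pipeline order inferred from the file descriptions above.
# Paths are assumptions; adjust them to the actual repository layout.
run_pipeline() {
  for step in generate_splits.sh bert/run_iob.sh analyze_output.sh \
              compare_predictions.py visualize.py; do
    if [ ! -e "$step" ]; then
      # Skip missing steps so the sketch degrades gracefully.
      echo "skipping $step (not present)"
      continue
    fi
    case "$step" in
      *.py) python3 "$step" ;;  # Python helpers
      *)    sh "$step" ;;       # shell scripts
    esac
  done
}

run_pipeline
```

The order reflects the descriptions: the data is split and reannotated first, the model is fine-tuned, and the outputs are then evaluated, merged, and plotted.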