Filters like NoDuplicates and, more prominently, MergeRedundant break the link between paired VH/VL sequences.
How to deal with this? => Probably a many-to-many correspondence between the "index_in_csv" values of VHs and VLs needs to be kept around.
Not important unless we start training against paired VH & VL sequences (e.g., adding a self-supervised objective to guess whether a pair is real or not, alongside training with both chains "concatenated").
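
A minimal sketch of what keeping that correspondence could look like, assuming each chain is available as (index_in_csv, sequence) records and the original pairing as (VH index_in_csv, VL index_in_csv) tuples. The helper names merge_redundant and relink_pairs, the record layout, and the toy sequences are hypothetical; merge_redundant is only a stand-in for what MergeRedundant does, not its actual implementation.

```python
from collections import defaultdict


def merge_redundant(records):
    """Collapse identical sequences (hypothetical stand-in for MergeRedundant).

    records: iterable of (index_in_csv, sequence) tuples.
    Returns {sequence: sorted list of every index_in_csv that carried it}.
    """
    merged = defaultdict(list)
    for index_in_csv, sequence in records:
        merged[sequence].append(index_in_csv)
    return {seq: sorted(ids) for seq, ids in merged.items()}


def relink_pairs(original_pairs, merged_vh, merged_vl):
    """Rebuild a many-to-many VH -> VL map between merged sequences."""
    # Reverse lookups: original index_in_csv -> surviving merged sequence.
    vh_of = {i: seq for seq, ids in merged_vh.items() for i in ids}
    vl_of = {i: seq for seq, ids in merged_vl.items() for i in ids}
    links = defaultdict(set)
    for vh_index, vl_index in original_pairs:
        # Skip pairs where either chain was dropped by an earlier filter.
        if vh_index in vh_of and vl_index in vl_of:
            links[vh_of[vh_index]].add(vl_of[vl_index])
    return {vh: sorted(vls) for vh, vls in links.items()}


# Toy data (made up): VH rows 0 and 1 are identical, as are VL rows 10 and 12.
vh_records = [(0, "EVQLVESGG"), (1, "EVQLVESGG"), (2, "QVQLQESGP")]
vl_records = [(10, "DIQMTQSPS"), (11, "EIVLTQSPG"), (12, "DIQMTQSPS")]
original_pairs = [(0, 10), (1, 11), (2, 12)]

merged_vh = merge_redundant(vh_records)
merged_vl = merge_redundant(vl_records)
links = relink_pairs(original_pairs, merged_vh, merged_vl)
```

Here the merged VH "EVQLVESGG" ends up linked to two distinct VLs, and the merged VL "DIQMTQSPS" is linked from two distinct VHs, which is why a one-to-one mapping would not be enough once redundant sequences are merged.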