Hi,
In the evaluation script (score.py), specifically at this line:
https://github.com/thompsonb/vecalign/blob/ca96a30716f12241e14f836b06705107c771987c/score.py#L57C5-L57C5
I've noticed that the for loop iterates over the variable "testalign", which should contain the alignments generated by the algorithm. The problem is that if the algorithm leaves a source sentence unaligned, this is not counted as an error.
For example, if you call _precision(testalign=goldalign[:2], goldalign=goldalign), the resulting F1 is 1 even though only two of the gold alignments were predicted.
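
To make the concern concrete, here is a minimal sketch of what I mean. It is not the code from score.py: `strict_counts` and `precision_recall_f1` are hypothetical helpers I wrote for illustration, the toy alignments are made up, and only exact (strict) matches are counted. It shows that a loop over the predicted alignments alone yields a perfect precision for a truncated prediction, and that, as far as I can tell, the missing alignments only show up if a second pass iterates over the gold alignments as well.

```python
def strict_counts(testalign, goldalign):
    """Count exact matches by looping over `testalign` only.

    Hypothetical helper, not the function from score.py.
    Returns (matched, unmatched) for the items in `testalign`.
    """
    gold = {(tuple(src), tuple(tgt)) for src, tgt in goldalign}
    matched = unmatched = 0
    for src, tgt in testalign:
        if (tuple(src), tuple(tgt)) in gold:
            matched += 1
        else:
            unmatched += 1
    return matched, unmatched


def precision_recall_f1(testalign, goldalign):
    """Strict precision, recall and F1 (hypothetical helper)."""
    # Precision: loop over the predictions, look them up in gold.
    tp, fp = strict_counts(testalign, goldalign)
    # Recall: swap the arguments so the loop runs over gold instead,
    # which is the only place missing alignments become visible.
    tp_gold, fn = strict_counts(goldalign, testalign)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp_gold / (tp_gold + fn) if tp_gold + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (made up): four gold alignments, only the first two predicted.
goldalign = [([0], [0]), ([1], [1]), ([2], [2, 3]), ([3], [4])]
testalign = goldalign[:2]

precision, recall, f1 = precision_recall_f1(testalign, goldalign)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With this toy input, precision stays at 1 because every predicted alignment is correct, while recall drops to 0.5 because two gold alignments were never predicted. If the score were based on the first loop alone, the truncated prediction would look perfect.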