For the X -> EN task, I noticed that some English translations are blank (empty) even though the corresponding source-language sentence is valid.
Here is the number of blank translations per language and split. Every train and valid split is clean; the problem appears only in the test sets, mainly es and ru. How does the BLEU score computation handle these empty references?
| Language | train | valid | test |
|---|---|---|---|
| el | 0 | 0 | 3 |
| es | 0 | 0 | 11 |
| fr | 0 | 0 | 1 |
| it | 0 | 0 | 0 |
| pt | 0 | 0 | 1 |
| ru | 0 | 0 | 8 |
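Counts like the ones above can be reproduced with a minimal Python sketch like the following. It assumes line-aligned source and target files with one sentence per line. The function names and the file layout are hypothetical and are not part of the dataset's own tooling. `drop_blank_targets` illustrates one possible workaround (filtering out blank pairs before scoring). It is not necessarily what the official BLEU evaluation does.

```python
from typing import Iterable, List, Tuple


def count_blank_targets(pairs: Iterable[Tuple[str, str]]) -> int:
    """Count pairs with a non-empty source but a blank (empty or whitespace-only) target."""
    return sum(1 for src, tgt in pairs if src.strip() and not tgt.strip())


def drop_blank_targets(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only pairs where both source and target are non-blank (hypothetical pre-scoring filter)."""
    return [(src, tgt) for src, tgt in pairs if src.strip() and tgt.strip()]


def read_pairs(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Read line-aligned source/target files (assumed layout: one sentence per line)."""
    with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
        src_lines = f_src.read().splitlines()
        tgt_lines = f_tgt.read().splitlines()
    # Misaligned files would make every count meaningless, so fail loudly.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}")
    return list(zip(src_lines, tgt_lines))


if __name__ == "__main__":
    # Toy in-memory data standing in for one test split of one language.
    toy = [
        ("Hola mundo.", "Hello world."),
        ("Buenos días.", ""),   # blank target, counted
        ("", ""),               # blank source, not counted
        ("Gracias.", "   "),    # whitespace-only target, counted
    ]
    print(count_blank_targets(toy))      # 2
    print(len(drop_blank_targets(toy)))  # 1
```

Filtering changes the test set, so BLEU computed on the filtered pairs would not be directly comparable with scores reported on the full test set.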