This repository contains the latest code for the Extended Edit Distance (EED) metric for machine translation. The metric is based on the Levenshtein distance and extends it with a jump operation.
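For orientation, the plain Levenshtein distance that EED builds on can be sketched as below. This is only an illustration of the base distance: it does not include the jump operation or any other EED-specific extension, and the function name levenshtein is chosen here for the example, not taken from EED.py or EED.cpp.

```python
def levenshtein(hyp, ref):
    """Plain Levenshtein distance between two sequences (no jump operation)."""
    # previous[j] is the distance between the hyp prefix processed so far and ref[:j].
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]  # distance between hyp[:i] and the empty reference prefix
        for j, r in enumerate(ref, start=1):
            current.append(min(
                previous[j] + 1,             # delete h from the hypothesis
                current[j - 1] + 1,          # insert r into the hypothesis
                previous[j - 1] + (h != r),  # substitute h by r, or match for free
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

The sketch keeps only two rows of the dynamic-programming table in memory, which is enough because each cell depends only on the current and the previous row.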
The metric is implemented in both C++ and Python. The C++ implementation can be found in EED.cpp, whereas the Python implementation is in EED.py. EED.py also provides the entry point for the metric and calls the compiled C++ variant of the metric, libEED.so.
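As a rough illustration of how a Python script can load a compiled shared library such as libEED.so, the following sketch uses ctypes.CDLL. The helper name load_shared_library and the error message are assumptions made for this example; how EED.py actually loads and calls the library may differ.

```python
import ctypes
import os


def load_shared_library(path="libEED.so"):
    """Load a compiled shared library via ctypes, failing early if it is missing.

    Hypothetical helper for illustration; it is not part of EED.py.
    """
    full_path = os.path.abspath(path)
    if not os.path.exists(full_path):
        raise FileNotFoundError(
            "{} not found; compile EED.cpp first".format(full_path)
        )
    # CDLL loads the library using the standard C calling convention.
    return ctypes.CDLL(full_path)
```

Which functions the library exports, and their argument and return types, is determined by EED.cpp and is not assumed here. Before calling an exported function through ctypes, its argtypes and restype should be set to match the C++ signature.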
- Python 3.5 with the packages ctypes and codecs
- C++ 11
Use:
python3 EED.py -ref [reference] -hyp [hypothesis] [optional: -v]
- The optional -v argument toggles the verbosity of the output. If present, the score for each segment will be shown.
- It is assumed that each segment is on a separate line.
- The final system/file score is computed by taking the average of all segment scores.
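The line-per-segment convention and the averaging step described above can be sketched as follows. The names score_lines and segment_metric are hypothetical and used only for illustration; segment_metric stands in for whatever function returns the score of one hypothesis/reference pair.

```python
def score_lines(hyp_lines, ref_lines, segment_metric):
    """Score each segment pair and average the results into one system score.

    segment_metric is a placeholder callable (hypothesis, reference) -> float;
    it is an assumption of this sketch, not the interface of EED.py.
    """
    if len(hyp_lines) != len(ref_lines):
        raise ValueError("hypothesis and reference must have the same number of lines")
    if not hyp_lines:
        raise ValueError("no segments to score")
    # One score per line, since each segment sits on its own line.
    scores = [segment_metric(h, r) for h, r in zip(hyp_lines, ref_lines)]
    # The system/file score is the arithmetic mean of the segment scores.
    return scores, sum(scores) / len(scores)
```

For example, with a toy metric that returns 0.0 for identical segments and 1.0 otherwise, score_lines(["a", "b"], ["a", "c"], toy_metric) yields the segment scores [0.0, 1.0] and a system score of 0.5.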
If changes to the metric are desired, the C++ code in EED.cpp has to be altered and recompiled. Recompilation can be done via:
g++ EED.cpp -shared -o libEED.so -std=c++11 -fPIC -O3 -flto -funroll-loops -frename-registers
Further optimisation can be done by using -fprofile-generate and -fprofile-use as compiler options.