Open
Description
When I install transformers 4.56.0 or newer, the code formula predictor breaks and produces garbage output. The problem can be reproduced with the test case https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_code_formula_predictor.py
Reproduce steps:
- Clone the docling-ibm-models repo and switch to the latest release tag (I use 3.11.0).
- Create the virtual environment and install the dependencies. Check the transformers version. At this point it installs `4.57.6`. I also tested the older `4.56.0` and it has the same problem.
- Run the test case `pytest -s tests/test_code_formula_predictor.py`. The test is very slow, taking close to 10 minutes to process the input image, and then fails with the output below:
# pytest tests/test_code_formula_predictor.py
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.11.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /root/git/docling-ibm-models
configfile: pyproject.toml
collected 1 item
tests/test_code_formula_predictor.py F [100%]
=========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_code_formula_predictor __________________________________________________________________________________
init = {'artifact_path': '/root/.cache/huggingface/hub/models--ds4sd--CodeFormula/snapshots/d1edced1e8248a343c2d450cd36115e3b...s/test_data/code_formula/gt/code.txt', 'image_path': 'tests/test_data/code_formula/images/code.png', 'label': 'code'}]}
def test_code_formula_predictor(init: dict):
r"""
Unit test for the CodeFormulaPredictor
"""
...
with Image.open(img_path) as img, open(gt_path, 'r') as gt_fp:
gt = gt_fp.read()
output = code_formula_predictor.predict([img], [label], temperature)
output = output[0]
> assert output == gt
E AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
E
E - <_C++_> #include<iostream>
E - using namespace std;
E + }
E + }
E + } }
E + } }...
E
E ...Full output truncated (3621 lines hidden), use '-vv' to show
tests/test_code_formula_predictor.py:126: AssertionError
------------------------------------------------------------------------------------ Captured stderr setup -------------------------------------------------------------------------------------
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 43842.90it/s]
=================================================================================== short test summary info ====================================================================================
FAILED tests/test_code_formula_predictor.py::test_code_formula_predictor - AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
================================================================================ 1 failed in 598.95s (0:09:58) =================================================================================
I managed to dump the actual output from the output variable. It should contain the OCR'ed text of the code snippet in the input image, matching the ground truth, but instead it is garbage, mostly closing braces and whitespace:
}
}
} }
} }
}
}
}
}
}
}
}
}
}
}
}
}
} }
....
The full output: output.txt
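A quick way to flag this kind of degenerate output without diffing against the ground truth is to measure how many non-empty lines contain nothing but closing braces. The sketch below is a hypothetical helper: the names `brace_only_ratio` and `looks_degenerate` and the 0.5 threshold are assumptions for illustration, not part of docling-ibm-models.

```python
def brace_only_ratio(text: str) -> float:
    """Return the fraction of non-empty lines made up solely of '}' and whitespace."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    # Drop all whitespace, then check that only '}' characters remain.
    brace_only = sum(1 for ln in lines if set("".join(ln.split())) == {"}"})
    return brace_only / len(lines)


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Heuristic only: True when more than `threshold` of the lines are brace-only."""
    return brace_only_ratio(text) > threshold
```

Applied to the dumped output above, this would likely flag the prediction immediately, while ordinary C++ code (where only a minority of lines are lone closing braces) should pass.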
If I downgrade transformers to 4.55.4, the test passes and runs much faster (it takes about 2.5 minutes to process the code image). However, that transformers version has a severe CVE, so we can't use it. As a result, the code formula predictor is unusable for us.
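Until the regression is understood, a guard like the following could make the incompatibility explicit before running the predictor. It is only a sketch: the 4.56.0 boundary reflects the versions tested in this report (4.55.4 passes; 4.56.0 and 4.57.6 fail), versions in between or later are assumed rather than verified, and all helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# First version observed to fail in this report (assumed boundary).
FIRST_FAILING = (4, 56, 0)


def parse_release(v: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(v: str) -> bool:
    """True if the version is at or above the first version seen failing."""
    return parse_release(v) >= FIRST_FAILING


def installed_transformers_affected():
    """Check the installed transformers package; None if it is not installed."""
    try:
        return is_affected(version("transformers"))
    except PackageNotFoundError:
        return None
```

Calling `installed_transformers_affected()` in a test fixture would let the suite skip or fail fast instead of spending roughly 10 minutes producing garbage.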