[Bee] code formula predictor does not work with transformers 4.56.0 or above

When I install transformers 4.56.0 or newer, the code formula predictor breaks and produces garbage output. The problem can be reproduced with the test case https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_code_formula_predictor.py

Steps to reproduce:

1. Clone the [docling-ibm-models](https://github.com/docling-project/docling-ibm-models) repo and switch to the latest release tag (I use 3.11.0)
2. Create the virtual environment and install the dependencies. Check the transformers version. At this point it installs `4.57.6`. I also tested the older `4.56.0`, and it has the same problem.
3. Run the test case `pytest -s tests/test_code_formula_predictor.py`. The test is very slow: it takes close to 10 minutes to process the [input image](https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_data/code_formula/images/code.png) and then fails with the output below:

```
# pytest tests/test_code_formula_predictor.py
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.11.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /root/git/docling-ibm-models
configfile: pyproject.toml
collected 1 item                                                                                                                                                                               

tests/test_code_formula_predictor.py F                                                                                                                                                   [100%]

=========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_code_formula_predictor __________________________________________________________________________________

init = {'artifact_path': '/root/.cache/huggingface/hub/models--ds4sd--CodeFormula/snapshots/d1edced1e8248a343c2d450cd36115e3b...s/test_data/code_formula/gt/code.txt', 'image_path': 'tests/test_data/code_formula/images/code.png', 'label': 'code'}]}

    def test_code_formula_predictor(init: dict):
        r"""
        Unit test for the CodeFormulaPredictor
        """
...
            with Image.open(img_path) as img, open(gt_path, 'r') as gt_fp:
                gt = gt_fp.read()
    
                output = code_formula_predictor.predict([img], [label], temperature)
                output = output[0]
    
>               assert output == gt
E               AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
E                 
E                 - <_C++_> #include<iostream>
E                 - using namespace std;
E                 + }
E                 +  }
E                 + } }
E                 + } }...
E                 
E                 ...Full output truncated (3621 lines hidden), use '-vv' to show

tests/test_code_formula_predictor.py:126: AssertionError
------------------------------------------------------------------------------------ Captured stderr setup -------------------------------------------------------------------------------------
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 43842.90it/s]
=================================================================================== short test summary info ====================================================================================
FAILED tests/test_code_formula_predictor.py::test_code_formula_predictor - AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
================================================================================ 1 failed in 598.95s (0:09:58) =================================================================================
```

I managed to dump the actual output from the `output` variable. It should contain the text recognized from the code snippet in the [input image](https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_data/code_formula/images/code.png), matching the [ground truth](https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_data/code_formula/gt/code.txt), but instead it is mostly repeated closing braces and blank lines:

```
}
 }
} }
} }

 }
 }
 }
 }
 }

 }
 }
 }

 }
 }
 }
 }


 } }
....
```

The full output: [output.txt](https://github.com/user-attachments/files/24976108/output.txt)
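
To compare runs across transformers versions without reading thousands of lines, the dumped text can be reduced to a few numbers. This is only a rough sketch: `summarize_mismatch` is a hypothetical helper (it is not part of docling-ibm-models), and it assumes `output` and `gt` are the two strings compared in the test.

```python
import difflib


def summarize_mismatch(output: str, gt: str) -> dict:
    """Return coarse statistics describing how far `output` is from `gt`.

    Hypothetical helper for triage only; not part of docling-ibm-models.
    """
    out_lines = output.splitlines()
    gt_lines = gt.splitlines()

    # Line-level similarity in [0, 1]; 1.0 means both line sequences are identical.
    ratio = difflib.SequenceMatcher(
        None, out_lines, gt_lines, autojunk=False
    ).ratio()

    # Count non-blank lines made up only of braces, commas and whitespace,
    # which is the degenerate pattern visible in the dump above.
    filler_chars = set("{} ,\t")
    filler = sum(
        1 for line in out_lines if line.strip() and set(line) <= filler_chars
    )

    return {
        "output_lines": len(out_lines),
        "gt_lines": len(gt_lines),
        "line_similarity": ratio,
        "filler_lines": filler,
    }
```

Calling `summarize_mismatch(output, gt)` right before the `assert output == gt` line in the test would give one small dictionary per run that can be pasted into this issue.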

If I downgrade transformers to 4.55.4, the test passes and runs much faster (it takes about 2.5 minutes to process the code image). However, that transformers version has a severe CVE, so we can't use it. This leaves the code formula predictor unusable for us.
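
Until the root cause is found, a quick check like the one below can flag an affected environment before the 10-minute test run starts. It is a sketch under the assumption that the boundary is exactly the one observed here (4.55.4 passes; 4.56.0 and 4.57.6 fail). The names `FIRST_BAD`, `parse_release` and `is_affected` are made up for illustration, and pre-release suffixes such as `.dev0` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from this report: 4.55.4 works, 4.56.0 and later do not.
FIRST_BAD = (4, 56, 0)


def parse_release(version_str: str) -> tuple:
    """Parse the leading 'major.minor.patch' numbers, ignoring any suffix."""
    parts = []
    for piece in version_str.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as "4.56" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version_str: str) -> bool:
    """True if the version is at or above the first version seen failing."""
    return parse_release(version_str) >= FIRST_BAD


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_affected(found):
        print(f"transformers {found}: in the range reported as broken")
    else:
        print(f"transformers {found}: below the range reported as broken")
```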
