
[Bee] code formula predictor does not work with transformers 4.56.0 or above #145

@tozhangkai

When I install transformers 4.56.0 or newer, the code formula predictor breaks and produces garbage output. The problem can be reproduced with the test case https://github.com/docling-project/docling-ibm-models/blob/main/tests/test_code_formula_predictor.py

Steps to reproduce:

  1. Clone the docling-ibm-models repo and switch to the latest release tag (I use 3.11.0)
  2. Create the virtual environment and install the dependencies. Check the transformers version. At this point it installs 4.57.6. I also tested the older 4.56.0 and it has the same problem.
  3. Run the test case pytest -s tests/test_code_formula_predictor.py. The test is very slow, taking close to 10 minutes to process the input image, and then fails with the output below:
# pytest tests/test_code_formula_predictor.py
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.11.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /root/git/docling-ibm-models
configfile: pyproject.toml
collected 1 item                                                                                                                                                                               

tests/test_code_formula_predictor.py F                                                                                                                                                   [100%]

=========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_code_formula_predictor __________________________________________________________________________________

init = {'artifact_path': '/root/.cache/huggingface/hub/models--ds4sd--CodeFormula/snapshots/d1edced1e8248a343c2d450cd36115e3b...s/test_data/code_formula/gt/code.txt', 'image_path': 'tests/test_data/code_formula/images/code.png', 'label': 'code'}]}

    def test_code_formula_predictor(init: dict):
        r"""
        Unit test for the CodeFormulaPredictor
        """
...
            with Image.open(img_path) as img, open(gt_path, 'r') as gt_fp:
                gt = gt_fp.read()
    
                output = code_formula_predictor.predict([img], [label], temperature)
                output = output[0]
    
>               assert output == gt
E               AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
E                 
E                 - <_C++_> #include<iostream>
E                 - using namespace std;
E                 + }
E                 +  }
E                 + } }
E                 + } }...
E                 
E                 ...Full output truncated (3621 lines hidden), use '-vv' to show

tests/test_code_formula_predictor.py:126: AssertionError
------------------------------------------------------------------------------------ Captured stderr setup -------------------------------------------------------------------------------------
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 43842.90it/s]
=================================================================================== short test summary info ====================================================================================
FAILED tests/test_code_formula_predictor.py::test_code_formula_predictor - AssertionError: assert '}\n }\n} }\n...\n\n\n\n\n\n,' == '<_C++_> #inc... return 0;\n}'
================================================================================ 1 failed in 598.95s (0:09:58) =================================================================================

I dumped the actual contents of the output variable. It should contain the OCR'ed text of the code snippet in the input image, matching the ground truth, but it is completely garbled:

}
 }
} }
} }

 }
 }
 }
 }
 }

 }
 }
 }

 }
 }
 }
 }


 } }
....

The full output: output.txt
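
For triage, the garbled output above could be flagged without comparing against the ground truth. This is only a rough sketch: the helper name looks_degenerate and the 0.2 threshold are my own assumptions, not part of docling-ibm-models. It just measures how many of the non-whitespace characters are alphanumeric.

```python
def looks_degenerate(text, min_alnum_ratio=0.2):
    """Heuristic check for collapsed model output.

    Returns True when fewer than `min_alnum_ratio` of the non-whitespace
    characters are letters or digits. The threshold is an arbitrary
    assumption chosen for illustration, not a tuned value.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # Empty or whitespace-only output is treated as degenerate too.
        return True
    alnum = sum(ch.isalnum() for ch in visible)
    return alnum / len(visible) < min_alnum_ratio


# A fragment shaped like the broken output shown above (braces only).
broken_sample = "}\n }\n} }\n} }\n\n }\n }\n"
# The first two lines of the expected output, taken from the assertion diff.
expected_sample = "<_C++_> #include<iostream>\nusing namespace std;"
```

On these two samples the check separates the brace-only output from the start of the expected ground truth. It is a diagnostic aid only and says nothing about the root cause.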

If I downgrade transformers to 4.55.4, the test passes and runs much faster (it takes about 2.5 minutes to process the code image). However, that transformers version has a severe CVE, so we can't stay on it. This functionality is therefore broken for us.
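
To make it easy to tell whether an environment falls in the failing range, here is a minimal sketch of a version check. It assumes only what I observed above: 4.55.4 works, 4.56.0 and 4.57.6 fail, and versions in between or later were not tested. The names FIRST_BAD, parse_version, is_affected and installed_transformers_version are hypothetical helpers, not part of transformers or docling-ibm-models.

```python
import re
from importlib import metadata

# Earliest transformers version observed to fail in this report.
# Assumption: every later version behaves like 4.56.0 and 4.57.6 did.
FIRST_BAD = (4, 56, 0)


def parse_version(text):
    """Return the leading numeric (major, minor, patch) of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_affected(version_text):
    """True if the version is at or above the first failing version."""
    return parse_version(version_text) >= FIRST_BAD


def installed_transformers_version():
    """Return the installed transformers version, or None if not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        status = "in the failing range" if is_affected(found) else "not in the failing range"
        print(f"transformers {found} is {status}")
```

Run inside the virtual environment from step 2, this reports whether the installed version is one where I would expect the test to fail.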
