Includes 4 extractors and a benchmark over 545 data samples
What's Changed
- fix bug: duplicated tables by @pekopoke in #42
- Optimize table edit distance calculation by using normalization by @pekopoke in #43
- add extractor version in results by @pekopoke in #44
- fix: revert to the old formula matching by @pekopoke in #45
- feat: add language and style classification by @e06084 in #46
- Use LLM to correct predicted formulas by @1041206149 in #47
- feat: refactor _extract_from_markdown with LLM-enhanced table/formula/code extraction by @1041206149 in #48
- Dev: add a method for trafilatura to output txt by @pekopoke in #50
- Move the LLM API configuration into config.py by @1041206149 in #51
- fix: skip table and formula extraction inside inline and block code by @pekopoke in #52
New Contributors
- @1041206149 made their first contribution in #47
Full Changelog: v0.2.0...v1.0.0