Releases: Future-House/paper-qa
v2026.03.18
What's Changed
- Pulled in `pyzotero` typing fix by @jamesbraza in #1319
- Loosen typing of actions to Message by @sidnarayanan in #1318
Full Changelog: v2026.03.12...v2026.03.18
v2026.03.12
What's Changed
- chore(deps): lock file maintenance by @renovate[bot] in #1315
- chore(deps): update actions/download-artifact action to v8 by @renovate[bot] in #1314
- Pulling in `docling-core`, `docling`, `pymupdf` fixes by @jamesbraza in #1313
- Add "et al." to invalid citation examples in prompt by @jamesbraza in #1316
Full Changelog: v2026.03.03...v2026.03.12
v2026.03.03
What's Changed
- Upstreamed `npm` deps caching by @jamesbraza in #1308
- Fixing JSON schema export of `Settings` by @jamesbraza in #1309
- Fixing flaky PaSa figure 1 read assertions by @jamesbraza in #1312
- Fixed CMYK images crashing PNG encoding in PyPDF reader by @jamesbraza in #1311
Full Changelog: v2026.02.27...v2026.03.03
v2026.02.27
What's Changed
- Downpinning `docling-parse`, `PyMuPDF`, `pyzotero`, `litellm` bugs by @jamesbraza in #1295
- chore(deps): lock file maintenance by @renovate[bot] in #1292
- Respecting custom index settings name by @jamesbraza in #1296
- Adding prompt caching by @jamesbraza in #1293
- Fixing Semantic Scholar crash on `max` over empty list by @jamesbraza in #1297
- Fixed PyPDF reader's failure to handle non-PNG data without `pdfplumber` by @jamesbraza in #1298
- Handling LDP `Agent` incorrectly giving `Message` by @jamesbraza in #1299
- Caching `npm` deps in lint CI by @jamesbraza in #1304
- chore(deps): lock file maintenance by @renovate[bot] in #1303
- Re-loosened `test_timeout_resilience` after LiteLLM fix by @jamesbraza in #1305
- Disabling cache reads assertion for Google Gemini by @jamesbraza in #1306
- Multiprocessing support + optimization for `nemotron-parse` reader by @jamesbraza in #1300
- Fix acronym-led citation docname ingest failures by @AmT42 in #1302
- Supporting signed GCS links in `ParsedMedia` by @jamesbraza in #1307
New Contributors
Full Changelog: v2026.02.16...v2026.02.27
v2026.02.16
What's Changed
- Added `del` to save memory during reading by @jamesbraza in #1265
- More `nemotron-parse` links in `README.md` by @jamesbraza in #1267
- chore(deps): lock file maintenance by @renovate[bot] in #1268
- Validating `ParsedMedia.data` is not empty by @jamesbraza in #1269
- Retrying `nemotron-parse` API calls receiving 408 timeouts by @jamesbraza in #1276
- Fixing 60-sec wait for retrying `nemotron-parse` API calls' 408 timeouts by @jamesbraza in #1277
- Updating `journal_quality.csv` from script by @jamesbraza in #1278
- Logging tool name in `NemotronBBoxError`/`NemotronLengthError` by @jamesbraza in #1279
- Non-destructive retrying on `nemotron-parse` API by @jamesbraza in #1281
- Per-page failover parser for `nemotron-parse` by @jamesbraza in #1280
- Fixing newly-added journal quality `4` causing `KeyError` by @jamesbraza in #1282
- Pulled in `UV_VENV_CLEAR` for `uv==0.10.0` break by @jamesbraza in #1285
- Fixing `test_parse_office_doc` by modernizing Gemini model by @jamesbraza in #1286
- Multiprocessing support for PyMuPDF `full_page` mode by @jamesbraza in #1284
- Fixing failing `test_equations[docling]` by caching Docling models before `pytest` by @jamesbraza in #1287
- Fixing `dockey`/`doc_id` mismatch when no metadata is found by @jamesbraza in #1288
- chore(deps): update suzuki-shunsuke/github-action-renovate-config-validator action to v2 by @renovate[bot] in #1291
- chore(deps): update pre-commit hook psf/black-pre-commit-mirror to v26 by @renovate[bot] in #1290
Full Changelog: v2026.01.05...v2026.02.16
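Several entries in this release concern retrying `nemotron-parse` API calls that receive HTTP 408 (Request Timeout). The sketch below shows the general pattern of retrying only on 408 with exponential backoff; the `APIStatusError` type, the zero-argument `send` callable, and the backoff schedule are all hypothetical and do not reflect paper-qa's actual retry policy or the 60-second wait mentioned in #1277.

```python
import time
from collections.abc import Callable


class APIStatusError(Exception):
    """Hypothetical error raised by an API client for a non-2xx HTTP status."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_408_retries(
    send: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call send(), retrying only on HTTP 408 (Request Timeout).

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Any other status, or a 408 on the final attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except APIStatusError as exc:
            if exc.status_code != 408 or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Restricting retries to 408 keeps other failures (for example, a 500 or a malformed request) visible to the caller instead of silently retrying them.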
v2026.01.05
What's Changed
- Fixing crash from bad PDF read with PyMuPDF by @jamesbraza in #1259
- Fixing crash from bad PDF read with PyPDF by @jamesbraza in #1260
- chore(deps): lock file maintenance by @renovate[bot] in #1263
- Not having ID break index cache key across Python invocations by @jamesbraza in #1262
- Refreshing `test_equations[docling]` VCR cassette by @jamesbraza in #1264
Full Changelog: v2025.12.23...v2026.01.05
v2025.12.23
What's Changed
- Unsilenced flaky `test_duplicate_media_context_creation` by @jamesbraza in #1252
- Modernizing bundled configs by @jamesbraza in #1251
- Moving `nemotron-parse` to failover to reinvent `markdown_bbox` on `length` error by @jamesbraza in #1256
- Refreshing settings table and tutorial by @jamesbraza in #1255
Full Changelog: v2025.12.19...v2025.12.23
v2025.12.19
What's Changed
- Disabling parallel CI jobs' fast fail by @jamesbraza in #1248
- Adding some `pytest.mark.flaky` by @jamesbraza in #1249
- Added IoU-based merging to `nemotron-parse` by @jamesbraza in #1246
- Fixed bug in PyPDF reader where one can't avoid `pdfplumber` by @jamesbraza in #1247
Full Changelog: v2025.12.17...v2025.12.19
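#1246 adds IoU-based merging to the `nemotron-parse` reader. The sketch below illustrates the general technique: intersection over union (IoU) on axis-aligned boxes, then greedy merging of boxes whose IoU exceeds a threshold. The `BBox` type, the 0.5 threshold, and the enclosing-box merge rule are assumptions for illustration, not paper-qa's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box with top-left (x0, y0) and bottom-right (x1, y1) corners."""

    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        # Clamp to zero so inverted (non-overlapping) boxes have no area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union: overlap area divided by combined area."""
    inter = BBox(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1))
    union = a.area + b.area - inter.area
    return inter.area / union if union > 0 else 0.0


def merge_overlapping(boxes: list[BBox], threshold: float = 0.5) -> list[BBox]:
    """Greedily merge each box into the first kept box whose IoU exceeds threshold.

    A merged pair is replaced by the smallest box enclosing both.
    """
    merged: list[BBox] = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) > threshold:
                merged[i] = BBox(
                    min(box.x0, kept.x0),
                    min(box.y0, kept.y0),
                    max(box.x1, kept.x1),
                    max(box.y1, kept.y1),
                )
                break
        else:
            merged.append(box)
    return merged
```

A single greedy pass is simple but order-dependent; a production reader might repeat the pass until no further merges occur.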
v2025.12.17
Summary
The last four months since version 5.29.1 have seen many changes:
- New modalities: tables, figures, non-English languages, math equations
- More and better readers
  - Two new model-based PDF readers: Docling and Nvidia nemotron-parse
  - All PDF readers can now parse images and tables, report page numbers, and support DPI
  - A reader for Microsoft Office data types
- Multimodal contextual summarization
  - Media objects are also passed to the `summary_llm` during creation
  - Media objects' embedding space is enhanced using an `enrichment_llm` prompt
- A simpler and more performant HTTP stack
  - Consolidation from `aiohttp` and `httpx` to just `httpx`
  - Integration with `httpx-aiohttp` for performance
- `Context` relevance is simplified and some assumptions were removed
- Many minor features, such as retrying `Context` creation upon invalid JSON, compatibility with fall 2025's frontier LLMs, and improved prompt templates
- Multiple fixes in metadata retrieval via Semantic Scholar and OpenAlex and in metadata processing (e.g., incorrectly inferring identical document IDs for the main text and SI)
- Completed the deprecations accrued over the past year
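Retrying `Context` creation upon invalid JSON (#1083 retries once) follows a simple pattern. This is a minimal sketch assuming a hypothetical zero-argument `generate` callable that stands in for an LLM call; it is not paper-qa's code.

```python
import json
from collections.abc import Callable
from typing import Any


def parse_json_with_one_retry(generate: Callable[[], str]) -> Any:
    """Parse generate()'s output as JSON, calling generate() once more on failure.

    generate is a hypothetical stand-in for an LLM call. If the second
    response is also invalid JSON, json.JSONDecodeError propagates.
    """
    try:
        return json.loads(generate())
    except json.JSONDecodeError:
        # Retry exactly once with a fresh generation.
        return json.loads(generate())
```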
What's Changed
- Fixing `gen_answer` failover leaving `raw_answer` blank by @jamesbraza in #1077
- Image reader and image support in `gather_evidence` by @jamesbraza in #1046
- Multimodal PDF support by @jamesbraza in #1047
- Documenting `DocMetadataTask`/`MetadataProvider` by @jamesbraza in #1050
- chore(deps): lock file maintenance by @renovate[bot] in #1075
- Fixed LDP erroneous v33 tag by @jamesbraza in #1076
- Consolidating on `httpx` by @jamesbraza in #1062
- S2 metadata retrieval: fallback to exact title match if authors do not match by @sidnarayanan in #1078
- chore(deps): update pre-commit hook pre-commit/pre-commit-hooks to v6 by @renovate[bot] in #1080
- chore(deps): update actions/setup-python action to v6 by @renovate[bot] in #1081
- Retrying `Context` creation once by @jamesbraza in #1083
- Expanding `llm_parse_json` and removing `extract_score` assumptions by @jamesbraza in #1082
- Fixing flaky `test_title_search` via client ordering by @jamesbraza in #1084
- Newer tooling and deps by @jamesbraza in #1085
- Fixed type hint/docstring on `DocMetadataClient.metadata_clients` by @jamesbraza in #1086
- chore(deps): lock file maintenance by @renovate[bot] in #1087
- Filtering out null byte from table `ParsedMedia.text` by @jamesbraza in #1088
- Context id updates by @mskarlin in #1089
- Removed other from doc details for user by @whitead in #1092
- Removing inplace modification by @whitead in #1093
- Enforcing `DocDetails.tzinfo` to be UTC by @jamesbraza in #1094
- Migrating `httpx.AsyncClient` to `httpx_aiohttp.HttpxAiohttpClient` by @jamesbraza in #1099
- Adopting `prek` over `pre-commit`, `setup-uv`'s `python-version` by @jamesbraza in #1098
- Gather evidence fails gracefully on timeout by @sidnarayanan in #1103
- Upstreaming prompt change by @whitead in #1100
- `gather_evidence` tool response filtering irrelevant `Context`, with updated threshold of `0` by @jamesbraza in #1106
- Fixing `used_images` usage in `summary_json_system_prompt` by @jamesbraza in #1109
- Removed `actions/setup-python` from `pre-commit` by @jamesbraza in #1102
- Newer `pre-commit` deps, to test `pre-commit` CI works by @jamesbraza in #1110
- Dropping irrelevant contexts by @jamesbraza in #1113
- Catch safety refusals and other BadRequestErrors in gather evidence by @sidnarayanan in #1114
- Exposing `DOC_DETAILS_OTHERS_TO_KEEP` for subclassers by @jamesbraza in #1115
- Re-adding `refurb` by @jamesbraza in #1117
- Adding missing `py.typed` to reader packages by @jamesbraza in #1118
- Fixing `uv` cache not existing by @jamesbraza in #1120
- chore(deps): lock file maintenance by @renovate[bot] in #1119
- Fixing `KeyError` crash when PDF reader misses a page by @jamesbraza in #1122
- Fixing `Settings.get_index_name` being the same for different PDF parsers by @jamesbraza in #1125
- Ensuring 404 PDF is not parsed into texts by @jamesbraza in #1126
- Updating LLM descriptions in `Settings` by @jamesbraza in #1130
- Cleaning up `tiktoken` usage in readers by @jamesbraza in #1131
- Pulling in BAIPP's fix that ignores `UV_PYTHON` by @jamesbraza in #1133
- chore(deps): update astral-sh/setup-uv action to v7 by @renovate[bot] in #1134
- chore(deps): lock file maintenance by @renovate[bot] in #1135
- Refactored `ParsedMetadata`/`ChunkMetadata` to reflect all options by @jamesbraza in #1132
- Pulling in LMI's image utils by @jamesbraza in #1129
- Created `docling` PDF reader by @jamesbraza in #1121
- chore(deps): update astral-sh/setup-uv action to v7 by @renovate[bot] in #1136
- Pinning `fhlmi` min version by @jamesbraza in #1137
- Ensuring citation peeks aren't multimodal by @jamesbraza in #1138
- Not inferring title/DOI every `docs_fixture` construction by @jamesbraza in #1139
- Making search tools concurrent by @sidnarayanan in #1141
- Deprecating `Docs.delete`'s `name` arg by @jamesbraza in #1142
- Fixing flaky Docling tests due to filesystem race conditions by @jamesbraza in #1144
- Removed unnecessary `fhlmi` pinning by @jamesbraza in #1146
- Expanding `temperature` autoset to 1 for GPT-5 by @jamesbraza in #1145
- chore(deps): lock file maintenance by @renovate[bot] in #1148
- Setting `DOCLING_ARTIFACTS_PATH` to hardcode RapidOCR downloads by @jamesbraza in #1147
- Fixing `jpg` handling in `ParsedMedia.to_image_url` by @jamesbraza in #1150
- Generalizing tests for smarter LLMs by @jamesbraza in #1149
- Supporting page range in readers for performant PDF peeking by @jamesbraza in #1152
- chore(deps): update actions/download-artifact action to v6 by @renovate[bot] in #1155
- Generalizing tests for smarter LLMs 2 by @jamesbraza in #1154
- Deduplicating media on `Context` creation by @jamesbraza in #1153
- chore(deps): lock file maintenance by @renovate[bot] in #1157
- Recording `page_num` in media metadata by @jamesbraza in #1158
- Better crash on `None` value for `fields_to_overwrite_from_metadata` by @jamesbraza in #1159
- Simplifying prompt newlines to use `---` hrule and removed extra hrules by @jamesbraza in #1156
- Supporting media enrichment in embeddings by @jamesbraza in #1143
- Exposed `reader_config` setting to set DPI and simplify settings by @jamesbraza in #1160
- Removing a few `chunk_size`/`overlap` references missed in #1160 by @jamesbraza in #1161
- 'Full page screenshot' enrichment prompt template by @jamesbraza in #1162
- Allowing `str` year values in `paper_search` by @jamesbraza...
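#1094 enforces `DocDetails.tzinfo` to be UTC. One way to normalize datetimes with the standard library is sketched below; the helper name `ensure_utc` and the choice to treat naive datetimes as already being in UTC are assumptions, not necessarily how `DocDetails` handles it.

```python
from datetime import datetime, timezone


def ensure_utc(value: datetime) -> datetime:
    """Return value as a timezone-aware UTC datetime.

    Assumption: naive datetimes are interpreted as already being in UTC.
    Aware datetimes in other zones are converted to the same instant in UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)
```

Using `replace` for naive values attaches a zone without shifting the clock time, while `astimezone` for aware values shifts the clock time so the instant is preserved.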
v5.29.1
What's Changed
- Fixing `gen_answer` failover leaving `raw_answer` blank by @jamesbraza in #1077
Note: @jamesbraza did a clean rebase that moved a few commits beyond this release, so this CHANGELOG entry was manually constructed.
Full Changelog: v5.29.0...v5.29.1