error(R-script): bioRxiv funding parse error and medRxiv download interruptions #20

@esloch

Description

bioRxiv and medRxiv ingestion is failing on the es-journals cluster deployed on Hetzner. For bioRxiv, indexing fails with a document_parsing_exception on the funding field (mapped as text) when a structured value is encountered. For medRxiv, no Elasticsearch parsing error is recorded; the failure occurs during the download step, where the R script intermittently halts with a progress-bar error and interrupts ingestion. The last recorded update runs were on 2025-07-29 (bioRxiv, where the run hit the parsing error) and 2025-08-05 (medRxiv, where 42 records were indexed before subsequent download attempts failed).
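One possible workaround for the bioRxiv failure is to flatten structured funding values into a plain string before indexing, so the value matches the existing text mapping. The sketch below is an assumption, not the project's code: `funding_to_text` and `normalize_record` are hypothetical helper names, and the dict shape is inferred from the error preview in the bioRxiv log line.

```python
from typing import Any, Optional


def funding_to_text(value: Any) -> Optional[str]:
    """Collapse a funding value into one string suitable for a `text` field.

    Hypothetical helper: the input shapes handled here (str, dict, list)
    are assumptions based on the error preview, not a documented schema.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        # Render each entry as "key=value", keeping the original key order.
        return ", ".join(f"{key}={val}" for key, val in value.items())
    if isinstance(value, (list, tuple)):
        # Flatten each funder separately and skip empty entries.
        parts = [funding_to_text(item) for item in value]
        return " | ".join(part for part in parts if part)
    return str(value)


def normalize_record(record: dict) -> dict:
    """Return a shallow copy of `record` with `funding` flattened to text."""
    cleaned = dict(record)
    if "funding" in cleaned:
        cleaned["funding"] = funding_to_text(cleaned["funding"])
    return cleaned


if __name__ == "__main__":
    # Assumed shape, reconstructed from the preview in the error message.
    example = {
        "title": "example preprint",
        "funding": [
            {
                "name": "Swiss National Science Foundation",
                "id": "https://ror.org/00yjd3n13",
                "id-type": "ROR",
            }
        ],
    }
    print(normalize_record(example)["funding"])
```

Flattening keeps the current index mapping untouched. An alternative would be to remap funding as an object field and reindex, which preserves the structure but requires a mapping change on the cluster.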

2025-07-29 02:30:17.213 | ERROR | Failed to index data for biorxiv: BadRequestError(400, 'document_parsing_exception', "... failed to parse field [funding] of type [text] ... Preview: '{award=CRSII5_170930;, name=Swiss National Science Foundation, id=https://ror.org/00yjd3n13, id-type=ROR}'")
2025-08-05 11:30:08.246 | INFO  | Starting the indexing process for medrxiv...
Estimated total number of records as per API metadata: 100
Error in pb_tick(self, private, len, tokens) : !self$finished is not TRUE
[ERROR]: Download process failed for medrxiv.
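Because the medRxiv download failure is described as intermittent, a stopgap could be to retry the download step with a growing delay between attempts. This is only a sketch under that assumption: `retry` is a hypothetical helper, the callable passed to it stands in for whatever invokes the R script, and retrying does not address the underlying progress-bar error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds or `attempts` is exhausted.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    `sleep` is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_download() -> str:
        # Stand-in for the real download step; fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise IOError("simulated download interruption")
        return "downloaded"

    print(retry(flaky_download, attempts=3, base_delay=0.1))
```

A bounded retry like this keeps a scheduled run from aborting on a single transient failure, while still surfacing an error if every attempt fails.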

Labels: bug
