Description
Hi,
This morning, our daily run over our set of declarations threw the following exception:
From `engine/src/archivist/recorder/repositories/git/dataMapper.js`, lines 56 to 58 in 041ca35:

```js
if (modifiedFilesInCommit.length > 1) {
  throw new Error(`Only one file should have been recorded in ${hash}, but all these files were recorded: ${modifiedFilesInCommit.join(', ')}`);
}
```
This error appears to be uncaught: it crashes the whole pipeline, with no way to recover. I get the following log:

```
2025-11-28T06:05:18+00:00 error Zalando — Data Catalogue for Vetted Researchers Error: Only one file should have been recorded in 693a560f39b6de4006a6219c3e97c8778dbe6bbb, but all these files were recorded: Zalando/Data Catalogue for Vetted Researchers.html, Zalando/Data Catalogue for Vetted Researchers.pdf
```

followed by this traceback:

```
at Module.toDomain (file:///home/pptruser/open-terms-archive/engine/src/archivist/recorder/repositories/git/dataMapper.js:57:11)
...
at async Archivist.trackTermsChanges (file:///home/pptruser/open-terms-archive/engine/src/archivist/index.js:184:22)
```
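Because one failing declaration takes down the entire run, isolating failures per declaration would at least let the remaining terms be tracked. Below is a minimal sketch of that idea. It assumes a hypothetical async `trackOne(terms)` function that tracks a single set of terms. `trackAllIsolated` and `trackOne` are illustrative names only and do not belong to the engine's API.

```javascript
// Hypothetical sketch, not the engine's actual code: track each set of terms
// independently so that one failure is logged and skipped instead of
// rejecting the whole run.
async function trackAllIsolated(termsList, trackOne, log = console) {
  const failures = [];
  for (const terms of termsList) {
    try {
      await trackOne(terms);
    } catch (error) {
      // Record the failure and carry on with the remaining terms.
      log.error(`${terms}: ${error.message}`);
      failures.push({ terms, error });
    }
  }
  // The caller can report all failures at the end of the run.
  return failures;
}
```

The trade-off is that a failure no longer stops the run, so the returned `failures` list (or the logs) has to be checked to notice problems such as this one.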
The commit mentioned in the error is the current HEAD of our snapshots Git repository: https://code.europa.eu/dsa/terms-and-conditions-database/vlops-and-vloses/vlop-vlose-snapshots/-/tree/693a560f39b6de4006a6219c3e97c8778dbe6bbb
As you can see in the "Zalando" folder, the "Data catalogue..." file is duplicated: it is stored once as an (empty) HTML file and once as a PDF.
The relevant declaration is: https://code.europa.eu/dsa/terms-and-conditions-database/vlops-and-vloses/vlop-vlose-declarations/-/blob/main/declarations/Zalando.yml?ref_type=heads#L14-15
My understanding of the situation is as follows:
- The Zalando declaration references a PDF document, which was fetched correctly over the last days/weeks.
- At some point, an issue (a temporary problem on the web server, anti-bot protection, or similar) caused an empty HTML response, and the engine recorded that HTML file alongside the PDF file.
- The snapshot directory now contains both an HTML and a PDF file for the same terms, which crashes the pipeline.
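The situation described in the list can be detected from the file paths modified in a snapshot commit, by grouping them by path without extension. The sketch below assumes paths shaped like those in the log (`<service>/<terms type>.<extension>`). `findMultiFormatSnapshots` is a hypothetical helper, not an engine function.

```javascript
// Hypothetical helper: given the file paths modified in a snapshot commit,
// return the base paths (path without extension) that were recorded in
// more than one format, e.g. both .html and .pdf.
function findMultiFormatSnapshots(filePaths) {
  const extensionsByBase = new Map();
  for (const filePath of filePaths) {
    const dot = filePath.lastIndexOf('.');
    const slash = filePath.lastIndexOf('/');
    // Only treat the dot as an extension separator if it is in the file name.
    const hasExtension = dot > slash;
    const base = hasExtension ? filePath.slice(0, dot) : filePath;
    const extension = hasExtension ? filePath.slice(dot + 1) : '';
    if (!extensionsByBase.has(base)) extensionsByBase.set(base, []);
    extensionsByBase.get(base).push(extension);
  }
  // Keep only the terms that exist in several formats.
  return [...extensionsByBase]
    .filter(([, extensions]) => extensions.length > 1)
    .map(([base, extensions]) => ({ base, extensions }));
}

const files = [
  'Zalando/Data Catalogue for Vetted Researchers.html',
  'Zalando/Data Catalogue for Vetted Researchers.pdf',
];
// One entry: base 'Zalando/Data Catalogue for Vetted Researchers',
// extensions ['html', 'pdf'].
console.log(findMultiFormatSnapshots(files));
```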
I can probably work around it by manually removing the faulty HTML file, but this issue will likely happen again on future runs.
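One option to make a recurrence less likely is to refuse to record a snapshot whose body is empty, or whose format silently differs from the previous snapshot. The sketch below is hedged. It assumes a hypothetical fetch result shaped as `{ content, mimeType }` and is not how the engine currently decides what to record. `shouldRecordSnapshot` is an illustrative name.

```javascript
// Hypothetical guard, not part of the engine: decides whether a freshly
// fetched document should be recorded as a new snapshot.
function shouldRecordSnapshot({ content, mimeType }, previousMimeType) {
  // Treat an empty body (e.g. from a transient server error or an anti-bot
  // page) as a failed fetch rather than as a new version.
  const isEmpty = content == null
    || (typeof content === 'string' ? content.trim() === '' : content.length === 0);
  if (isEmpty) {
    return { record: false, reason: 'empty content' };
  }
  // Conservatively refuse a silent format switch (e.g. PDF to HTML); a real
  // implementation might only log a warning here.
  if (previousMimeType && mimeType !== previousMimeType) {
    return { record: false, reason: `format changed from ${previousMimeType} to ${mimeType}` };
  }
  return { record: true };
}

// The scenario described above: an empty HTML reply after PDF snapshots.
console.log(shouldRecordSnapshot({ content: '', mimeType: 'text/html' }, 'application/pdf'));
// → { record: false, reason: 'empty content' }
```

Rejecting every format change outright is deliberately strict. Legitimate switches (for example, a service moving its terms from PDF to HTML) would then need to be reflected in the declaration.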