| 1 | +# 00008. Re-process documents |
| 2 | + |
| 3 | +Date: 2025-08-08 |
| 4 | + |
| 5 | +## Status |
| 6 | + |
| 7 | +DRAFT |
| 8 | + |
| 9 | +## Context |
| 10 | + |
| 11 | +During ingestion, we extract certain information from the uploaded documents and store it |
| 12 | +in the database. We also store the original source document "as-is". |
| 13 | + |
| 14 | +When the database structure changes, a migration process takes care of updating the |
| 15 | +database structures during an upgrade. |
| 16 | + |
| 17 | +However, in some cases, changing the database structure actually means extracting more information from documents |
| 18 | +than is currently stored in the database, or extracting information in a different way. This requires re-processing |
| 19 | +all documents affected by the change. |
| 20 | + |
| 21 | +### Example |
| 22 | + |
| 23 | +We currently ignore all CVSS v2 scores. If we added new fields for storing v2 scores, those fields would |
| 24 | +stay empty for existing documents until we re-process them and extract that information. |
| 25 | + |
| 26 | +### Assumptions |
| 27 | + |
| 28 | +This ADR makes the following assumptions: |
| 29 | + |
| 30 | +* All documents are stored in the storage |
| 31 | +* It is expected that an upgrade is actually required |
| 32 | +* Running such migrations is expected to take a long time |
| 33 | + |
| 34 | +Open question: do we want to support downgrades? |
| 35 | + |
| 36 | +## Decision |
| 37 | + |
| 38 | +### Option 1 |
| 39 | + |
| 40 | +During the migration of database structures (SeaORM), we also re-process all documents (when required). |
| 41 | + |
| 42 | +In order to report progress, we could write that state into a table and expose that information to the user via the UI. |
| 43 | + |
| 44 | +* 👎 Might serve inaccurate data for a while |
| 45 | +* 👎 Might block an upgrade if re-processing fails |
| 46 | +* 👍 Can fully migrate database (create mandatory field as optional -> re-process -> make mandatory) |
| 47 | +* 👎 Might be tricky to combine multiple re-processing steps into a single run |
| 48 | + |
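The "optional -> re-process -> make mandatory" flow from Option 1 could be sketched roughly as below. This is a minimal, in-memory illustration using only the standard library, not SeaORM code. The names `Row`, `Progress`, `extract_severity`, `reprocess_all`, and `can_make_mandatory`, as well as the `severity=` document format, are hypothetical and exist only for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical database row. The new field is added as optional first,
/// so existing rows stay valid before re-processing.
#[derive(Debug, Default, PartialEq)]
struct Row {
    severity: Option<f32>,
}

/// Hypothetical progress state. Option 1 suggests writing something like
/// this into a table so the UI can show it.
#[derive(Debug, Default, PartialEq)]
struct Progress {
    total: usize,
    done: usize,
    failed: Vec<String>,
}

/// Hypothetical extractor working on the stored "as-is" source document.
/// The `severity=<number>` format is made up for this sketch.
fn extract_severity(source: &str) -> Result<f32, String> {
    source
        .strip_prefix("severity=")
        .ok_or_else(|| "missing severity".to_string())?
        .parse::<f32>()
        .map_err(|e| e.to_string())
}

/// Re-process every stored document and fill the new (still optional) field.
/// Failures are recorded instead of aborting, so progress can be reported.
fn reprocess_all(
    storage: &BTreeMap<String, String>,
    db: &mut BTreeMap<String, Row>,
) -> Progress {
    let mut progress = Progress {
        total: storage.len(),
        ..Progress::default()
    };
    for (id, source) in storage {
        match extract_severity(source) {
            Ok(value) => {
                db.entry(id.clone()).or_default().severity = Some(value);
                progress.done += 1;
            }
            Err(_) => progress.failed.push(id.clone()),
        }
    }
    progress
}

/// The field may only become mandatory once every document was re-processed.
fn can_make_mandatory(progress: &Progress) -> bool {
    progress.failed.is_empty() && progress.done == progress.total
}
```

In this sketch, a caller would run `reprocess_all` after adding the optional column and only apply the "make mandatory" step when `can_make_mandatory` returns `true`. A failed document therefore blocks the final step, which mirrors the "might block an upgrade" drawback listed above.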
| 49 | +### Option 2 |
| 50 | + |
| 51 | +We create a module similar to the importer, which runs the re-processing migrations after an upgrade, accepting that |
| 52 | +in the meantime we might serve inaccurate data. |
| 53 | + |
| 54 | +* 👎 Might serve inaccurate data for a longer time |
| 55 | +* 👎 Can't fully migrate database (new mandatory field won't work) |
| 56 | +* 👍 Upgrade process is faster and less complex |
| 57 | +* 👎 Requires some coordination between instances (only one processor at a time, maybe one after the other) |
| 58 | + |
| 59 | +### Option 3 |
| 60 | + |
| 61 | +We change ingestion in a way that makes it possible to just re-ingest every document, meaning we re-ingest from the |
| 62 | +original sources. |
| 63 | + |
| 64 | +* 👎 Might serve inaccurate data for a longer time |
| 65 | +* 👎 Can't fully migrate database (new mandatory field won't work) |
| 66 | +* 👍 Upgrade process is faster and less complex |
| 67 | +* 👎 Original sources might no longer have the documents |
| 68 | +* 👎 Won't work for manual (API) uploads |
| 69 | +* 👎 Would require removing optimizations for existing documents |
| 70 | + |
| 71 | +## Open items |
| 72 | + |
| 73 | +… |
| 74 | + |
| 75 | +## Alternative approaches |
| 76 | + |
| 77 | +… |
| 78 | + |
| 79 | +## Consequences |
| 80 | + |
| 81 | +… |