Commit 8a8c642 (parent 3338ed6)

docs: ADR for re-processing of documents

1 file changed: 81 additions, 0 deletions
# 00008. Re-process documents

Date: 2025-08-08

## Status

DRAFT

## Context

During ingestion, we extract certain information from the uploaded documents and store that information
in the database. We also store the original source document "as-is".

When making changes to the database structure, we have a migration process which takes care of upgrading the
database structures during an upgrade.

However, in some cases, changing the database structure actually means extracting more information from documents than is
currently stored in the database, or extracting information in a different way. This requires re-processing
all documents affected by the change.

### Example

We currently ignore all CVSS v2 scores. If we added new fields for storing v2 scores, we wouldn't have
any values stored in the database without re-processing the documents and extracting that information.
### Assumptions

This ADR makes the following assumptions:

* All documents are stored in the document storage
* It is expected that an upgrade is actually required
* Running such migrations is expected to take a long time

Question: Do we want to support downgrades?
## Decision

### Option 1

During the migration of database structures (SeaORM), we also re-process all documents (when required).

In order to report progress, we could write that state into a table and expose that information to the user via the UI.
A minimal sketch of such a migration is shown after the list below.

* πŸ‘Ž Might serve inaccurate data for a while
* πŸ‘Ž Might block an upgrade if re-processing fails
* πŸ‘ Can fully migrate the database (create a mandatory field as optional -> re-process -> make it mandatory)
* πŸ‘Ž Might be tricky to combine the re-processing steps of multiple migrations
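
For illustration, here is a minimal sketch of what such a migration could look like, based on the CVSS v2 example
above. The `Advisory` identifiers and the `reprocess_all_documents` helper are hypothetical names used only for this
sketch, not existing code.

```rust
use sea_orm_migration::prelude::*;
use sea_orm_migration::sea_orm::ConnectionTrait;

#[derive(DeriveMigrationName)]
pub struct Migration;

#[async_trait::async_trait]
impl MigrationTrait for Migration {
    async fn up(&self, manager: &SchemaManager) -> Result<(), DbErr> {
        // 1. Add the new field as optional (nullable) first.
        manager
            .alter_table(
                Table::alter()
                    .table(Advisory::Table)
                    .add_column(ColumnDef::new(Advisory::CvssV2Score).double().null())
                    .to_owned(),
            )
            .await?;

        // 2. Re-process all stored documents so the new field gets populated.
        //    This is the step that may take a long time and would block the upgrade on failure.
        reprocess_all_documents(manager.get_connection()).await?;

        // 3. At this point the field could be altered to NOT NULL if it is meant to be mandatory.
        Ok(())
    }

    async fn down(&self, manager: &SchemaManager) -> Result<(), DbErr> {
        manager
            .alter_table(
                Table::alter()
                    .table(Advisory::Table)
                    .drop_column(Advisory::CvssV2Score)
                    .to_owned(),
            )
            .await
    }
}

// Hypothetical table/column identifiers, for illustration only.
#[derive(DeriveIden)]
enum Advisory {
    Table,
    CvssV2Score,
}

// Hypothetical helper: reads every original document from storage, re-runs extraction,
// updates the extracted data and records progress in a state table for the UI.
async fn reprocess_all_documents<C: ConnectionTrait>(_db: &C) -> Result<(), DbErr> {
    // ... stream documents from storage, extract the CVSS v2 score, update the rows ...
    Ok(())
}
```

The ordering is the point of this option: the new field starts out as nullable, gets filled by re-processing, and only
then could be tightened to mandatory.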
### Option 2

We create a module similar to the importer, which runs the re-processing after an upgrade, accepting that in the
meantime we might serve inaccurate data.

* πŸ‘Ž Might serve inaccurate data for a longer time
* πŸ‘Ž Can't fully migrate the database (a new mandatory field won't work)
* πŸ‘ Upgrade process is faster and less complex
* πŸ‘Ž Requires some coordination between instances (only one processor at a time, maybe one after the other; see the sketch below)
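
As a sketch of how coordination and progress reporting could work in this option, the example below claims one
document at a time from a hypothetical `reprocess_state` table using PostgreSQL's `FOR UPDATE SKIP LOCKED`. The table
and function names are assumptions for this sketch, not existing code.

```rust
use sea_orm::{ConnectionTrait, DatabaseConnection, DbBackend, DbErr, Statement};

/// Hypothetical post-upgrade worker step: claim a single pending document from a
/// `reprocess_state` table. `SKIP LOCKED` lets concurrent instances skip rows that are
/// already being claimed, and the table doubles as a progress report for the UI.
async fn reprocess_next(db: &DatabaseConnection) -> Result<bool, DbErr> {
    let claimed = db
        .query_one(Statement::from_string(
            DbBackend::Postgres,
            r#"
            UPDATE reprocess_state
               SET status = 'in_progress'
             WHERE document_id = (
                     SELECT document_id
                       FROM reprocess_state
                      WHERE status = 'pending'
                      LIMIT 1
                      FOR UPDATE SKIP LOCKED
                   )
            RETURNING document_id
            "#
            .to_owned(),
        ))
        .await?;

    let Some(row) = claimed else {
        // Nothing left to re-process.
        return Ok(false);
    };
    let document_id: String = row.try_get("", "document_id")?;

    // ... load the original document from storage, re-run extraction,
    //     store the result and mark the row as 'done' ...
    let _ = document_id;

    Ok(true)
}
```

Calling this in a loop until it returns `false` lets several instances work through the backlog without a dedicated
leader election.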
### Option 3

We change ingestion in a way that makes it possible to simply re-ingest every document, meaning we re-ingest from the
original sources.

* πŸ‘Ž Might serve inaccurate data for a longer time
* πŸ‘Ž Can't fully migrate the database (a new mandatory field won't work)
* πŸ‘ Upgrade process is faster and less complex
* πŸ‘Ž Original sources might no longer have the documents
* πŸ‘Ž Won't work for manual (API) uploads
* πŸ‘Ž Would require removing optimizations for existing documents (see the sketch below)
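
To illustrate the last point, the importer's usual "skip unchanged documents" optimization would need a bypass, roughly
like the hypothetical check below (the function and parameter names are made up for this sketch):

```rust
/// Hypothetical importer check: decides whether a document fetched from the original
/// source should be ingested again. A `force_reingest` flag bypasses the usual
/// "skip unchanged documents" optimization so every document gets re-processed.
fn should_ingest(stored_digest: Option<&str>, source_digest: &str, force_reingest: bool) -> bool {
    match stored_digest {
        // Not ingested yet: always ingest.
        None => true,
        // Unchanged at the source: normally skipped, but re-ingested when forced.
        Some(digest) if digest == source_digest => force_reingest,
        // Changed at the source: always ingest.
        Some(_) => true,
    }
}
```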
## Open items

…

## Alternative approaches

…

## Consequences

…
