Commit 18f390f (parent 26bd8e8)

docs(postgres): add 23_populate_storage_locations.md describing data migration needed after schema migrated to version 23, update README.md to mention migratedb.docs directory

2 files changed: +132 −1 lines

postgresql/README.md

Lines changed: 11 additions & 1 deletion
```diff
@@ -2,7 +2,7 @@
 
 We use
 [Postgres 15](https://github.com/docker-library/postgres/tree/master/15/alpine)
-and Alpine 3.17.
+and Alpine 3.23.
 
 Security is hardened:
 
```

```diff
@@ -26,3 +26,13 @@ The following environment variables can be used to configure the database:
 | POSTGRES_VERIFY_PEER | Enforce client verification | verify-ca |
 
 Client verification is enforced if `POSTGRES_VERIFY_PEER` is set to `verify-ca` or `verify-full`.
+
+# Data migration instructions
+
+The [migratedb.docs](data_migration.docs) directory contains instructions on how to execute the data
+migration when upgrading a system with existing data tied to specific versions of the schema.
+
+The file naming convention is `${SCHEMA_VERSION}_${PRE/POST}_${SHORT_DESCRIPTION}.md`, where:
+* `${SCHEMA_VERSION}` is the schema version the data migration instructions relate to.
+* `${PRE/POST}` states whether the instructions should be executed before or after the schema migration has taken place.
+* `${SHORT_DESCRIPTION}` is a short description of the data migration.
```
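As an illustration, a name following this convention can be split into its components with plain shell parameter expansion — a sketch using a hypothetical file name, not necessarily one that exists in the directory:

```shell
# Hypothetical file name following the convention, without the .md suffix
name="23_POST_populate_storage_locations"

schema_version=${name%%_*}     # text before the first "_"
rest=${name#*_}                # strip the version component
phase=${rest%%_*}              # PRE or POST
short_description=${rest#*_}   # everything after the phase

echo "${schema_version} ${phase} ${short_description}"
```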
23_populate_storage_locations.md

Lines changed: 121 additions & 0 deletions
# Data Migration Plan POST schema migration version 23

## 1. Prep

Note: prep is only needed if you have multiple S3 buckets / POSIX volumes for a storage type.

Repeat the steps below for each S3 bucket / POSIX volume.

### 1.1. Get a file of file ids for a storage

#### If S3 storage

List all files in each S3 bucket:

```bash
aws s3api list-objects-v2 --endpoint ${ENDPOINT} --bucket ${BUCKET} > ${BUCKET}_raw
```

Transform the raw response into a plain list of ids:

```bash
jq -r '.Contents[].Key' ${BUCKET}_raw > ${BUCKET}_ids
```
#### If POSIX storage

Run from the root of the POSIX volume:

```bash
find . -type f -exec basename {} \; > ${POSIX_VOLUME}_ids
```
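If GNU find is available, the same listing can be produced without spawning one `basename` process per file by using `-printf '%f\n'` — a sketch, assuming it is run from the volume root (`archive_vol` is a placeholder name):

```shell
# GNU find only: %f prints the file name without leading directories
POSIX_VOLUME=archive_vol   # placeholder volume name
find . -type f -printf '%f\n' > "${POSIX_VOLUME}_ids"
```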
### 1.2. Create new temporary tables to support the DB migration

One table per bucket/volume:

```sql
CREATE TABLE sda.temp_file_in_${BUCKET || POSIX_VOLUME} (
    file_id UUID PRIMARY KEY
);
```

### 1.3. Populate the tables

```bash
psql -U $user -d sda -At -h $host -p $port -c "\copy sda.temp_file_in_${BUCKET || POSIX_VOLUME} from '/path/to/${BUCKET || POSIX_VOLUME}_ids' with delimiter as ','"
```
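Since `file_id` is a primary key, a blank line or duplicate id in an ids file will make the `\copy` fail; a quick pre-flight check may save a round trip. A minimal sketch — `check_ids_file` is a helper invented here, not part of any tooling:

```shell
# Fail if an ids file contains blank lines or duplicate ids,
# either of which would violate the file_id primary key on \copy
check_ids_file() {
    if grep -q '^$' "$1"; then
        echo "blank line(s) in $1" >&2
        return 1
    fi
    if [ -n "$(sort "$1" | uniq -d)" ]; then
        echo "duplicate id(s) in $1" >&2
        return 1
    fi
}

# usage: check_ids_file "${BUCKET}_ids" && echo "ok to load"
```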
## 2. Ensure the schema migration has taken place

Ensure [23_expand_files_table_with_storage_locations.sql](../migratedb.d/23_expand_files_table_with_storage_locations.sql)
has been executed.

This can be checked with:

```sql
SELECT * FROM sda.dbschema_version WHERE version = 23;
```
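The gate can also be scripted; a sketch where the commented-out psql call (it needs a live database) assumes the same connection parameters as step 1.3 and that `sda.dbschema_version` holds one row per applied migration:

```shell
# Succeeds only when the reported schema version is at least 23
schema_is_migrated() {
    [ "${1:-0}" -ge 23 ]
}

# version=$(psql -U $user -d sda -At -h $host -p $port \
#     -c "SELECT max(version) FROM sda.dbschema_version;")
# schema_is_migrated "$version" || { echo "run the schema migration first" >&2; exit 1; }
```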
## 3. Run data migration queries

### 3.1. Inbox location

If using a POSIX inbox, replace `${INBOX_ENDPOINT}/${INBOX_BUCKET}` with `${INBOX_POSIX_VOLUME}` below.

If you have a single inbox storage:

```sql
UPDATE sda.files
SET submission_location = '${INBOX_ENDPOINT}/${INBOX_BUCKET}';
```

If you have multiple inbox storages, repeat the following UPDATE statement for each bucket/volume:

```sql
UPDATE sda.files AS f
SET submission_location = '${INBOX_ENDPOINT}/${INBOX_BUCKET}'
FROM sda.temp_file_in_${INBOX_BUCKET} AS in_buk
WHERE f.id = in_buk.file_id;
```
### 3.2. Archive location

If using a POSIX archive, replace `${ARCHIVE_ENDPOINT}/${ARCHIVE_BUCKET}` with `/${ARCHIVE_POSIX_VOLUME}` below.

If you have a single archive storage:

```sql
UPDATE sda.files
SET archive_location = '${ARCHIVE_ENDPOINT}/${ARCHIVE_BUCKET}'
WHERE archive_file_path != '';
```

If you have multiple archive storages, repeat the following UPDATE statement for each bucket/volume:

```sql
UPDATE sda.files AS f
SET archive_location = '${ARCHIVE_ENDPOINT}/${ARCHIVE_BUCKET}'
FROM sda.temp_file_in_${ARCHIVE_BUCKET} AS in_buk
WHERE f.id = in_buk.file_id;
```
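Because these UPDATE statements rewrite rows in place, it may be worth running each one inside a transaction and checking the affected row count before making it permanent. A sketch for the archive case (not part of the migration itself); compare psql's reported `UPDATE n` against the line count of the corresponding `_ids` file:

```sql
BEGIN;

UPDATE sda.files AS f
SET archive_location = '${ARCHIVE_ENDPOINT}/${ARCHIVE_BUCKET}'
FROM sda.temp_file_in_${ARCHIVE_BUCKET} AS in_buk
WHERE f.id = in_buk.file_id;

-- psql prints "UPDATE n"; if n matches the expected file count:
COMMIT;
-- otherwise run ROLLBACK; instead and investigate
```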
### 3.3. Backup location

Skip this step if you do not have a backup storage.

If using a POSIX backup, replace `${BACKUP_ENDPOINT}/${BACKUP_BUCKET}` with `/${BACKUP_POSIX_VOLUME}` below.

If you have a single backup storage:

```sql
UPDATE sda.files
SET backup_location = '${BACKUP_ENDPOINT}/${BACKUP_BUCKET}'
WHERE stable_id IS NOT NULL;
```

If you have multiple backup storages, repeat the following UPDATE statement for each bucket/volume:

```sql
UPDATE sda.files AS f
SET backup_location = '${BACKUP_ENDPOINT}/${BACKUP_BUCKET}'
FROM sda.temp_file_in_${BACKUP_BUCKET} AS in_buk
WHERE f.id = in_buk.file_id;
```
## 4. Clean up

Only needed if you did the [1. Prep step](#1-prep) and created temporary tables.

Repeat the DROP TABLE statement for each temporary table created:

```sql
DROP TABLE sda.temp_file_in_${BUCKET || POSIX_VOLUME};
```
## 5. Ensure all files have been updated

```sql
SELECT count(id) FROM sda.files
WHERE submission_location IS NULL
   OR (archive_location IS NULL AND archive_file_path != '');
```

If the count is greater than zero, the required locations of those files are not known.
To resolve this, either manually delete those sda.files entries or ensure the files are uploaded to the expected locations.
