Add script to migrate a repository structure #240

njohner · 2020-12-17T12:04:54Z

With this PR we implement a script to migrate a repository structure. This includes creating new positions, moving existing positions, merging positions into other ones, changing titles (and the corresponding paths), changing reference numbers, and changing descriptions. We also allow setting permissions for newly created positions. The input for the migration is an Excel file containing the whole repository structure, with additional columns for the new titles, reference numbers and descriptions. An example can be found at https://4teamwork.atlassian.net/browse/ROAD-1439.
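To make the list of operations more concrete, here is a minimal sketch of how one Excel row could be turned into a list of operations. The class is loosely named after the `RepositoryPosition` mentioned in this PR's commit list, but it is a hypothetical model, not the script's implementation. The dotted position notation (`"1.2.3"`) and the column layout are assumptions; merging and permissions are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RepositoryPosition:
    """Hypothetical model of one side (old or new columns) of an Excel row.

    Positions are assumed to use a dotted notation such as "1.2.3".
    """
    position: Optional[str]  # None when the row has no position on this side
    title: str = ""
    description: str = ""

    @property
    def parent_position(self) -> Optional[str]:
        # "1.2.3" -> "1.2"; top-level positions have no parent
        if not self.position or "." not in self.position:
            return None
        return self.position.rsplit(".", 1)[0]

    @property
    def reference_number(self) -> Optional[str]:
        # Local reference number, i.e. the last segment: "1.2.3" -> "3"
        if not self.position:
            return None
        return self.position.rsplit(".", 1)[-1]


def operations(old: RepositoryPosition, new: RepositoryPosition) -> List[str]:
    """Compare the old columns with the new columns of one row."""
    if not old.position:
        return ["create"]
    if not new.position:
        # No new position: the position is removed
        return ["delete"]
    ops = []
    if old.position != new.position:
        # A move, a number change, or both; needs further analysis
        ops.append("change_position")
    if new.title and new.title != old.title:
        ops.append("rename")
    if new.description and new.description != old.description:
        ops.append("update_description")
    return ops
```

For example, a row whose title changes from "Finance" to "Finances" at the same position yields only `["rename"]`.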

To gain more confidence in the implementation, I've written tests in opengever.core (4teamwork/opengever.core#6792). This also allows testing things like the consistency of contained objects (between the data on the object and in the catalog) after the migration.

Because the migration itself will leave the deployment in a messy state if it fails (some operations commit, e.g. the creation pipeline from the bundles), I've added as much validation of the operations as possible to the analysis step, so that we can abort before the migration even starts.
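A minimal sketch of the "validate everything first, then abort" approach, assuming operations are plain dicts and each check returns a list of error messages. `validate_operations`, `MigrationAborted` and `check_max_depth` are hypothetical names; the depth check mirrors the "max repository_depth" check from the commit list, with an assumed dotted notation.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("repository_migration")


class MigrationAborted(Exception):
    """Raised before any change is made if the analysis found problems."""


def check_max_depth(max_depth: int) -> Callable[[dict], List[str]]:
    """Build a check that rejects new positions nested deeper than max_depth."""
    def check(operation: dict) -> List[str]:
        position = operation.get("new_position") or ""
        depth = len(position.split(".")) if position else 0
        if depth > max_depth:
            return ["%s exceeds the maximum repository depth of %d"
                    % (position, max_depth)]
        return []
    return check


def validate_operations(operations: Iterable[dict],
                        checks: List[Callable[[dict], List[str]]]) -> None:
    """Run every check against every operation and abort if anything fails.

    All errors are collected first, so a single run reports every problem
    instead of stopping at the first one.
    """
    errors = []
    for operation in operations:
        for check in checks:
            errors.extend(check(operation))
    if errors:
        for error in errors:
            logger.error(error)
        raise MigrationAborted(
            "%d validation error(s), nothing was migrated" % len(errors))
```

Because the exception is raised before the migrator touches any object, nothing needs to be rolled back.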

I've also decided not to fail during the last validation step (which checks that the migration produced the expected result). Failing at that point only makes things worse, so I'd rather log the errors and let the script terminate properly.
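The log-only final validation could look roughly like this. It is a sketch, assuming the expected and actual state are available as dicts keyed by something like a UID; the function name and data shape are hypothetical. Errors are logged and returned instead of raised, so the caller can report them and still terminate normally.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("repository_migration")


def validate_result(expected: Dict[str, dict], actual: Dict[str, dict]) -> List[str]:
    """Compare expected and actual attributes per object; log, never raise.

    `expected` and `actual` map an object key (e.g. a UID) to attribute
    values such as the title or the reference number.
    """
    errors = []
    for key, expected_attrs in expected.items():
        actual_attrs = actual.get(key)
        if actual_attrs is None:
            errors.append("%s: object not found after migration" % key)
            continue
        for name, value in expected_attrs.items():
            if actual_attrs.get(name) != value:
                errors.append("%s: %s is %r, expected %r"
                              % (key, name, actual_attrs.get(name), value))
    for error in errors:
        logger.error(error)
    return errors
```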

I did not want to refactor Philippe's commits too much, so the refactoring happens in subsequent commits. It's probably easiest to look at the whole diff...

A few points that are open for discussion:

- The final validation only checks the repofolders that were modified, not all the contained objects. We could add such a check, but I'm not sure whether it's worth it, especially because it might make the validation very slow. We could, however, execute it conditionally and add a command line argument for it.
- I'm not sure whether we want to store the former reference numbers of dossiers in the former_reference_number attribute. That should be feasible if necessary.
- I also wonder whether we could make recovering from a failure easier, for example by:
  - pickling the RepositoryExcelAnalyser
  - adding an annotation on all migrated objects to make them more easily identifiable?
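Two of these ideas (running the expensive check only on request, and pickling the analysis for recovery) can be sketched as follows. `RepositoryExcelAnalyser` here is a stand-in that only stores plain data; pickling the real analyser would only work if it avoids holding persistent objects, which is an assumption. The flag name and the default path are hypothetical.

```python
import argparse
import os
import pickle
import tempfile

# Hypothetical default location for the pickled analysis
DEFAULT_ANALYSIS_PATH = os.path.join(tempfile.gettempdir(),
                                     "repository_migration_analysis.pickle")


class RepositoryExcelAnalyser(object):
    """Stand-in for the real analyser; it only keeps plain data."""

    def __init__(self, operations):
        self.operations = operations


def save_analysis(analyser, path=DEFAULT_ANALYSIS_PATH):
    """Pickle the analysis so it can be inspected or reused after a failure."""
    with open(path, "wb") as f:
        pickle.dump(analyser, f)


def load_analysis(path=DEFAULT_ANALYSIS_PATH):
    with open(path, "rb") as f:
        return pickle.load(f)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Migrate a repository structure.")
    parser.add_argument("excel_path", help="Excel file describing the migration")
    # Hypothetical flag: the slow check of all contained objects runs only on request
    parser.add_argument("--validate-contained-objects", action="store_true")
    parser.add_argument("--analysis-path", default=DEFAULT_ANALYSIS_PATH)
    return parser.parse_args(argv)
```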

Also note that I've tested the migration with the HBA repository structure and fake data (i.e. one folder and one document per repofolder in the structure).

For https://4teamwork.atlassian.net/browse/CA-1266

njohner

🎉 nice

opengever/maintenance/scripts/repository_migration_analyse.py

njohner · 2020-12-17T12:46:35Z

+                need_number_change = True
+                self.number_changes[new_item['position']] = old_item['position']
+
+                # check if parent is already changed - so no need to change


Because the change is done recursively, I guess?

It could be that both the parent and the position itself have a new number.
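A small sketch of this point, assuming dotted positions: the parent part and the position's own number are compared independently, so a child whose parent was renumbered can still get its own number change. The helper names are hypothetical.

```python
def split_position(position):
    """'1.2.3' -> ('1.2', '3'); assumes a dotted position notation."""
    parent, _, number = position.rpartition(".")
    return parent, number


def number_changes(old_position, new_position):
    """Report which parts of a position changed between old and new columns.

    A parent renumbering propagates recursively to all children, but the
    child's own (last) number can change at the same time, so both parts
    are checked instead of skipping the child when its parent changed.
    """
    old_parent, old_number = split_position(old_position)
    new_parent, new_number = split_position(new_position)
    return {
        "parent_changed": old_parent != new_parent,
        "own_number_changed": old_number != new_number,
    }
```

For example, `number_changes("1.2.4", "3.2.5")` reports both changes, whereas skipping the child because its parent changed would lose the change from 4 to 5.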

opengever/maintenance/scripts/repository_migration_analyse.py

njohner · 2020-12-17T14:55:17Z

+            obj = uuidToObject(item['uid'])
+            if not obj:
+                # New created objects can be ignored
+                break


Shouldn't that be `continue`?
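The difference matters because `break` leaves the loop at the first newly created item, so no item after it would be reindexed. A sketch with the loop factored into a function; `resolve` and `reindex` are hypothetical stand-ins for `uuidToObject` and `obj.reindexObject(idxs=[...])`.

```python
def reindex_migrated(items, resolve, reindex):
    """Reindex every existing object; items without an object are skipped.

    `resolve` stands in for uuidToObject, `reindex` for obj.reindexObject(...).
    """
    for item in items:
        obj = resolve(item["uid"])
        if not obj:
            # Newly created objects can be ignored. `continue` skips only this
            # item; `break` would silently stop reindexing all remaining items.
            continue
        reindex(obj)
```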

opengever/maintenance/scripts/repository_migration_analyse.py

njohner · 2020-12-17T15:52:28Z

+                # New created objects can be ignored
+                break
+
+            obj.reindexObject(idxs=['Title', 'sortable_title', 'path', 'reference'])


Note to myself: check whether any of these appear in the SearchableText.

In Solr, SearchableText only contains the content from the blob.

opengever/maintenance/scripts/update_object_ids.py

The ObjectIDUpdater currently uses the INameFromTitle behavior directly, which results in conflicts when the ID is already used by another object. So I changed it to use the name chooser, which calls the INameFromTitle behavior but also handles conflicting IDs.
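A rough sketch of the conflict handling a name chooser adds on top of plain title-based naming. `normalize` and `choose_name` are hypothetical helpers; they do not reproduce Plone's `INameChooser` or the `INameFromTitle` behavior, and the numeric suffix scheme is only illustrative.

```python
import re


def normalize(title):
    """Very rough stand-in for title-based id normalization."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def choose_name(title, existing_ids):
    """Pick an id from the title, appending a numeric suffix on conflicts.

    Using the title-derived id directly fails as soon as another object
    already has that id; this loop keeps trying suffixes until one is free.
    """
    base = normalize(title)
    name = base
    counter = 1
    while name in existing_ids:
        name = "%s-%d" % (base, counter)
        counter += 1
    return name
```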

buchi

LGTM!

I would ✂️ the _analyse part from the script filename.

As already discussed, doc properties are updated when moving repos.
I'm now trying out the migration with the full content of HBA...

opengever/maintenance/scripts/repository_migration_analyse.py

njohner · 2021-01-08T14:09:03Z

Thanks for the review. I've updated the PR according to your comments.

We skip syncing the predecessor because it is resolved via IntIds, which fails in the HBA migration: IntIds rely on the path and are themselves updated in an event handler. Not updating the predecessor is fine, since it does not change during a move operation anyway.

njohner commented Dec 17, 2020

njohner commented Dec 18, 2020

opengever/maintenance/scripts/repository_migration_analyse.py

phgross added 19 commits December 21, 2020 08:08

Add repository_migration analyser.

be8430d

It reads and analyses the diff Excel file and exports the analysis to a separate Excel file.

Small cleanup.

330227c

Add check for max repository_depth.

94b4537

Add check for leaf_node principle violations.

a979b82

Switched output to real values (new_number, new_parent ...)

5a095ef

Refactor Analyser so that it returns operations_list for migrator.

e4a3c5d

Ignore empty rows during analyse.

bfbc6e4

Extend analyse with information about new repositoryfolders.

94f9bc4

Add first phase of the Migrator - the creation of new repositoryfolders.

b578023

Add step "move" to the repository migrator.

175da34

Add step "referenceprefix number adjustment".

0a17f8b

Also regenerate reference_number mapping during reference number adjustments.

5512a84

Add migration step "rename and id adjustment"

4531aaa

Add reindex step.

767e5aa

Adjust excel reader to the new, simpler excel format.

98d4020

Add update description step to repository migration.

bff1093

Add validation step to the migrator.

5603a6d

WIP: browser view to test and develop the script

cb802ed

This avoids having to restart the instance all the time.

njohner force-pushed the pg_repository_migration_sg branch from e243fe1 to cb802ed December 21, 2020 07:12

Niklaus Johner added 2 commits December 21, 2020 08:16

Add init file to scripts folder.

f75d658

Make sure to have a string for positions.

4b30641

njohner force-pushed the pg_repository_migration_sg branch from d823d02 to 4b30641 December 21, 2020 11:15

Niklaus Johner added 5 commits December 22, 2020 08:28

Skip validation for now, as it is broken.

251ebcf

Refactoring: add OperationItem class.

172f5ed

Add methods for reference_number and parent_position on OperationItem.

54b35cc

Check for deletion on new_position column.

214cefb

The check for whether a given position should be deleted was done on a column that did not make sense, most likely a leftover from the previous Excel format. Instead, we now check whether the position has a new position.

Add ExcelDataExtractor with validation of Excel format.

9da4ecb

Fix migration validation for deep repofolder reference numbers.

4aaf095

njohner force-pushed the pg_repository_migration_sg branch from 70ad337 to 4aaf095 January 4, 2021 16:59

Revert "WIP: browser view to test and develop the script"

a645c1f

This reverts commit cb802ed.

njohner changed the title ~~Pg repository migration sg~~ Add script to migrate a repository structure Jan 4, 2021

njohner marked this pull request as ready for review January 4, 2021 17:40

njohner requested a review from a team January 4, 2021 17:40

buchi self-assigned this Jan 5, 2021

Niklaus Johner added 4 commits January 5, 2021 13:54

Revert "Do not fail when validation fails."

61c4d82

This reverts commit 80c1228.

Make sure to commit only at the end of the migration.

165f6e5

For this we need to patch several methods from the bundle sections.
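One generic way to suppress intermediate commits is to patch the committing method for the duration of the import and commit once at the end. This sketch uses `unittest.mock.patch.object` on a hypothetical `BundleSection` class; which methods the PR actually patches in the bundle sections is not shown here.

```python
from unittest import mock


class BundleSection(object):
    """Hypothetical stand-in for a bundle import section that commits itself."""

    def __init__(self, log):
        self.log = log

    def commit(self):
        self.log.append("commit")

    def run(self, items):
        for item in items:
            self.log.append("create %s" % item)
            self.commit()  # intermediate commit we want to suppress


def run_without_intermediate_commits(section, items, final_commit):
    """Replace the section's commit with a no-op during the run, then commit once."""
    # patch.object restores the original method when the block exits
    with mock.patch.object(section, "commit", lambda: None):
        section.run(items)
    final_commit()
```

If anything fails before `final_commit()` runs, nothing from the import has been committed.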

Save operation information on migrated objects.

77b0dde

Add missing import.

c7e080b

buchi requested changes Jan 8, 2021

Niklaus Johner added 6 commits January 8, 2021 14:14

Fix error logging for delete operations.

a510bce

Fix docstrings for monkey patches.

ecb366d

Avoid generating bundle import reports during OS migration.

80af7bf

Improve naming: OperationItem to RepositoryPosition.

b9cc350

Fix information saved on migrated objects.

446a1de

Rename migration script.

cbd3aa7

njohner mentioned this pull request Jan 8, 2021

Add OS migration tests. 4teamwork/opengever.core#6792

Draft

njohner force-pushed the pg_repository_migration_sg branch from 81d6aef to 33a4ace January 13, 2021 08:58

Niklaus Johner added 5 commits January 14, 2021 15:44

Make sure that repofolder match data in excel before migration.

c44e0ac

Improve progress logging.

b6ef917

One more patch to avoid committing during bundle import.

0baefa4

Do not fail if data is inconsistent in catalog.

a6bcf68

Clean-up unnecessary print statement.

52853b5

njohner force-pushed the pg_repository_migration_sg branch from f4971db to 52853b5 January 18, 2021 07:59

buchi approved these changes Jan 18, 2021

njohner merged commit 2b111c1 into master Jan 18, 2021

4 participants
