Skip to content

Conversation

@njohner
Copy link
Contributor

@njohner njohner commented Dec 17, 2020

With this PR we implement a script to migrate a repository structure. This includes creating new positions, moving existing positions, merging positions into other ones, changing titles (and corresponding paths), changing reference numbers and changing the description. We also allow to set permissions for newly created positions. The input for the migration is an excel file containing the whole repository structure with additional columns for the new titles, reference numbers and descriptions. An example can be found in https://4teamwork.atlassian.net/browse/ROAD-1439.

To gain more confidence in the implementation I've implemented tests in opengever.core (4teamwork/opengever.core#6792). This also allows to test things like consistency (between data on the object and in the catalog) of contained objects after the migration.

Because the migration itself will leave the deployment in a messy state if it fails (some operations commit, e.g. the creation pipeline from the bundles), I've added as much validation of the operations as possible during the analysis step, allowing to abort before the migration even starts.

I've also decided not to fail during the last validation step (which checks that the migration produced the expected result). Failing only makes things worse, so I'd rather only log the errors and allow proper termination of the script.

I did not want to refactor the commits from philippe too much, so refactoring happens in subsequent commits. It's probably easiest to look at the whole diff...

A few points that are open for discussion:

  • The final validation only checks the repofolders that were modified and not all the contained objects. We could add such a check but I'm not sure whether it's worth it, especially because it might make the validation very slow. We could execute that conditionally though and add a command line argument.
  • I'm not sure whether we want to store former reference numbers of dossiers in the former_reference_number attribute. Should be feasible if necessary
  • I also wonder whether we could make recovery from a failure or so easier. For example:
    • pickling the RepositoryExcelAnalyser
    • adding an annotation on all objects that were migrated to make them more easily identifiable?

Also note that I've tested the migration with the HBA repository structure and fake data (i.e. one folder and one document per repofolder in the structure).

For https://4teamwork.atlassian.net/browse/CA-1266

Copy link
Contributor Author

@njohner njohner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 nice

need_number_change = True
self.number_changes[new_item['position']] = old_item['position']

# check if parent is already changed - so no need to change
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the change is done recursively I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be that both the parent has a new number and it itself has a new number.

obj = uuidToObject(item['uid'])
if not obj:
# New created objects can be ignored
break
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't that be continue?

# New created objects can be ignored
break

obj.reindexObject(idxs=['Title', 'sortable_title', 'path', 'reference'])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: check whether any of these appear in the searchableText

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In solr SearchableText only contains the content from the blob

Which reads and analyse the diff excel and export the analyse in a
separate excel.
The ObjectIDUpdater currently uses directly the INameFromTitle
behavior, that results in conflicts when the ID is already used by
another object. So I changed the behavior to using the namechooser,
which calls the INameFromTitle behavior, but also handles conflicting ids.
avoid restarting the instance all the time.
@njohner njohner force-pushed the pg_repository_migration_sg branch from e243fe1 to cb802ed Compare December 21, 2020 07:12
@njohner njohner force-pushed the pg_repository_migration_sg branch from d823d02 to 4b30641 Compare December 21, 2020 11:15
Niklaus Johner added 5 commits December 22, 2020 08:28
The check to know whether a given position should be deleted
was done on a column that did not make sense. This was most
likely a leftover from the previous excel format. Instead
we check whether the position has a new position or not.
@njohner njohner force-pushed the pg_repository_migration_sg branch from 70ad337 to 4aaf095 Compare January 4, 2021 16:59
@njohner njohner changed the title Pg repository migration sg Add script to migrate a repository structure Jan 4, 2021
@njohner njohner marked this pull request as ready for review January 4, 2021 17:40
@njohner njohner requested a review from a team January 4, 2021 17:40
@buchi buchi self-assigned this Jan 5, 2021
Copy link
Member

@buchi buchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I would ✂️ the _analyse part from the script filename.

As already discussed, doc properties are updated when moving repos.
I'm now trying out the migration with the full content of HBA...

@njohner
Copy link
Contributor Author

njohner commented Jan 8, 2021

Thanks for the review. I updated according to your comments.

We skip syncing the predecessor as this uses IntIds to resolve it,
which fails in the HBA migration as IntIds rely on path and also get
updated in an event handler. It is fine to not update the predecessor
as it anyway does not change during a move operation.
@njohner njohner force-pushed the pg_repository_migration_sg branch from 81d6aef to 33a4ace Compare January 13, 2021 08:58
@njohner njohner force-pushed the pg_repository_migration_sg branch from f4971db to 52853b5 Compare January 18, 2021 07:59
@njohner njohner merged commit 2b111c1 into master Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants