Customized Use-case problems with Metastore & Notebooks Migration #269

@harshfadiya

Hi Team,

We are currently performing an E2 migration using DMT and have come across the scenarios below:

  1. Problem : Metastore
    We performed the initial metastore migration, and the table schemas were reflected in the E2 workspace. Afterwards, the schemas of a few tables were changed in the legacy workspace (column renames, datatype changes), and partition columns were modified.
    We then ran the metastore migration again to pick up these delta changes, but the table schemas in the E2 workspace were not updated. We observed that during metastore export the tool generates a CREATE DDL statement for each table using a 'show create table' query. As a result, if a table already exists in the E2 workspace, the import skips it with the message 'Table already exist'.

We want to overwrite the existing table metadata in E2 with the updated table definitions from the legacy workspace. This scenario is not covered by the DMT scripts.

Solution:
To handle this scenario, we modified lines 516,527 in the HiveClient.py script (from ddl_str to ddl_str[:7]+'OR REPLACE '+ddl_str[7:]). Please see the highlighted changes in the attached screenshot. (Instead of a plain CREATE DDL, we now use CREATE OR REPLACE.)
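The string change described above can be sketched in isolation as follows. This is a minimal illustration, not the actual HiveClient.py code: the function name `to_create_or_replace` and the guard conditions are our own assumptions, added so that a statement that does not start with `CREATE ` (or that already contains `OR REPLACE`) is left untouched. It does not check whether the target workspace accepts CREATE OR REPLACE for every table type.

```python
CREATE_PREFIX = "CREATE "  # 7 characters, which is why the slice is ddl_str[:7]


def to_create_or_replace(ddl_str: str) -> str:
    """Hypothetical helper: rewrite 'CREATE ...' as 'CREATE OR REPLACE ...'.

    Statements that do not begin with 'CREATE ' (case-insensitive), or that
    already begin with 'CREATE OR REPLACE ', are returned unchanged.
    """
    head = ddl_str[:len(CREATE_PREFIX)]
    rest = ddl_str[len(CREATE_PREFIX):]

    # Only touch statements that really start with the CREATE keyword.
    if head.upper() != CREATE_PREFIX:
        return ddl_str

    # Avoid producing 'CREATE OR REPLACE OR REPLACE ...' on a second pass.
    if rest.upper().startswith("OR REPLACE "):
        return ddl_str

    # Same splice as ddl_str[:7] + 'OR REPLACE ' + ddl_str[7:].
    return head + "OR REPLACE " + rest
```

The guards matter because a bare slice at index 7 silently corrupts any statement with leading whitespace or a different leading keyword.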

We would like confirmation from the Databricks tool development team on whether this modification could cause any further issues or complications. That would give us some clarity. Thanks.
[Attached screenshot: image (16)]

  2. Problem : Notebooks
    Our client has split the list of notebooks into two sublists:
    a) legacy notebooks that must overwrite the E2 workspace notebooks if they already exist
    b) legacy notebooks that must be skipped if they are already present in E2 (as users have made changes in E2)

The DMT tool captures all notebooks in one go and performs the export/import with either skip or overwrite, only one option at a time.
Please advise which approach we should use to accommodate the above use case and migrate the notebooks accurately. That will help us finalize the plan. We look forward to your response as soon as possible.
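To make the requirement concrete, the per-notebook decision we need can be sketched as below. This is only an illustration of the desired behaviour under our own assumptions: `plan_notebook_import`, its parameters, and the action names are hypothetical and are not part of DMT.

```python
from typing import Dict, Iterable


def plan_notebook_import(
    legacy_paths: Iterable[str],
    overwrite_paths: Iterable[str],
    existing_e2_paths: Iterable[str],
) -> Dict[str, str]:
    """Hypothetical planner: decide one action per legacy notebook path.

    - 'import'    : notebook does not exist in E2 yet
    - 'overwrite' : exists in E2 and is in sublist (a)
    - 'skip'      : exists in E2 and is not in sublist (a), i.e. sublist (b)
    """
    overwrite_set = set(overwrite_paths)
    existing_set = set(existing_e2_paths)

    plan: Dict[str, str] = {}
    for path in legacy_paths:
        if path not in existing_set:
            plan[path] = "import"
        elif path in overwrite_set:
            plan[path] = "overwrite"
        else:
            plan[path] = "skip"
    return plan
```

With a plan like this, the notebooks could be imported in two passes, one per option, instead of a single run with one global setting.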

Many thanks in advance!
