
Bug: Not possible to export only the notebooks from a workspace #304

@srggrs

Description


The problem

I want to export/import only the notebooks from a workspace, but none of the suggested commands work.

Steps

  1. I tried the manual way first, since running the pipeline seems to be meant for admins, and we don't need to export everything else because that is managed as infrastructure as code with Terraform etc.
python export_db.py --profile $SRC_PROFILE --no-ssl-verification --export-home [email protected] --use-checkpoint --num-parallel 8 --retry-total 30 --retry-backoff 1.0

This gave me an error:

Note: running export_db.py directly is not recommended. Please use migration_pipeline.py
Exporting home directory: [email protected]
Traceback (most recent call last):
  File "/path/to/migration/repo/migrate/export_db.py", line 332, in <module>
    main()
  File "/path/to/migration/repo/migrate/export_db.py", line 265, in main
    ws_c = WorkspaceClient(client_config, checkpoint_service)
  File "/path/to/migration/repo/migrate/dbclient/WorkspaceClient.py", line 32, in __init__
    self.skip_large_nb = configs['skip_large_nb']
KeyError: 'skip_large_nb'
  2. So I decided to try running the pipeline, following the suggestion at the top of that output. Digging into the CLI help, I saw that you can keep only the notebooks task:
python3 migration_pipeline.py --profile $SRC_PROFILE --use-checkpoint --retry-total 30 --num-parallel 8 --retry-backoff 1.0 --keep-tasks notebooks --export-pipeline

but this failed with:

Using the session id: <the ID>
2025-03-21,08:51:13;INFO;Start export_instance_profiles
2025-03-21,08:51:13;INFO;export_instance_profiles Skipped.
2025-03-21,08:51:13;INFO;Start export_users
2025-03-21,08:51:13;INFO;export_users Skipped.
2025-03-21,08:51:13;INFO;Start export_groups
2025-03-21,08:51:13;INFO;export_groups Skipped.
2025-03-21,08:51:13;INFO;Start export_workspace_items_log
2025-03-21,08:51:13;INFO;export_workspace_items_log Skipped.
2025-03-21,08:51:13;INFO;Start export_workspace_acls
2025-03-21,08:51:13;INFO;export_workspace_acls Skipped.
2025-03-21,08:51:13;INFO;Start export_notebooks
Traceback (most recent call last):
  File "/path/to/migration/repo/migrate/migration_pipeline.py", line 378, in <module>
    main()
  File "/path/to/migration/repo/migrate/migration_pipeline.py", line 374, in main
    pipeline.run()
  File "/path/to/migration/repo/migrate/pipeline/pipeline.py", line 64, in run
    future.result()
  File "/path/to/python/env/envs/bricks-migration/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/path/to/python/env/envs/bricks-migration/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/path/to/python/env/envs/bricks-migration/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/path/to/migration/repo/migrate/pipeline/pipeline.py", line 73, in _run_task
    task.run()
  File "/path/to/migration/repo/migrate/tasks/tasks.py", line 144, in run
    num_notebooks = ws_c.download_notebooks(num_parallel=self.client_config["num_parallel"])
  File "/path/to/migration/repo/migrate/dbclient/WorkspaceClient.py", line 323, in download_notebooks
    raise Exception("Run --workspace first to download full log of all notebooks.")
Exception: Run --workspace first to download full log of all notebooks.
  3. So I tried adding --workspace:
python3 migration_pipeline.py --profile $SRC_PROFILE --use-checkpoint --retry-total 30 --num-parallel 8 --retry-backoff 1.0 --keep-tasks notebooks --export-pipeline --workspace

but got

migration_pipeline.py: error: unrecognized arguments: --workspace
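
The log in step 2 suggests that export_notebooks relies on the output of export_workspace_items_log, which was skipped when only the notebooks task was kept. Below is a minimal sketch of that kind of dependency check. It is not the tool's actual code: the PREREQUISITES mapping and the missing_prerequisites helper are hypothetical, and only the two task names are taken from the log.

```python
# Hypothetical dependency map: task name -> tasks whose output it needs.
# Only the two task names come from the log; the mapping is an assumption.
PREREQUISITES = {
    "export_notebooks": ["export_workspace_items_log"],
}


def missing_prerequisites(kept_tasks, prerequisites=PREREQUISITES):
    """Return {task: [skipped prerequisites]} for every kept task."""
    kept = set(kept_tasks)
    missing = {}
    for task in kept_tasks:
        # A prerequisite is "missing" if it was filtered out of the run.
        skipped = [dep for dep in prerequisites.get(task, []) if dep not in kept]
        if skipped:
            missing[task] = skipped
    return missing


# Keeping only the notebooks task leaves its prerequisite skipped.
print(missing_prerequisites(["export_notebooks"]))
```

A check like this, run before the tasks start, would report the skipped prerequisite up front instead of failing in the middle of the export.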

Also, migration_pipeline.py has no option for exporting just your own notebooks, like the --export-home flag used above.

So I decided to fix the export_db.py script myself by setting skip_large_nb = None. Details in the incoming PR.
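
The idea behind that fix can be sketched as follows. This is an illustration, not the code from the PR: build_client_config and WorkspaceClientSketch are hypothetical stand-ins, and treating a missing skip_large_nb as None is an assumption taken from the fix described above.

```python
def build_client_config(cli_options):
    """Copy parsed CLI options into a config dict, defaulting optional keys."""
    configs = dict(cli_options)
    # Assumption: a missing 'skip_large_nb' should behave like None.
    configs.setdefault("skip_large_nb", None)
    return configs


class WorkspaceClientSketch:
    """Hypothetical stand-in for the client whose __init__ raised KeyError."""

    def __init__(self, configs):
        # .get() tolerates callers that never set the key, unlike configs[...].
        self.skip_large_nb = configs.get("skip_large_nb")


configs = build_client_config({"num_parallel": 8})
client = WorkspaceClientSketch(configs)
print(client.skip_large_nb)  # None
```

Defaulting the key where the config is built keeps every entry point consistent, while the .get() in the client is a second line of defence for scripts that build the dict themselves.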
