156 commits
722b35d
workspace mapping scripts
Dec 13, 2022
5b3df0e
updating workspace mapping scripts
Dec 19, 2022
416ddda
Merge branch 'databrickslabs:master' into master
veenaramesh Dec 19, 2022
8b06df9
delete unnecessary files
veenaramesh Dec 19, 2022
d0cdf0a
adding util notebooks
Dec 19, 2022
a72f2b0
consistency edits
Dec 21, 2022
fb446f4
Merge branch 'databrickslabs:master' into master
veenaramesh Jan 4, 2023
b6392b6
Merge branch 'databrickslabs:master' into master
veenaramesh Jan 6, 2023
31b8d91
Merge branch 'databrickslabs:master' into master
veenaramesh Jan 12, 2023
e6a5d87
Merge branch 'databrickslabs:master' into master
veenaramesh Jan 23, 2023
23cf07f
adds notification if Job Owner is dropped.
veenaramesh Jan 23, 2023
278d13e
Merge branch 'databrickslabs:master' into master
veenaramesh Feb 16, 2023
8a89b14
fixes for secret scopes, shared folders, and global logs
Mar 9, 2023
9e88271
adding functionality to also split database_details.log
Mar 10, 2023
d26f0d1
Merge branch 'master' of https://github.com/Lovelytics/migrate
Mar 10, 2023
7ce09f9
Merge branch 'databrickslabs:master' into master
veenaramesh Mar 16, 2023
05f7c95
fix to secret scope acl split
veenaramesh Mar 17, 2023
6ce0947
fixing groups split issue
Mar 23, 2023
b96ecb0
Merge branch 'master' of https://github.com/Lovelytics/migrate
Mar 23, 2023
78ed463
adjusting file names
Mar 23, 2023
27deb67
fixing ACLs issue for groups
veenaramesh Mar 24, 2023
bfc70ff
fixing Shared + artifacts ACLs split
Mar 27, 2023
9273d37
merging split_logs
Mar 27, 2023
d646e16
adding + fixing error logging capabilities
Mar 28, 2023
5181fce
rephrasing outputs. grammatical issues
Mar 28, 2023
caf4419
adding default job owner capability
Mar 28, 2023
1322074
fixing dir acl issue
Mar 28, 2023
b462bfc
revert last commit
veenaramesh Mar 28, 2023
1c90ee3
fixing syntax error
veenaramesh Mar 28, 2023
cecf55e
adding library migration script; content created by Tejas
Mar 30, 2023
32c3a6f
adding logging fun stuff
Mar 30, 2023
f550f70
Update default cluster name to E2_Migration
veenaramesh Apr 6, 2023
0d0c95d
Update default cluster name, remove spark configs
veenaramesh Apr 6, 2023
b16961f
fixing parser.py to allow export_db.py to be used
Apr 6, 2023
da0ea62
fixing parser.py to allow export_db.py to be used
Apr 6, 2023
5a02413
Merge branch 'databrickslabs:master' into master
veenaramesh Apr 6, 2023
a7ddcf0
Merge branch 'databrickslabs:master' into master
veenaramesh Apr 11, 2023
42f3ac7
Merge branch 'databrickslabs:master' into master
veenaramesh Apr 12, 2023
6aec643
new default job owner parameter to set default owner when owners are m…
Apr 17, 2023
d19c3b9
will update secret_scopes_acls for updating emails
veenaramesh Apr 24, 2023
f3ab6ca
Merge branch 'databrickslabs:master' into master
veenaramesh Apr 24, 2023
2a4357e
Merge branch 'databrickslabs:master' into master
veenaramesh May 4, 2023
26e7614
Update JobsClient.py
veenaramesh May 4, 2023
6243948
Update dbclient.py
veenaramesh May 16, 2023
6136a24
Merge branch 'databrickslabs:master' into master
veenaramesh Jun 1, 2023
7a68139
Asset mapping spreadsheet updates (#8)
SarahCree Jun 9, 2023
3b06a48
fixing merge conflict with sync
Jun 9, 2023
bde2351
Merge branch 'databrickslabs-master'
Jun 9, 2023
099a4a6
adding parameter tag
Jul 10, 2023
2a9604d
adding destination parameter
Jul 10, 2023
026bc7d
fixing grammar issue
Jul 10, 2023
dee67cb
adding cluster policies to sheet
Jul 10, 2023
8ca0f6a
Merge branch 'databrickslabs:master' into master
veenaramesh Jul 20, 2023
5cd32f7
Merge branch 'databrickslabs:master' into master
veenaramesh Jul 26, 2023
3e3c31e
Added Terraform Exporter product notice (#10)
veenaramesh Aug 16, 2023
fdde2e8
Merge branch 'databrickslabs:master' into master
veenaramesh Aug 16, 2023
0f288fb
Notebook ACLs failure issue resolved
tejasnp163 Aug 24, 2023
198c9ae
Merge pull request #11 from Lovelytics/notebook_acl_patch
tejasnp163 Aug 24, 2023
b8bc1d2
Fix missing key for hipaa option when using export_db (#12)
veenaramesh Aug 28, 2023
20b62cb
Merge branch 'databrickslabs:master' into master
veenaramesh Aug 28, 2023
6c37aa9
Update HiveClient.py
veenaramesh Aug 28, 2023
446b671
Update to_csv.py
veenaramesh Aug 28, 2023
d2cf97a
Update to_csv.py
veenaramesh Aug 28, 2023
24f0025
Update ClustersClient.py to add old cluster id and name (#13)
tejasnp163 Sep 6, 2023
5b69ed9
Merge branch 'databrickslabs:master' into master
veenaramesh Sep 6, 2023
d4187ce
Update WorkspaceClient.py to optimize ACL bug fix code
tejasnp163 Sep 14, 2023
90f38b7
Merge pull request #14 from Lovelytics/tejasnp163-patch-2
tejasnp163 Sep 25, 2023
00a0040
Update to_csv.py for better error handling
tejasnp163 Oct 4, 2023
f90d8bc
Merge pull request #15 from Lovelytics/tejasnp163-patch-3
veenaramesh Oct 4, 2023
6e6dc9c
added option to convert all usernames to lowercase
Oct 5, 2023
f7b2541
Update HiveClient.py - include view name in failed import logs
veenaramesh Oct 12, 2023
ae99d6b
Update to_csv.py
tejasnp163 Oct 18, 2023
1c8915c
Merge pull request #17 from Lovelytics/tejasnp163-patch-3
allistaircota Oct 18, 2023
0a2edda
Filtering out DS_Store hidden file when listing metastore dbs
Oct 18, 2023
41d4973
Merge pull request #18 from Lovelytics/allistair_patch_2
tejasnp163 Oct 18, 2023
098074f
change cluster names, add databases to csv
Oct 24, 2023
47e3736
split on databases + tables
Oct 24, 2023
aaa6dc6
Restrict the job renaming to imported jobs with ::: only
tejasnp163 Oct 31, 2023
cc8a087
Merge pull request #19 from Lovelytics/tejasnp163-patch-4
veenaramesh Oct 31, 2023
ccdae72
export single database
Nov 7, 2023
b00a44b
changing databases param to accept list
Nov 7, 2023
9df4ff6
Add files via upload
veenaramesh Nov 30, 2023
59e8f9f
adding DBFS sizing notebook
veenaramesh Dec 1, 2023
fe2002d
Add files via upload
veenaramesh Dec 5, 2023
8f2d5ff
Add files via upload
veenaramesh Dec 5, 2023
0b9fe11
add rename emails file
veenaramesh Dec 7, 2023
32dc283
add files
veenaramesh Dec 12, 2023
c3512d6
add --nitro parameter
Dec 13, 2023
a6ec90f
deleting extraneous files
veenaramesh Dec 13, 2023
6a1d233
delete extra files
veenaramesh Dec 13, 2023
009bb5c
delete extra files
veenaramesh Dec 13, 2023
22aa453
Merge pull request #20 from Lovelytics/veenaramesh-patch-1
veenaramesh Dec 13, 2023
7b0f1bc
delete extra files
veenaramesh Dec 13, 2023
0b67c31
update dbclient/ClustersClient.py to update nitro
Dec 14, 2023
440dee6
Merge branch 'master' of https://github.com/Lovelytics/migrate
Dec 14, 2023
2e980cf
adding replace group scripts
Jan 5, 2024
444c7cb
wrong push! new push
Jan 5, 2024
68c6aee
delete notebooks!
veenaramesh Jan 9, 2024
4805188
Update delete_clusters.py - unpins now as well
veenaramesh Jan 9, 2024
fc616d3
Update delete_clusters.py
veenaramesh Jan 9, 2024
54022b0
Merge pull request #21 from Lovelytics/veenaramesh-patch-1
veenaramesh Jan 9, 2024
d7e2db9
Clusters Scout to see what DBRs are being used
veenaramesh Jan 11, 2024
8ba490b
Update rename_emails.py
veenaramesh Jan 11, 2024
f0a414e
code to update clusters with correct IPs
veenaramesh Jan 12, 2024
ce2bb71
Update patch_clusters.py
veenaramesh Jan 12, 2024
3cd46fd
Update patch_clusters.py
veenaramesh Jan 12, 2024
91bd2be
Update rename_emails.py
veenaramesh Jan 12, 2024
754efb1
Update patch_clusters.py
veenaramesh Jan 16, 2024
6f937f1
add sample_jobs filter
veenaramesh Jan 17, 2024
162f11e
Add metastore scouts
veenaramesh Jan 17, 2024
b9760e0
empty dir creator
veenaramesh Jan 17, 2024
7932516
Add files via upload
veenaramesh Jan 17, 2024
55aed2a
Create DBFS File Import
veenaramesh Jan 18, 2024
434899a
Update ClustersClient.py
veenaramesh Jan 22, 2024
dd43726
Merge pull request #24 from Lovelytics/veenaramesh-dev
veenaramesh Jan 24, 2024
3f206f7
Merge pull request #16 from Lovelytics/allistair_patch_1
veenaramesh Jan 24, 2024
d0a12d3
Update ClustersClient.py
veenaramesh Jan 30, 2024
6def613
Update ClustersClient.py
veenaramesh Jan 30, 2024
a41230c
Update ClustersClient.py
veenaramesh Feb 1, 2024
922e5c7
Update ClustersClient.py
veenaramesh Feb 2, 2024
452223e
rename emails for specific edge case
veenaramesh Feb 6, 2024
d2bdd48
Delete data/notebooks/Clusters_Scout.py
veenaramesh Feb 7, 2024
919610f
Delete data/notebooks/create_sample_jobs.py
veenaramesh Feb 7, 2024
dd5cd46
Delete data/notebooks/patch_clusters.py
veenaramesh Feb 7, 2024
47e3dc9
Delete data/notebooks/rename_emails.py
veenaramesh Feb 7, 2024
4d927e9
Delete data/notebooks/delete_clusters.py
veenaramesh Feb 7, 2024
b025344
Delete data/notebooks/replace_groups.py
veenaramesh Feb 7, 2024
a0ba5e6
Update ClustersClient.py
veenaramesh Mar 1, 2024
d89bfe2
Add files via upload
cbartholomew2 Jun 21, 2024
e2e3a43
Update WorkspaceClient.py
mcmuffin18 Jun 21, 2024
26661ad
Merge pull request #25 from Lovelytics/mcmuffin18-patch-1
cbartholomew2 Jun 21, 2024
c05a6f8
Search and Replace in File
mcmuffin18 Jul 22, 2024
24056e6
Merge pull request #26 from Lovelytics/mcmuffin18-patch-1
cbartholomew2 Jul 22, 2024
92df318
Add files via upload
mcmuffin18 Jul 29, 2024
f756197
Merge pull request #27 from Lovelytics/mcmuffin18-patch-2
cbartholomew2 Jul 29, 2024
c2ca4f5
Add files via upload
mcmuffin18 Jul 30, 2024
31fe731
Merge pull request #28 from Lovelytics/mcmuffin18-patch-3
cbartholomew2 Jul 30, 2024
5beb31c
Add files via upload
mcmuffin18 Aug 1, 2024
6273fef
Merge pull request #29 from Lovelytics/mcmuffin18-patch-4
cbartholomew2 Aug 1, 2024
0c74677
Add files via upload
mcmuffin18 Aug 1, 2024
0296eda
Merge pull request #30 from Lovelytics/mcmuffin18-patch-5
mcmuffin18 Aug 1, 2024
6a24baa
Add files via upload
mcmuffin18 Aug 15, 2024
b9b3818
Merge pull request #32 from Lovelytics/mcmuffin18-patch-7
cbartholomew2 Aug 15, 2024
984cfe0
Add files via upload
mcmuffin18 Aug 15, 2024
0c58830
Merge pull request #33 from Lovelytics/mcmuffin18-patch-7
cbartholomew2 Aug 15, 2024
5ba7c51
Add files via upload
mcmuffin18 Aug 20, 2024
7c47bd7
Merge pull request #34 from Lovelytics/mcmuffin18-patch-7
cbartholomew2 Aug 20, 2024
0484a69
Add files via upload
cbartholomew2 Aug 27, 2024
d6c8591
Update default_jobs_cluster_aws.json
cbartholomew2 Aug 28, 2024
5e04efc
Update default_jobs_cluster_aws_hipaa.json
cbartholomew2 Aug 28, 2024
a1d85d6
Update nitro_mapping.csv
cbartholomew2 Aug 28, 2024
258f4ca
Update HMS_Modification_Get_Database.py
mcmuffin18 Sep 3, 2024
1784ab4
Merge pull request #35 from Lovelytics/mcmuffin18-patch-8
cbartholomew2 Sep 3, 2024
1b73e7b
added use-logs flag, pagination, and concurrent futures
jsparhamii Jun 26, 2025
f8be476
added scim_client to WorkspaceClient for retrieving users
jsparhamii Jun 26, 2025
9167972
added results=None param to get active users in ScimClient
jsparhamii Jul 9, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -12,3 +12,6 @@ dist/
.tox/
databricks_migration_tool.egg-info
migrate.iml
export_dir/
unversioned/

Binary file added Root Hive Migration.dbc
Binary file not shown.
43 changes: 43 additions & 0 deletions Workspace Sizing Notebook.html

Large diffs are not rendered by default.

975 changes: 975 additions & 0 deletions WorkspaceClient_modified.py

Large diffs are not rendered by default.

117 changes: 117 additions & 0 deletions convert_all_logs.py
@@ -0,0 +1,117 @@
###################### importing other scripts ##############################################
from utils import to_csv as util
from utils import create_asset_mapping_spreadsheet as create_spreadsheet
############################################################################################
import argparse
import os

def main(checkpoint, destination="csv"):
    # where you want the csv files to be located;
    # make the csv directory if it's not there
    if destination not in os.listdir():
        print(f"Creating {destination}...")
        os.mkdir(f"./{destination}")

    # users
    users_data = util.read_log("users.log", checkpoint)
    if users_data == 1:  # file not found
        print("users.log not found in checkpoint session. Skipping...")
    else:
        users_df = util.create_users(users_data)
        util.save_to_csv(users_df, "users.csv", destination)

    # instance profiles
    ip_data = util.read_log("instance_profiles.log", checkpoint)
    if ip_data == 1:  # file not found
        print("instance_profiles.log not found in checkpoint session. Skipping...")
    else:
        ip_df = util.create_instance_profiles(ip_data)
        util.save_to_csv(ip_df, "instance_profiles.csv", destination)

    # instance pools
    ipo_data = util.read_log("instance_pools.log", checkpoint)
    if ipo_data == 1:  # file not found
        print("instance_pools.log not found in checkpoint session. Skipping...")
    else:
        ipo_df = util.create_instance_pools(ipo_data)
        util.save_to_csv(ipo_df, "instance_pools.csv", destination)

    # groups
    groups_df = util.create_groups("groups", checkpoint)
    if groups_df == 1:  # directory not found
        print("groups.log not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(groups_df, "groups.csv", destination)

    # clusters
    clusters_data = util.read_log("clusters.log", checkpoint)
    if clusters_data == 1:  # file not found
        print("clusters.log not found in checkpoint session. Skipping...")
    else:
        clusters_df = util.create_clusters(clusters_data)
        util.save_to_csv(clusters_df, "clusters.csv", destination)

    # cluster policies
    cluster_policies_data = util.read_log("cluster_policies.log", checkpoint)
    if cluster_policies_data == 1:  # file not found
        print("cluster_policies.log not found in checkpoint session. Skipping...")
    else:
        cluster_policies_df = util.create_cluster_policies(cluster_policies_data)
        util.save_to_csv(cluster_policies_df, "cluster_policies.csv", destination)

    # jobs
    jobs_data = util.read_log("jobs.log", checkpoint)
    if jobs_data == 1:  # file not found
        print("jobs.log not found in checkpoint session. Skipping...")
    else:
        jobs_acls = util.read_log("acl_jobs.log", checkpoint)
        jobs_df = util.create_jobs(jobs_data, jobs_acls)
        util.save_to_csv(jobs_df, "jobs.csv", destination)

    # shared notebooks
    shared_df = util.create_shared_logs("artifacts/Shared", checkpoint)
    if shared_df == 1:  # directory not found
        print("Shared notebooks not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(shared_df, "global_shared_logs.csv", destination)

    # other artifacts
    other_df = util.create_other_artifacts("artifacts", checkpoint)
    if other_df == 1:  # directory not found
        print("Global artifacts not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(other_df, "global_logs.csv", destination)

    # libraries
    libraries_data = util.read_log("libraries.log", checkpoint)
    if libraries_data == 1:  # file not found
        print("libraries.log not found in checkpoint session. Skipping...")
    else:
        libraries_df = util.create_libraries(libraries_data)
        util.save_to_csv(libraries_df, "libraries.csv", destination)

    # secret scopes
    scopes_df = util.create_scopes("secret_scopes", checkpoint)
    if scopes_df == 1:  # directory not found
        print("secret_scopes.log not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(scopes_df, "secret_scopes.csv", destination)

    # just databases
    databases_df = util.create_database(checkpoint, directory_name="metastore")
    if databases_df == 1:  # directory not found
        print("metastore.log not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(databases_df, "databases.csv", destination)

    # entire metastore
    metastore_df = util.create_metastore(checkpoint, directory_name="metastore")
    if metastore_df == 1:  # directory not found
        print("metastore.log not found in checkpoint session. Skipping...")
    else:
        util.save_to_csv(metastore_df, "metastore.csv", destination)

    create_spreadsheet.csv_to_excel(f"./{destination}")
    print("Successfully created spreadsheet asset_mapping.xlsx.")

if __name__ == "__main__":
    all_args = argparse.ArgumentParser()
    all_args.add_argument("--checkpoint", "--session", dest="checkpoint", default="", help="set if you are using a checkpoint during export")
    all_args.add_argument("--destination", dest="destination", default="csv", help="destination of converted logs (default: ./csv)")

    args = all_args.parse_args()
    main(args.checkpoint, args.destination)
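
For reference, a minimal invocation of the new script might look like the line below; the session name prod_workspace is purely illustrative, and the flags are the ones defined in the argparse block above.

    python convert_all_logs.py --checkpoint prod_workspace --destination csv

This reads the export logs from the named checkpoint session, writes one CSV per asset type into ./csv, and then builds asset_mapping.xlsx from that directory.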
2 changes: 1 addition & 1 deletion data/aws_cluster.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "Workspace_Migration_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration",
"spark_version": "10.4.x-scala2.12",
"aws_attributes": {
"first_on_demand": 1,
2 changes: 1 addition & 1 deletion data/aws_cluster_hipaa.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "Workspace_Migration_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration",
"spark_version": "10.4.x-scala2.12",
"aws_attributes": {
"first_on_demand": 1,
8 changes: 2 additions & 6 deletions data/aws_cluster_table_acls.json
@@ -1,13 +1,9 @@
{
"num_workers": 1,
"cluster_name": "API_Table_ACL_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration_Table_ACLs",
"spark_version": "10.4.x-scala2.12",
"spark_conf": {
"spark.databricks.cluster.profile": "serverless",
"spark.databricks.repl.allowedLanguages": "python,sql",
"spark.databricks.acl.dfAclsEnabled": "true",
"spark.sql.hive.metastore.version": "1.2.1",
"spark.sql.hive.metastore.jars": "maven"
"spark.databricks.acl.dfAclsEnabled": "true"
},
"aws_attributes": {
"first_on_demand": 1,
2 changes: 1 addition & 1 deletion data/aws_cluster_table_acls_hipaa.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "API_Table_ACL_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration_Table_ACLs",
"spark_version": "10.4.x-scala2.12",
"spark_conf": {
"spark.databricks.cluster.profile": "serverless",
2 changes: 1 addition & 1 deletion data/azure_cluster.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "API_Metastore_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration",
"spark_version": "10.4.x-scala2.12",
"spark_conf": {},
"node_type_id": "Standard_D8_v3",
2 changes: 1 addition & 1 deletion data/azure_cluster_table_acls.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "API_Table_ACL_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration_Table_ACLs",
"spark_version": "10.4.x-scala2.12",
"spark_conf": {
"spark.databricks.cluster.profile": "serverless",
2 changes: 1 addition & 1 deletion data/default_jobs_cluster_aws.json
@@ -1,6 +1,6 @@
{
"num_workers": 8,
"spark_version": "7.3.x-scala2.12",
"spark_version": "14.3.x-scala2.12",
"node_type_id": "i3.xlarge",
"spark_env_vars": {
"PYSPARK_PYTHON": "/databricks/python3/bin/python3"
2 changes: 1 addition & 1 deletion data/default_jobs_cluster_aws_hipaa.json
@@ -1,6 +1,6 @@
{
"num_workers": 8,
"spark_version": "7.3.x-scala2.12",
"spark_version": "14.3.x-scala2.12",
"node_type_id": "i4i.xlarge",
"spark_env_vars": {
"PYSPARK_PYTHON": "/databricks/python3/bin/python3"
2 changes: 1 addition & 1 deletion data/gcp_cluster.json
@@ -1,6 +1,6 @@
{
"num_workers": 1,
"cluster_name": "Workspace_Migration_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration",
"spark_version": "10.4.x-scala2.12",
"gcp_attributes": {
"first_on_demand": 1
2 changes: 1 addition & 1 deletion data/gcp_cluster_table_acls.json
@@ -1,5 +1,5 @@
{
"cluster_name": "API_Table_ACL_Work_Leave_Me_Alone",
"cluster_name": "E2_Migration_Table_ACLs",
"spark_version": "10.4.x-scala2.12",
"spark_conf": {
"spark.databricks.cluster.profile": "serverless",