Commit c323321

rportilla-databrickslmallepaddiandresgarciafnfxasnare
authored
Added export CLI functionality for assessment results (#2553)
## Changes
* Update project to include `export` CLI function
* Added notebook utility to run the export from the workspace environment
* Added default location, unit tests, and integration test for notebook utility

### Tests
- [X] added unit tests
- [x] verified on staging environment (screenshot attached)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: lmallepaddi <[email protected]>
Co-authored-by: Andres Garcia <[email protected]>
Co-authored-by: Serge Smertin <[email protected]>
Co-authored-by: Laxmi Jyotsna Mallepaddi <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
Co-authored-by: Cor <[email protected]>
Co-authored-by: Amin Movahed <[email protected]>
Co-authored-by: Eric Vergnaud <[email protected]>
Co-authored-by: Eric Vergnaud <[email protected]>
Co-authored-by: Hari Selvarajan <[email protected]>
Co-authored-by: Amin Movahed <[email protected]>
Co-authored-by: Pritish Pai <[email protected]>
Co-authored-by: Liran Bareket <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 parent 736fc12 commit c323321

8 files changed: +291 -1 lines changed

README.md

Lines changed: 39 additions & 0 deletions
@@ -10,8 +10,10 @@ so that you'll be able to [scope the migration](docs/assessment.md) and execute
 The [README notebook](#readme-notebook), which can be found in the installation folder contains further instructions and explanations of the different ucx workflows & dashboards.
 Once the migration is scoped, you can start with the [table migration process](#Table-Migration).
 
+
 More workflows, like notebook code migration are coming in future releases.
 
+
 UCX also provides a number of command line utilities accessible via `databricks labs ucx`.
 
 For questions, troubleshooting or bug fixes, please see our [troubleshooting guide](docs/troubleshooting.md) or submit [an issue](https://github.com/databrickslabs/ucx/issues).
@@ -90,6 +92,7 @@ See [contributing instructions](CONTRIBUTING.md) to help improve this project.
   * [`open-remote-config` command](#open-remote-config-command)
   * [`installations` command](#installations-command)
   * [`report-account-compatibility` command](#report-account-compatibility-command)
+  * [`export-assessment` command](#export-assessment-command)
 * [Metastore related commands](#metastore-related-commands)
   * [`show-all-metastores` command](#show-all-metastores-command)
   * [`assign-metastore` command](#assign-metastore-command)
@@ -1167,6 +1170,42 @@ databricks labs ucx report-account-compatibility --profile labs-azure-account
 12:56:21 INFO [d.l.u.account.aggregate] Non-DELTA format: UNKNOWN: 5 objects
 ```
 
+[[back to top](#databricks-labs-ucx)]
+## `export-assessment` command
+
+```commandline
+databricks labs ucx export-assessment
+```
+The export-assessment command is used to export UCX assessment results to a specified location. When you run this command, you will be prompted to provide details on the destination path and the type of report you wish to generate. If you do not specify these details, the command will default to exporting the main results to the current directory. The exported file will be named based on the selection made in the format. Eg: export_{query_choice}_results.zip
+- **Choose a path to save the UCX Assessment results:**
+  - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
+
+- **Choose which assessment results to export:**
+  - **Description:** Select the type of results to export. Options include:
+    - `azure`
+    - `estimates`
+    - `interactive`
+    - `main`
+  - **Default:** `main`
+
+[[back to top](#databricks-labs-ucx)]
+
+## `export-assessment` command
+
+```commandline
+databricks labs ucx export-assessment
+```
+The export-assessment command is used to export UCX assessment results to a specified location. When you run this command, you will be prompted to provide details on the destination path and the type of report you wish to generate. If you do not specify these details, the command will default to exporting the main results to the current directory. The exported file will be named based on the selection made in the format. Eg: export_{query_choice}_results.zip
+- **Choose a path to save the UCX Assessment results:**
+  - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
+
+- **Choose which assessment results to export:**
+  - **Description:** Select the type of results to export. Options include:
+    - `azure`
+    - `estimates`
+    - `interactive`
+    - `main`
+
 [[back to top](#databricks-labs-ucx)]
 
 # Metastore related commands

labs.yml

Lines changed: 3 additions & 0 deletions
@@ -330,3 +330,6 @@ commands:
         description: The file to download
       - name: run-as-collection
         description: Run the command for the collection of workspaces with ucx installed. Default is False.
+
+  - name: export-assessment
+    description: Export UCX results to a specified location
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+import logging
+from pathlib import Path
+
+from databricks.labs.blueprint.tui import Prompts
+
+from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.lsql.backends import SqlBackend
+from databricks.labs.lsql.dashboards import DashboardMetadata
+
+logger = logging.getLogger(__name__)
+
+
+class AssessmentExporter:
+
+    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
+        self._sql_backend = sql_backend
+        self._config = config
+
+    def export_results(self, prompts: Prompts):
+        """Main method to export results to CSV files inside a ZIP archive."""
+        project_root = Path(__file__).resolve().parents[3]
+        queries_path_root = project_root / "labs/ucx/queries/assessment"
+
+        results_directory = Path(
+            prompts.question(
+                "Choose a path to save the UCX Assessment results",
+                default=Path.cwd().as_posix(),
+                validate=lambda p_: Path(p_).exists(),
+            )
+        )
+
+        query_choice = prompts.choice(
+            "Choose which assessment results to export",
+            [subdir.name for subdir in queries_path_root.iterdir() if subdir.is_dir()],
+        )
+
+        export_path = results_directory / f"export_{query_choice}_results.zip"
+        queries_path = queries_path_root / query_choice
+
+        assessment_results = DashboardMetadata.from_path(queries_path).replace_database(
+            database=self._config.inventory_database, database_to_replace="inventory"
+        )
+
+        logger.info("Exporting assessment results....")
+        results_path = assessment_results.export_to_zipped_csv(self._sql_backend, export_path)
+        logger.info(f"Results exported to {results_path}")
+
+        return results_path
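
Note: the exporter above derives both the menu of choices and the output file name from the file system. A minimal standalone sketch of that derivation, using only `pathlib`, may make the naming rule easier to follow. The helper names `list_query_choices` and `build_export_path` are hypothetical and not part of this commit, and the explicit `sorted()` is an assumption added here so that the order is deterministic (the unit tests answer the prompt with option 3 for `main`, which is consistent with the four names in alphabetical order).

```python
import tempfile
from pathlib import Path


def list_query_choices(queries_root: Path) -> list[str]:
    """Return the names of the sub-directories of queries_root, sorted alphabetically."""
    # Hypothetical helper: only directories count as export choices, plain files are ignored.
    return sorted(subdir.name for subdir in queries_root.iterdir() if subdir.is_dir())


def build_export_path(results_directory: Path, query_choice: str) -> Path:
    """Build the archive path using the export_{query_choice}_results.zip pattern."""
    return results_directory / f"export_{query_choice}_results.zip"


# Recreate the four query folders named in the README inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("main", "azure", "estimates", "interactive"):
        (root / name).mkdir()
    (root / "notes.txt").write_text("not a directory, so not a choice")
    choices = list_query_choices(root)

# Index 3 of the sorted list is the "main" folder.
export_path = build_export_path(Path("results"), choices[3])
print(choices)
print(export_path)
```

Sorting is only a suggestion in this sketch; whether the real prompt sorts its options is not visible in this diff.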

src/databricks/labs/ucx/cli.py

Lines changed: 9 additions & 0 deletions
@@ -19,6 +19,7 @@
 from databricks.labs.ucx.install import AccountInstaller
 from databricks.labs.ucx.source_code.linters.files import LocalCodeLinter
 
+
 ucx = App(__file__)
 logger = get_logger(__file__)
 
@@ -791,5 +792,13 @@ def lint_local_code(
     linter.lint(prompts, None if path is None else Path(path))
 
 
+@ucx.command
+def export_assessment(w: WorkspaceClient, prompts: Prompts):
+    """Export the UCX assessment queries to a zip file."""
+    ctx = WorkspaceContext(w)
+    exporter = ctx.assessment_exporter
+    exporter.export_results(prompts)
+
+
 if __name__ == "__main__":
     ucx()

src/databricks/labs/ucx/contexts/application.py

Lines changed: 6 additions & 1 deletion
@@ -25,6 +25,7 @@
 
 from databricks.labs.ucx.account.workspaces import WorkspaceInfo
 from databricks.labs.ucx.assessment.azure import AzureServicePrincipalCrawler
+from databricks.labs.ucx.assessment.export import AssessmentExporter
 from databricks.labs.ucx.aws.credentials import CredentialManager
 from databricks.labs.ucx.config import WorkspaceConfig
 from databricks.labs.ucx.hive_metastore import ExternalLocations, Mounts, TablesCrawler
@@ -260,7 +261,11 @@ def tables_migrator(self) -> TablesMigrator:
         )
 
     @cached_property
-    def acl_migrator(self) -> ACLMigrator:
+    def assessment_exporter(self):
+        return AssessmentExporter(self.sql_backend, self.config)
+
+    @cached_property
+    def acl_migrator(self):
         return ACLMigrator(
             self.tables_crawler,
             self.workspace_info,

src/databricks/labs/ucx/installer/workflows.py

Lines changed: 128 additions & 0 deletions
@@ -116,6 +116,123 @@
 f'--parent_run_id=' + dbutils.widgets.get('parent_run_id'))
 """
 
+EXPORT_TO_EXCEL_NOTEBOOK = """# Databricks notebook source
+# MAGIC %md
+# MAGIC ##### Exporter of UCX assessment results
+# MAGIC ##### Instructions:
+# MAGIC 1. Execute using an all-purpose cluster with Databricks Runtime 14 or higher.
+# MAGIC 1. Hit **Run all** button and wait for completion.
+# MAGIC 1. Go to the bottom of the notebook and click the Download UCX Results button.
+# MAGIC
+# MAGIC ##### Important:
+# MAGIC Please note that this is only meant to serve as example code.
+# MAGIC
+# MAGIC Example code developed by **Databricks Shared Technical Services team**.
+
+# COMMAND ----------
+
+# DBTITLE 1,Installing Packages
+# MAGIC %pip install {remote_wheel} -qqq
+# MAGIC %pip install xlsxwriter -qqq
+# MAGIC dbutils.library.restartPython()
+
+# COMMAND ----------
+
+# DBTITLE 1,Libraries Import and Setting UCX
+import os
+import logging
+import threading
+import shutil
+from pathlib import Path
+from threading import Lock
+from functools import partial
+
+import pandas as pd
+import xlsxwriter
+
+from databricks.sdk.config import with_user_agent_extra
+from databricks.labs.blueprint.logger import install_logger
+from databricks.labs.blueprint.parallel import Threads
+from databricks.labs.lsql.dashboards import Dashboards
+from databricks.labs.lsql.lakeview.model import Dataset
+from databricks.labs.ucx.contexts.workflow_task import RuntimeContext
+
+# ctx
+install_logger()
+with_user_agent_extra("cmd", "export-assessment")
+named_parameters = dict(config="/Workspace{config_file}")
+ctx = RuntimeContext(named_parameters)
+lock = Lock()
+
+# COMMAND ----------
+
+# DBTITLE 1,Assessment Export
+FILE_NAME = "ucx_assessment_main.xlsx"
+TMP_PATH = f"/Workspace{{ctx.installation.install_folder()}}/tmp/"
+DOWNLOAD_PATH = "/dbfs/FileStore/excel-export"
+
+
+def _cleanup() -> None:
+    '''Move the temporary results file to the download path and clean up the temp directory.'''
+    shutil.move(
+        os.path.join(TMP_PATH, FILE_NAME),
+        os.path.join(DOWNLOAD_PATH, FILE_NAME),
+    )
+    shutil.rmtree(TMP_PATH)
+
+
+def _prepare_directories() -> None:
+    '''Ensure that the necessary directories exist.'''
+    os.makedirs(TMP_PATH, exist_ok=True)
+    os.makedirs(DOWNLOAD_PATH, exist_ok=True)
+
+
+def _to_excel(dataset: Dataset, writer: ...) -> None:
+    '''Execute a SQL query and write the result to an Excel sheet.'''
+    worksheet_name = dataset.display_name[:31]
+    df = spark.sql(dataset.query).toPandas()
+    with lock:
+        df.to_excel(writer, sheet_name=worksheet_name, index=False)
+
+
+def _render_export() -> None:
+    '''Render an HTML link for downloading the results.'''
+    html_content = '''
+    <style>@font-face{{font-family:'DM Sans';src:url(https://cdn.bfldr.com/9AYANS2F/at/p9qfs3vgsvnp5c7txz583vgs/dm-sans-regular.ttf?auto=webp&format=ttf) format('truetype');font-weight:400;font-style:normal}}body{{font-family:'DM Sans',Arial,sans-serif}}.export-container{{text-align:center;margin-top:20px}}.export-container h2{{color:#1B3139;font-size:24px;margin-bottom:20px}}.export-container a{{display:inline-block;padding:12px 25px;background-color:#1B3139;color:#fff;text-decoration:none;border-radius:4px;font-size:18px;font-weight:500;transition:background-color 0.3s ease,transform:translateY(-2px) ease}}.export-container a:hover{{background-color:#FF3621;transform:translateY(-2px)}}</style>
+    <div class="export-container"><h2>Export Results</h2><a href='{workspace_host}/files/excel-export/ucx_assessment_main.xlsx?o={workspace_id}' target='_blank' download>Download Results</a></div>
+
+    '''
+    displayHTML(html_content)
+
+
+def export_results() -> None:
+    '''Main method to export results to an Excel file.'''
+    _prepare_directories()
+
+    dashboard_path = (
+        Path(ctx.installation.install_folder())
+        / "dashboards/[UCX] UCX Assessment (Main).lvdash.json"
+    )
+    dashboard = Dashboards(ctx.workspace_client)
+    dashboard_datasets = dashboard.get_dashboard(dashboard_path).datasets
+    try:
+        target = TMP_PATH + "/ucx_assessment_main.xlsx"
+        with pd.ExcelWriter(target, engine="xlsxwriter") as writer:
+            tasks = []
+            for dataset in dashboard_datasets:
+                tasks.append(partial(_to_excel, dataset, writer))
+            Threads.strict("exporting", tasks)
+        _cleanup()
+        _render_export()
+    except Exception as e:
+        print(f"Error exporting results ", e)
+
+# COMMAND ----------
+
+# DBTITLE 1,Data Export
+export_results()
+"""
+
 
 class DeployedWorkflows:
     def __init__(self, ws: WorkspaceClient, install_state: InstallState):
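
Note: in the notebook template, `_to_excel` runs each dataset query on its own thread but takes a shared `Lock` before writing to the single Excel writer. The sketch below reproduces that pattern with the standard library only, as an illustration rather than the notebook's actual code. `ThreadPoolExecutor` stands in for blueprint's `Threads.strict`, a plain `dict` stands in for the `pd.ExcelWriter`, and `run_query` is a hypothetical placeholder for `spark.sql(...).toPandas()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import partial

lock = threading.Lock()


def run_query(name: str) -> list[int]:
    """Hypothetical stand-in for a slow query; returns three fake rows."""
    return [len(name)] * 3


def write_sheet(name: str, workbook: dict[str, list[int]]) -> None:
    """Fetch rows concurrently, then serialize the write to the shared workbook."""
    rows = run_query(name)  # runs outside the lock, so queries overlap
    with lock:
        # Sheet names are cut to 31 characters, as in the notebook template.
        workbook[name[:31]] = rows


workbook: dict[str, list[int]] = {}
names = ("tables", "grants", "a" * 40)
tasks = [partial(write_sheet, name, workbook) for name in names]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task) for task in tasks]
    for future in futures:
        future.result()  # re-raises any exception from the worker thread

print(sorted(workbook))
```

Keeping the query outside the lock is what preserves the parallelism; only the write to the shared object is serialized.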
@@ -502,6 +619,7 @@ def create_jobs(self) -> None:
         self.remove_jobs(keep=desired_workflows)
         self._install_state.save()
         self._create_debug(remote_wheels)
+        self._create_export(remote_wheels)
         self._create_readme()
 
     def remove_jobs(self, *, keep: set[str] | None = None) -> None:
@@ -840,6 +958,16 @@ def _create_debug(self, remote_wheels: list[str]):
         ).encode("utf8")
         self._installation.upload('DEBUG.py', content)
 
+    def _create_export(self, remote_wheels: list[str]):
+        remote_wheels_str = " ".join(remote_wheels)
+        content = EXPORT_TO_EXCEL_NOTEBOOK.format(
+            remote_wheel=remote_wheels_str,
+            config_file=self._config_file,
+            workspace_host=self._ws.config.host,
+            workspace_id=self._ws.get_workspace_id(),
+        ).encode("utf8")
+        self._installation.upload('EXPORT_ASSESSMENT_TO_EXCEL.py', content)
+
 
 class MaxedStreamHandler(logging.StreamHandler):
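
Note: `_create_export` renders the notebook with `str.format`, which is why the template mixes single braces (`{remote_wheel}`, `{config_file}`) with doubled braces (`{{ctx.installation.install_folder()}}`, and the CSS rules). Single-brace fields are substituted at install time, while doubled braces collapse to literal braces that survive into the uploaded notebook. A minimal sketch of that behaviour follows; the wheel path and config path are made-up values used only for illustration.

```python
# A three-line excerpt in the style of the notebook template.
TEMPLATE = """# MAGIC %pip install {remote_wheel} -qqq
named_parameters = dict(config="/Workspace{config_file}")
TMP_PATH = f"/Workspace{{ctx.installation.install_folder()}}/tmp/"
"""


def render(template: str, **values: str) -> str:
    """Substitute install-time values; doubled braces become single literal braces."""
    return template.format(**values)


rendered = render(
    TEMPLATE,
    remote_wheel="/Workspace/hypothetical/wheels/ucx.whl",  # illustrative value
    config_file="/Users/someone/.ucx/config.yml",  # illustrative value
)
print(rendered)
```

After rendering, the `TMP_PATH` line is an ordinary f-string that is evaluated later, when the notebook itself runs.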

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.ucx.assessment.export import AssessmentExporter
+from databricks.labs.lsql.backends import MockBackend
+from databricks.labs.blueprint.tui import MockPrompts
+from databricks.labs.lsql.core import Row
+
+
+def test_export(tmp_path):
+    """Test the export_results method of the AssessmentExporter class."""
+    query = {
+        "SELECT\n one\nFROM ucx.external_locations": [
+            Row(location="s3://bucket1/folder1", table_count=1),
+            Row(location="abfss://[email protected]/folder1", table_count=1),
+            Row(location="gcp://folder1", table_count=2),
+        ]
+    }
+
+    # Setup workspace configuration
+    config = WorkspaceConfig(inventory_database="ucx")
+
+    # Prepare temporary paths and files
+    export_path = tmp_path / "export"
+    export_path.mkdir(parents=True, exist_ok=True)
+
+    # Mock backend and prompts
+    mock_backend = MockBackend(rows=query)
+    query_choice = {"assessment_name": "main", "option": 3}
+    mock_prompts = MockPrompts(
+        {
+            "Choose a path to save the UCX Assessment results": export_path.as_posix(),
+            "Choose which assessment results to export": query_choice["option"],
+        }
+    )
+
+    # Execute export process
+    export = AssessmentExporter(mock_backend, config)
+    exported = export.export_results(mock_prompts)
+
+    # Assertion based on the query_choice
+    expected_file_name = f"export_{query_choice['assessment_name']}_results.zip"  # Adjusted filename
+    assert exported == export_path / expected_file_name
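
Note: the test above checks only the returned path. Since `export_results` is documented as writing CSV files inside a ZIP archive, a follow-up check could open the archive and read its members. The sketch below shows one way to do that with the standard library. It builds its own sample archive in memory, so the member name `external_locations.csv` and its columns are assumptions for illustration, not the layout the exporter is known to produce.

```python
import csv
import io
import zipfile


def read_zipped_csvs(zip_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Return {member name: rows} for every .csv member of a ZIP archive."""
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Wrap the binary member so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Build a small sample archive in memory (hypothetical member name and columns).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "external_locations.csv",
        "location,table_count\ns3://bucket1/folder1,1\ngcp://folder1,2\n",
    )

sample = read_zipped_csvs(buffer.getvalue())
print(sample)
```

For a file on disk, the same helper could be fed `Path(...).read_bytes()`.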

tests/unit/test_cli.py

Lines changed: 17 additions & 0 deletions
@@ -61,6 +61,7 @@
     validate_groups_membership,
     workflows,
     delete_missing_principals,
+    export_assessment,
 )
 from databricks.labs.ucx.contexts.account_cli import AccountContext
 from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext
@@ -1133,3 +1134,19 @@ def test_delete_principals(ws):
     prompts = MockPrompts({"Select the list of roles *": "0"})
     delete_missing_principals(ws, prompts, ctx)
     role_creation.delete_uc_roles.assert_called_once()
+
+
+def test_export_assessment(ws, tmp_path):
+    query_choice = {"assessment_name": "main", "option": 3}
+    mock_prompts = MockPrompts(
+        {
+            "Choose a path to save the UCX Assessment results": tmp_path.as_posix(),
+            "Choose which assessment results to export": query_choice["option"],
+        }
+    )
+
+    export_assessment(ws, mock_prompts)
+    # Construct the expected filename based on the query_choice
+    expected_filename = f"export_{query_choice['assessment_name']}_results.zip"
+    # Assert that the file exists in the temporary path
+    assert len(list(tmp_path.glob(expected_filename))) == 1
