Commit c3368ba

Improve STS Export to Assessment to Excel (#4375)
## Changes

This notebook is heavily used by the STS Team, and this change removes the need to send a custom notebook for the export. Refactored the assessment UI export functionality `EXPORT_ASSESSMENT_TO_EXCEL` to improve compatibility and remove compute constraints. Updated the documentation for the `export-assessment` CLI command.

### Linked issues

Not reported, but the notebook failed because the lsql library returned empty queries.

### Functionality

- [x] Refactored `EXPORT_ASSESSMENT_TO_EXCEL` code into the `AssessmentExporter` class
- [x] Enabled assessment execution from RuntimeContext
- [x] Removed DBFS dependency for export file generation
- [x] Eliminated lsql library dependency that was causing notebook execution failures
- [x] Expanded `databricks labs ucx export-assessment` functionality to handle Excel export with the flag `--export-format excel`

### Technical Details

**Before:** The export functionality was tightly coupled to personal compute environments due to DBFS dependencies and lsql library requirements, limiting where the notebook could be executed.

**After:**

- Export logic is now encapsulated in the `AssessmentExporter` class
- Removed DBFS dependency, allowing the notebook to run on any compute type, including serverless
- Eliminated lsql library dependency that was preventing successful execution
- Assessment can now be executed directly from Runtime context

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on local environment (screenshot attached)

Execution from Workspace

<img width="1730" height="859" alt="export_to_excel_notebook" src="https://github.com/user-attachments/assets/b31a79a1-6700-4fdf-b499-dc70cf1d260e" />

Execution from CLI

https://github.com/user-attachments/assets/c1485687-4e40-4618-a048-742ec3ad0738
1 parent 7b8d419 commit c3368ba

9 files changed: +338 -121 lines changed


docs/ucx/docs/reference/commands/index.mdx

Lines changed: 25 additions & 2 deletions
@@ -864,9 +864,9 @@ databricks labs ucx report-account-compatibility --profile labs-azure-account
 ### `export-assessment`
 
 ```commandline
-databricks labs ucx export-assessment
+databricks labs ucx export-assessment [--export-format excel]
 ```
-The export-assessment command is used to export UCX assessment results to a specified location. When you run this command, you will be prompted to provide details on the destination path and the type of report you wish to generate. If you do not specify these details, the command will default to exporting the main results to the current directory. The exported file will be named based on the selection made in the format. Eg: `export_{query_choice}_results.zip`
+The export-assessment command is used to export UCX assessment results to a specified location. When you run this command without any arguments, you will be prompted to provide details on the destination path and the type of report you wish to generate. By default, this will create a zip file containing CSV files in the format `export_{query_choice}_results.zip`.
 - **Choose a path to save the UCX Assessment results:**
   - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
 
@@ -877,3 +877,26 @@ The export-assessment command is used to export UCX assessment results to a spec
     - `interactive`
     - `main`
   - **Default:** `main`
+
+```text
+databricks labs ucx export-assessment
+Choose a path to save the UCX Assessment results (default: /Users/user.name/ucx):
+Choose which assessment results to export
+[0] azure
+[1] estimates
+[2] interactive
+[3] main
+Enter a number between 0 and 3:
+```
+
+Alternatively, you can run the command with argument `--export-format excel`, which will prompt for the destination to store the file and generate an Excel (.xlsx) format file named `ucx_assessment_main.xlsx`.
+- **Choose a path to save the UCX Assessment results:**
+  - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
+
+```text
+databricks labs ucx export-assessment --export-format excel
+Choose a path to save the UCX Assessment results (default: /Users/user.name/ucx):
+```
+
+**Note:**
+Both export commands need compute resources (either a Warehouse or Job cluster) to run and typically take about 10 minutes to complete.

labs.yml

Lines changed: 3 additions & 0 deletions
@@ -365,6 +365,9 @@ commands:
 
   - name: export-assessment
     description: Export UCX results to a specified location
+    flags:
+      - name: export-format
+        description: Specifies the file format for data export (ZIP with CSV files or Excel)
 
   - name: create-federated-catalog
     description: (EXPERIMENTAL) Create a federated catalog in the workspace
Lines changed: 110 additions & 17 deletions
@@ -1,48 +1,141 @@
+import base64
 import logging
+from datetime import timedelta
 from pathlib import Path
+from typing import Any
 
-from databricks.labs.blueprint.tui import Prompts
+from databricks.sdk.service import compute, jobs
+from databricks.sdk.service.jobs import RunResultState
+from databricks.sdk.service.workspace import ExportFormat
+from databricks.sdk.errors import NotFound, ResourceDoesNotExist
+from databricks.sdk.retries import retried
+from databricks.sdk import WorkspaceClient
 
-from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.blueprint.installation import Installation
+from databricks.labs.blueprint.tui import Prompts
 from databricks.labs.lsql.backends import SqlBackend
 from databricks.labs.lsql.dashboards import DashboardMetadata
 
+from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.ucx.assessment.export_html_template import EXPORT_HTML_TEMPLATE
+
 logger = logging.getLogger(__name__)
 
 
 class AssessmentExporter:
 
-    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, config: WorkspaceConfig):
+        self._ws = ws
         self._sql_backend = sql_backend
         self._config = config
+        self._install_folder = f"/Workspace/{Installation.assume_global(ws, 'ucx')}/"
+        self._base_path = Path(__file__).resolve().parents[3] / "labs/ucx/queries/assessment"
 
-    def export_results(self, prompts: Prompts):
-        """Main method to export results to CSV files inside a ZIP archive."""
-        project_root = Path(__file__).resolve().parents[3]
-        queries_path_root = project_root / "labs/ucx/queries/assessment"
+    @staticmethod
+    def _export_to_excel(
+        assessment_metadata: DashboardMetadata, sql_backend: SqlBackend, export_path: Path, writter: Any
+    ):
+        """Export Assessment to Excel"""
+        with writter.ExcelWriter(export_path, engine='xlsxwriter') as writer:
+            for tile in assessment_metadata.tiles:
+                if not tile.metadata.is_query():
+                    continue
+
+                try:
+                    rows = list(sql_backend.fetch(tile.content))
+                    if not rows:
+                        continue
 
-        results_directory = Path(
+                    data = [row.asDict() for row in rows]
+                    df = writter.DataFrame(data)
+
+                    sheet_name = str(tile.metadata.id)[:31]
+                    df.to_excel(writer, sheet_name=sheet_name, index=False)
+
+                except NotFound as e:
+                    msg = (
+                        str(e).split(" Verify", maxsplit=1)[0] + f" Export will continue without {tile.metadata.title}"
+                    )
+                    logging.warning(msg)
+                    continue
+
+    @retried(on=[ResourceDoesNotExist], timeout=timedelta(minutes=1))
+    def _render_export(self, export_file_path: Path) -> str:
+        """Render an HTML link for downloading the results."""
+        binary_data = self._ws.workspace.download(export_file_path.as_posix()).read()
+        b64_data = base64.b64encode(binary_data).decode('utf-8')
+
+        return EXPORT_HTML_TEMPLATE.format(b64_data=b64_data, export_file_path_name=export_file_path.name)
+
+    @staticmethod
+    def _get_output_directory(prompts: Prompts) -> Path:
+        return Path(
             prompts.question(
                 "Choose a path to save the UCX Assessment results",
                 default=Path.cwd().as_posix(),
                 validate=lambda p_: Path(p_).exists(),
             )
         )
 
+    def _get_queries(self, assessment: str) -> DashboardMetadata:
+        """Get UCX queries to export"""
+        queries_path = self._base_path / assessment if assessment else self._base_path
+        return DashboardMetadata.from_path(queries_path).replace_database(
+            database=self._config.inventory_database, database_to_replace="inventory"
+        )
+
+    def cli_export_csv_results(self, prompts: Prompts) -> Path:
+        """Main method to export results to CSV files inside a ZIP archive."""
+        results_directory = self._get_output_directory(prompts)
+
         query_choice = prompts.choice(
             "Choose which assessment results to export",
-            [subdir.name for subdir in queries_path_root.iterdir() if subdir.is_dir()],
+            [subdir.name for subdir in self._base_path.iterdir() if subdir.is_dir()],
         )
 
-        export_path = results_directory / f"export_{query_choice}_results.zip"
-        queries_path = queries_path_root / query_choice
+        results_path = self._get_queries(query_choice).export_to_zipped_csv(
+            self._sql_backend, results_directory / f"export_{query_choice}_results.zip"
+        )
 
-        assessment_results = DashboardMetadata.from_path(queries_path).replace_database(
-            database=self._config.inventory_database, database_to_replace="inventory"
+        return results_path
+
+    def cli_export_xlsx_results(self, prompts: Prompts) -> Path:
+        """Submit Excel export notebook in a job"""
+
+        notebook_path = f"{self._install_folder}/EXPORT_ASSESSMENT_TO_EXCEL"
+        export_file_name = Path(f"{self._install_folder}/ucx_assessment_main.xlsx")
+        results_directory = Path(self._get_output_directory(prompts)) / export_file_name.name
+
+        run = self._ws.jobs.submit_and_wait(
+            run_name="export-assessment-to-excel-experimental",
+            tasks=[
+                jobs.SubmitTask(
+                    notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
+                    task_key="export-assessment",
+                    new_cluster=compute.ClusterSpec(
+                        data_security_mode=compute.DataSecurityMode.LEGACY_SINGLE_USER_STANDARD,
+                        spark_conf={
+                            "spark.databricks.cluster.profile": "singleNode",
+                            "spark.master": "local[*]",
+                        },
+                        custom_tags={"ResourceClass": "SingleNode"},
+                        num_workers=0,
+                        policy_id=self._config.policy_id,
+                        apply_policy_default_values=True,
+                    ),
+                )
+            ],
         )
 
-        logger.info("Exporting assessment results....")
-        results_path = assessment_results.export_to_zipped_csv(self._sql_backend, export_path)
-        logger.info(f"Results exported to {results_path}")
+        if run.state and run.state.result_state == RunResultState.SUCCESS:
+            binary_resp = self._ws.workspace.download(path=export_file_name.as_posix(), format=ExportFormat.SOURCE)
+            results_directory.write_bytes(binary_resp.read())
 
-        return results_path
+        return results_directory
+
+    def web_export_results(self, writer: Any) -> str:
+        """Alternative method to export results from the UI."""
+        export_file_name = Path(f"{self._install_folder}/ucx_assessment_main.xlsx")
+        assessment_main = self._get_queries("main")
+        self._export_to_excel(assessment_main, self._sql_backend, export_file_name, writer)
+        return self._render_export(export_file_name)
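
The sheet-selection rules in `_export_to_excel` (skip tiles that are not queries, skip queries that return no rows, cut the sheet name to 31 characters because Excel rejects longer worksheet names) can be illustrated without pandas or a SQL backend. This is only a rough sketch: `Tile`, `plan_sheets`, and the `fetch` callable are hypothetical stand-ins introduced here, not part of UCX or lsql.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

EXCEL_SHEET_NAME_LIMIT = 31  # Excel rejects worksheet names longer than 31 characters


@dataclass
class Tile:
    """Hypothetical stand-in for a dashboard tile; not the lsql class."""

    tile_id: str
    content: str
    is_query: bool = True


def plan_sheets(tiles: Iterable[Tile], fetch: Callable[[str], List[dict]]) -> Dict[str, List[dict]]:
    """Map truncated sheet names to the rows that would be written to each sheet."""
    sheets: Dict[str, List[dict]] = {}
    for tile in tiles:
        if not tile.is_query:
            continue  # markdown or widget tiles have nothing to export
        rows = fetch(tile.content)
        if not rows:
            continue  # an empty result produces no sheet
        sheets[tile.tile_id[:EXCEL_SHEET_NAME_LIMIT]] = rows
    return sheets


# Usage with an in-memory fake instead of a SQL backend
fake_results = {"q1": [{"n": 1}], "q2": []}
tiles = [Tile("a" * 40, "q1"), Tile("notes", "text", is_query=False), Tile("empty", "q2")]
sheets = plan_sheets(tiles, lambda query: fake_results.get(query, []))
```

Note that plain truncation means two tile ids sharing the same first 31 characters would map to the same sheet name; the sketch, like the truncation in the diff, does not deduplicate them.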
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+EXPORT_HTML_TEMPLATE = """
+<style>
+    @font-face {{
+        font-family: 'DM Sans';
+        src: url(https://cdn.bfldr.com/9AYANS2F/at/p9qfs3vgsvnp5c7txz583vgs/dm-sans-regular.ttf?auto=webp&format=ttf) format('truetype');
+    }}
+    body {{ font-family: 'DM Sans', Arial, sans-serif; }}
+    .export-container {{ text-align: center; margin-top: 20px; }}
+    .export-container h2 {{ color: #1B3139; font-size: 24px; margin-bottom: 20px; }}
+    .export-container button {{
+        display: inline-block; padding: 12px 25px; background-color: #1B3139;
+        color: #fff; border: none; border-radius: 4px; font-size: 18px;
+        font-weight: 500; cursor: pointer; transition: background-color 0.3s, transform 0.3s;
+    }}
+    .export-container button:hover {{ background-color: #FF3621; transform: translateY(-2px); }}
+</style>
+
+<div class="export-container">
+    <h2>Export Results</h2>
+    <button onclick="downloadExcel()">Download Results</button>
+</div>
+
+<script>
+    function downloadExcel() {{
+        const b64Data = '{b64_data}';
+        const filename = '{export_file_path_name}';
+
+        // Convert base64 to blob
+        const byteCharacters = atob(b64Data);
+        const byteNumbers = new Array(byteCharacters.length);
+        for (let i = 0; i < byteCharacters.length; i++) {{
+            byteNumbers[i] = byteCharacters.charCodeAt(i);
+        }}
+        const byteArray = new Uint8Array(byteNumbers);
+        const blob = new Blob([byteArray], {{
+            type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
+        }});
+
+        // Create download link and click it
+        const url = URL.createObjectURL(blob);
+        const a = document.createElement('a');
+        a.href = url;
+        a.download = filename;
+        a.click();
+        URL.revokeObjectURL(url);
+    }}
+</script>
+"""
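
The template doubles every literal CSS and JavaScript brace (`{{` and `}}`) because `_render_export` fills it with `str.format`, which treats single braces as placeholders. A minimal sketch of that mechanism, using a shortened hypothetical template (`TEMPLATE` and `render` are illustrative names, not the ones in the commit):

```python
import base64

# Shortened, hypothetical template: doubled braces survive str.format as literal
# braces, while {b64_data} and {export_file_path_name} are substituted.
TEMPLATE = """<style>.box {{ margin-top: 20px; }}</style>
<script>
const b64Data = '{b64_data}';
const filename = '{export_file_path_name}';
</script>"""


def render(payload: bytes, file_name: str) -> str:
    """Embed the payload as base64 text so the browser can rebuild the file client-side."""
    b64_data = base64.b64encode(payload).decode("utf-8")
    return TEMPLATE.format(b64_data=b64_data, export_file_path_name=file_name)


html = render(b"hello", "ucx_assessment_main.xlsx")
```

Because the whole workbook is inlined into the page, the rendered HTML is roughly a third larger than the file itself, which is a property of base64 encoding rather than of this template.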

src/databricks/labs/ucx/cli.py

Lines changed: 9 additions & 2 deletions
@@ -914,11 +914,18 @@ def migrate_local_code(
 
 
 @ucx.command
-def export_assessment(w: WorkspaceClient, prompts: Prompts):
+def export_assessment(w: WorkspaceClient, prompts: Prompts, export_format: str = "csv"):
     """Export the UCX assessment queries to a zip file."""
+    results_path = Path()
     ctx = WorkspaceContext(w)
     exporter = ctx.assessment_exporter
-    exporter.export_results(prompts)
+
+    export_method = (
+        exporter.cli_export_xlsx_results if export_format.lower() == "excel" else exporter.cli_export_csv_results
+    )
+
+    results_path = export_method(prompts)
+    logger.info(f"Results exported to {results_path}")
 
 
 @ucx.command
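
The format dispatch in `export_assessment` compares the flag value case-insensitively against `"excel"` and treats everything else as the CSV path. A small sketch of that selection logic, with a hypothetical `FakeExporter` and `pick_export_method` standing in for the real exporter and command (the real methods prompt the user and talk to the workspace):

```python
from pathlib import Path


class FakeExporter:
    """Hypothetical stand-in for AssessmentExporter; returns fixed paths only."""

    def cli_export_csv_results(self, prompts) -> Path:
        return Path("export_main_results.zip")

    def cli_export_xlsx_results(self, prompts) -> Path:
        return Path("ucx_assessment_main.xlsx")


def pick_export_method(exporter, export_format: str = "csv"):
    """Mirror the conditional in the diff: only 'excel' (any casing) selects the Excel path."""
    if export_format.lower() == "excel":
        return exporter.cli_export_xlsx_results
    return exporter.cli_export_csv_results


exporter = FakeExporter()
excel_path = pick_export_method(exporter, "Excel")(None)
fallback_path = pick_export_method(exporter, "xlsx")(None)
```

One consequence worth noting: an unrecognized value such as `xlsx` is not rejected but silently falls back to the zipped CSV export.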

src/databricks/labs/ucx/contexts/application.py

Lines changed: 1 addition & 1 deletion
@@ -362,7 +362,7 @@ def tables_migrator(self) -> TablesMigrator:
 
     @cached_property
     def assessment_exporter(self):
-        return AssessmentExporter(self.sql_backend, self.config)
+        return AssessmentExporter(self.workspace_client, self.sql_backend, self.config)
 
     @cached_property
     def acl_migrator(self):
