Improve STS Export to Assessment to Excel (#4375)

andresgarciaf · web-flow · commit c3368baa63c4 · 2025-08-27T14:47:27.000Z
## Changes
This notebook is heavily used by the STS Team, and this change removes the need to send a custom notebook for the export.

Refactored the assessment UI export functionality `EXPORT_ASSESSMENT_TO_EXCEL` to improve compatibility and remove compute constraints. Updated the documentation for the `export-assessment` CLI command.

### Linked issues
Not reported, but the notebook failed because the lsql library returned empty queries.

### Functionality
- [x] Refactored `EXPORT_ASSESSMENT_TO_EXCEL` code into `AssessmentExporter` class
- [x] Enabled assessment execution from RuntimeContext
- [x] Removed DBFS dependency for export file generation
- [x] Eliminated lsql library dependency that was causing notebook execution failures
- [x] Expanded `databricks labs ucx export-assessment` functionality to handle Excel export with the flag `--export-format excel`

### Technical Details
**Before:** The export functionality was tightly coupled to personal compute environments due to DBFS dependencies and lsql library requirements, limiting where the notebook could be executed.

**After:**
- Export logic is now encapsulated in the `AssessmentExporter` class
- Removed DBFS dependency, allowing the notebook to run on any compute type, including serverless
- Eliminated lsql library dependency that was preventing successful execution
- Assessment can now be executed directly from the Runtime context

### Tests
- [x] manually tested
- [x] added unit tests
- [x] added integration tests
- [x] verified on local environment (screenshot attached)

Execution from Workspace

<img width="1730" height="859" alt="export_to_excel_notebook" src="https://github.com/user-attachments/assets/b31a79a1-6700-4fdf-b499-dc70cf1d260e" />

Execution from CLI

https://github.com/user-attachments/assets/c1485687-4e40-4618-a048-742ec3ad0738
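
For reviewers, the effect of the new `--export-format` flag on the output file can be sketched as follows. This is a minimal illustration, not code from this PR: `planned_export_file` is a hypothetical helper, and it only mirrors the file names used in the diff (`ucx_assessment_main.xlsx` for Excel, `export_{query_choice}_results.zip` otherwise).

```python
from pathlib import Path


def planned_export_file(results_directory: Path, export_format: str = "csv", query_choice: str = "main") -> Path:
    """Hypothetical helper: return the file an export run is expected to produce."""
    if export_format.lower() == "excel":
        # In this PR the Excel export uses a fixed file name for the "main" queries.
        return results_directory / "ucx_assessment_main.xlsx"
    # Default: a ZIP archive of CSV files for the chosen query set.
    return results_directory / f"export_{query_choice}_results.zip"


if __name__ == "__main__":
    print(planned_export_file(Path("ucx"), "excel"))
    print(planned_export_file(Path("ucx"), query_choice="estimates"))
```

The format comparison is case-insensitive, matching the `export_format.lower() == "excel"` check in `cli.py`; any other value falls back to the zipped CSV export.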
diff --git a/docs/ucx/docs/reference/commands/index.mdx b/docs/ucx/docs/reference/commands/index.mdx
@@ -864,9 +864,9 @@ databricks labs ucx report-account-compatibility --profile labs-azure-account
 ### `export-assessment`
 
 ```commandline
-databricks labs ucx export-assessment
+databricks labs ucx export-assessment [--export-format excel]
 ```
-The export-assessment command is used to export UCX assessment results to a specified location. When you run this command, you will be prompted to provide details on the destination path and the type of report you wish to generate. If you do not specify these details, the command will default to exporting the main results to the current directory. The exported file will be named based on the selection made in the format. Eg: `export_{query_choice}_results.zip`
+The export-assessment command is used to export UCX assessment results to a specified location. When you run this command without any arguments, you will be prompted to provide details on the destination path and the type of report you wish to generate. By default, this will create a zip file containing CSV files in the format `export_{query_choice}_results.zip`.
 - **Choose a path to save the UCX Assessment results:**
     - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
 
@@ -877,3 +877,26 @@ The export-assessment command is used to export UCX assessment results to a spec
         - `interactive`
         - `main`
     - **Default:** `main`
+
+```text
+databricks labs ucx export-assessment
+Choose a path to save the UCX Assessment results (default: /Users/user.name/ucx):
+Choose which assessment results to export
+[0] azure
+[1] estimates
+[2] interactive
+[3] main
+Enter a number between 0 and 3:
+```
+
+Alternatively, you can run the command with the `--export-format excel` argument. It prompts for the destination path and generates an Excel (.xlsx) file named `ucx_assessment_main.xlsx`.
+- **Choose a path to save the UCX Assessment results:**
+    - **Description:** Specify the path where the results should be saved. If not provided, results will be saved in the current directory.
+
+```text
+databricks labs ucx export-assessment --export-format excel
+Choose a path to save the UCX Assessment results (default: /Users/user.name/ucx):
+```
+
+**Note:**
+Both export commands need compute resources (either a Warehouse or Job cluster) to run and typically take about 10 minutes to complete.
diff --git a/labs.yml b/labs.yml
@@ -365,6 +365,9 @@ commands:
 
   - name: export-assessment
     description: Export UCX results to a specified location
+    flags:
+      - name: export-format
+        description: Specifies the file format for data export (ZIP with CSV files or Excel)
 
   - name: create-federated-catalog
     description: (EXPERIMENTAL) Create a federated catalog in the workspace
diff --git a/src/databricks/labs/ucx/assessment/export.py b/src/databricks/labs/ucx/assessment/export.py
@@ -1,48 +1,141 @@
+import base64
 import logging
+from datetime import timedelta
 from pathlib import Path
+from typing import Any
 
-from databricks.labs.blueprint.tui import Prompts
+from databricks.sdk.service import compute, jobs
+from databricks.sdk.service.jobs import RunResultState
+from databricks.sdk.service.workspace import ExportFormat
+from databricks.sdk.errors import NotFound, ResourceDoesNotExist
+from databricks.sdk.retries import retried
+from databricks.sdk import WorkspaceClient
 
-from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.blueprint.installation import Installation
+from databricks.labs.blueprint.tui import Prompts
 from databricks.labs.lsql.backends import SqlBackend
 from databricks.labs.lsql.dashboards import DashboardMetadata
 
+from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.ucx.assessment.export_html_template import EXPORT_HTML_TEMPLATE
+
 logger = logging.getLogger(__name__)
 
 
 class AssessmentExporter:
 
-    def __init__(self, sql_backend: SqlBackend, config: WorkspaceConfig):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, config: WorkspaceConfig):
+        self._ws = ws
         self._sql_backend = sql_backend
         self._config = config
+        self._install_folder = f"/Workspace/{Installation.assume_global(ws, 'ucx')}/"
+        self._base_path = Path(__file__).resolve().parents[3] / "labs/ucx/queries/assessment"
 
-    def export_results(self, prompts: Prompts):
-        """Main method to export results to CSV files inside a ZIP archive."""
-        project_root = Path(__file__).resolve().parents[3]
-        queries_path_root = project_root / "labs/ucx/queries/assessment"
+    @staticmethod
+    def _export_to_excel(
+        assessment_metadata: DashboardMetadata, sql_backend: SqlBackend, export_path: Path, writter: Any
+    ):
+        """Export Assessment to Excel"""
+        with writter.ExcelWriter(export_path, engine='xlsxwriter') as writer:
+            for tile in assessment_metadata.tiles:
+                if not tile.metadata.is_query():
+                    continue
+
+                try:
+                    rows = list(sql_backend.fetch(tile.content))
+                    if not rows:
+                        continue
 
-        results_directory = Path(
+                    data = [row.asDict() for row in rows]
+                    df = writter.DataFrame(data)
+
+                    sheet_name = str(tile.metadata.id)[:31]
+                    df.to_excel(writer, sheet_name=sheet_name, index=False)
+
+                except NotFound as e:
+                    msg = (
+                        str(e).split(" Verify", maxsplit=1)[0] + f" Export will continue without {tile.metadata.title}"
+                    )
+                    logger.warning(msg)
+                    continue
+
+    @retried(on=[ResourceDoesNotExist], timeout=timedelta(minutes=1))
+    def _render_export(self, export_file_path: Path) -> str:
+        """Render an HTML link for downloading the results."""
+        binary_data = self._ws.workspace.download(export_file_path.as_posix()).read()
+        b64_data = base64.b64encode(binary_data).decode('utf-8')
+
+        return EXPORT_HTML_TEMPLATE.format(b64_data=b64_data, export_file_path_name=export_file_path.name)
+
+    @staticmethod
+    def _get_output_directory(prompts: Prompts) -> Path:
+        return Path(
             prompts.question(
                 "Choose a path to save the UCX Assessment results",
                 default=Path.cwd().as_posix(),
                 validate=lambda p_: Path(p_).exists(),
             )
         )
 
+    def _get_queries(self, assessment: str) -> DashboardMetadata:
+        """Get UCX queries to export"""
+        queries_path = self._base_path / assessment if assessment else self._base_path
+        return DashboardMetadata.from_path(queries_path).replace_database(
+            database=self._config.inventory_database, database_to_replace="inventory"
+        )
+
+    def cli_export_csv_results(self, prompts: Prompts) -> Path:
+        """Main method to export results to CSV files inside a ZIP archive."""
+        results_directory = self._get_output_directory(prompts)
+
         query_choice = prompts.choice(
             "Choose which assessment results to export",
-            [subdir.name for subdir in queries_path_root.iterdir() if subdir.is_dir()],
+            [subdir.name for subdir in self._base_path.iterdir() if subdir.is_dir()],
         )
 
-        export_path = results_directory / f"export_{query_choice}_results.zip"
-        queries_path = queries_path_root / query_choice
+        results_path = self._get_queries(query_choice).export_to_zipped_csv(
+            self._sql_backend, results_directory / f"export_{query_choice}_results.zip"
+        )
 
-        assessment_results = DashboardMetadata.from_path(queries_path).replace_database(
-            database=self._config.inventory_database, database_to_replace="inventory"
+        return results_path
+
+    def cli_export_xlsx_results(self, prompts: Prompts) -> Path:
+        """Submit Excel export notebook in a job"""
+
+        notebook_path = f"{self._install_folder}/EXPORT_ASSESSMENT_TO_EXCEL"
+        export_file_name = Path(f"{self._install_folder}/ucx_assessment_main.xlsx")
+        results_directory = Path(self._get_output_directory(prompts)) / export_file_name.name
+
+        run = self._ws.jobs.submit_and_wait(
+            run_name="export-assessment-to-excel-experimental",
+            tasks=[
+                jobs.SubmitTask(
+                    notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
+                    task_key="export-assessment",
+                    new_cluster=compute.ClusterSpec(
+                        data_security_mode=compute.DataSecurityMode.LEGACY_SINGLE_USER_STANDARD,
+                        spark_conf={
+                            "spark.databricks.cluster.profile": "singleNode",
+                            "spark.master": "local[*]",
+                        },
+                        custom_tags={"ResourceClass": "SingleNode"},
+                        num_workers=0,
+                        policy_id=self._config.policy_id,
+                        apply_policy_default_values=True,
+                    ),
+                )
+            ],
         )
 
-        logger.info("Exporting assessment results....")
-        results_path = assessment_results.export_to_zipped_csv(self._sql_backend, export_path)
-        logger.info(f"Results exported to {results_path}")
+        if run.state and run.state.result_state == RunResultState.SUCCESS:
+            binary_resp = self._ws.workspace.download(path=export_file_name.as_posix(), format=ExportFormat.SOURCE)
+            results_directory.write_bytes(binary_resp.read())
 
-        return results_path
+        return results_directory
+
+    def web_export_results(self, writer: Any) -> str:
+        """Alternative method to export results from the UI."""
+        export_file_name = Path(f"{self._install_folder}/ucx_assessment_main.xlsx")
+        assessment_main = self._get_queries("main")
+        self._export_to_excel(assessment_main, self._sql_backend, export_file_name, writer)
+        return self._render_export(export_file_name)
diff --git a/src/databricks/labs/ucx/assessment/export_html_template.py b/src/databricks/labs/ucx/assessment/export_html_template.py
@@ -0,0 +1,48 @@
+EXPORT_HTML_TEMPLATE = """
+<style>
+    @font-face {{
+        font-family: 'DM Sans';
+        src: url(https://cdn.bfldr.com/9AYANS2F/at/p9qfs3vgsvnp5c7txz583vgs/dm-sans-regular.ttf?auto=webp&format=ttf) format('truetype');
+    }}
+    body {{ font-family: 'DM Sans', Arial, sans-serif; }}
+    .export-container {{ text-align: center; margin-top: 20px; }}
+    .export-container h2 {{ color: #1B3139; font-size: 24px; margin-bottom: 20px; }}
+    .export-container button {{
+        display: inline-block; padding: 12px 25px; background-color: #1B3139;
+        color: #fff; border: none; border-radius: 4px; font-size: 18px;
+        font-weight: 500; cursor: pointer; transition: background-color 0.3s, transform 0.3s;
+    }}
+    .export-container button:hover {{ background-color: #FF3621; transform: translateY(-2px); }}
+</style>
+
+<div class="export-container">
+    <h2>Export Results</h2>
+    <button onclick="downloadExcel()">Download Results</button>
+</div>
+
+<script>
+    function downloadExcel() {{
+        const b64Data = '{b64_data}';
+        const filename = '{export_file_path_name}';
+
+        // Convert base64 to blob
+        const byteCharacters = atob(b64Data);
+        const byteNumbers = new Array(byteCharacters.length);
+        for (let i = 0; i < byteCharacters.length; i++) {{
+            byteNumbers[i] = byteCharacters.charCodeAt(i);
+        }}
+        const byteArray = new Uint8Array(byteNumbers);
+        const blob = new Blob([byteArray], {{
+            type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
+        }});
+
+        // Create download link and click it
+        const url = URL.createObjectURL(blob);
+        const a = document.createElement('a');
+        a.href = url;
+        a.download = filename;
+        a.click();
+        URL.revokeObjectURL(url);
+    }}
+</script>
+"""
diff --git a/src/databricks/labs/ucx/cli.py b/src/databricks/labs/ucx/cli.py
@@ -914,11 +914,18 @@ def migrate_local_code(
 
 
 @ucx.command
-def export_assessment(w: WorkspaceClient, prompts: Prompts):
+def export_assessment(w: WorkspaceClient, prompts: Prompts, export_format: str = "csv"):
     """Export the UCX assessment queries to a zip file."""
+    results_path = Path()
     ctx = WorkspaceContext(w)
     exporter = ctx.assessment_exporter
-    exporter.export_results(prompts)
+
+    export_method = (
+        exporter.cli_export_xlsx_results if export_format.lower() == "excel" else exporter.cli_export_csv_results
+    )
+
+    results_path = export_method(prompts)
+    logger.info(f"Results exported to {results_path}")
 
 
 @ucx.command
diff --git a/src/databricks/labs/ucx/contexts/application.py b/src/databricks/labs/ucx/contexts/application.py
@@ -362,7 +362,7 @@ def tables_migrator(self) -> TablesMigrator:
 
     @cached_property
     def assessment_exporter(self):
-        return AssessmentExporter(self.sql_backend, self.config)
+        return AssessmentExporter(self.workspace_client, self.sql_backend, self.config)
 
     @cached_property
     def acl_migrator(self):
diff --git a/src/databricks/labs/ucx/installer/workflows.py b/src/databricks/labs/ucx/installer/workflows.py
diff --git a/tests/integration/assessment/test_export.py b/tests/integration/assessment/test_export.py
diff --git a/tests/unit/assessment/test_export.py b/tests/unit/assessment/test_export.py