Foremost, thank you for the great project!

During our use of LHP we had to develop some shim scripts to facilitate our LHP -> Databricks Bundle -> Deployment workflow. The project has been a great tool for maintaining version control of our pipelines and is otherwise a perfect fit for our deployment. Below are the real-world issues we've encountered: the specific problems found in the generated scripts and the recommended fixes for the upstream LakehousePlumber project.
1. Python String Interpolation (Syntax Error)
- The Issue: LHP generates Python code with bare placeholder strings like `database: "{catalog}.{schema}"` inside decorators. This is not valid Python variable interpolation.
- The Shim Fix: `fix_generated_pipelines.py` uses regex to wrap these strings in `f"..."`.
- Upstream Fix:
  - Update Template: The LHP Jinja2 templates for Python generation must produce f-strings (`f"{...}"`) instead of bare strings.
  - Variable Scope: Ensure the generator explicitly outputs the configuration block (e.g., `catalog = spark.conf.get(...)`) at the top of the file so these f-strings have variables to reference.
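As a sketch of the kind of rewrite our shim performs (the function name and regex here are illustrative, not the actual `fix_generated_pipelines.py` code):

```python
import re

def wrap_placeholders_in_fstrings(source: str) -> str:
    """Wrap bare strings containing {placeholder} patterns in an f-prefix,
    e.g. name="{catalog}.{schema}.orders" -> name=f"{catalog}.{schema}.orders".
    Illustrative sketch only; the real shim handles more edge cases."""
    # Match a double-quoted string containing at least one {...} placeholder,
    # skipping strings already prefixed with f/F (keeps the rewrite idempotent).
    pattern = re.compile(r'(?<![fF])("(?:[^"{}\\]*\{[^{}]+\}[^"\\]*)+")')
    return pattern.sub(r'f\1', source)

print(wrap_placeholders_in_fstrings('name="{catalog}.{schema}.orders"'))
```

The upstream fix is simpler still: emit the `f` prefix directly from the Jinja2 template so no post-processing is needed.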
2. Decorator Ordering (Runtime Error)
- The Issue: LHP generates code where expectation decorators (e.g., `@dlt.expect`) appear before the table decorator (e.g., `@dlt.table`). DLT requires the table/view decorator to be the outermost (top) decorator to register the dataset correctly.
- The Shim Fix: Regex logic swaps the order of these decorators.
- Upstream Fix:
  - Generator Logic: Adjust the generation loop to ensure the `@dlt.table`/`@dlt.view` decorator is always yielded first (at the top), followed by any expectation or access control decorators.
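An illustrative sketch of the reordering (not the shim's actual regex): within each decorator stack, hoist `@dlt.table`/`@dlt.view` to the top so DLT registers the dataset before applying expectations. This assumes one decorator per line; a real implementation also has to cope with multi-line decorator arguments.

```python
def reorder_decorators(source: str) -> str:
    """Move @dlt.table / @dlt.view to the top of each decorator stack."""
    def is_table(line: str) -> bool:
        return line.lstrip().startswith(("@dlt.table", "@dlt.view"))

    out, stack = [], []
    for line in source.splitlines():
        if line.lstrip().startswith("@"):
            stack.append(line)          # buffer the decorator stack
        else:
            if stack:                   # flush: table/view first, rest after
                out += [l for l in stack if is_table(l)]
                out += [l for l in stack if not is_table(l)]
                stack = []
            out.append(line)
    return "\n".join(out + stack)
```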
3. Import & API Standardization
- The Issue: LHP generates `from pyspark import pipelines as dp` and uses `@dp.temporary_view`.
  - The `pipelines` module is often deprecated or not available in standard DLT runtimes (which prefer `import dlt`).
  - `temporary_view` is not a standard DLT decorator (it's `@dlt.view`, or just a function without a decorator for purely temporary scope).
- The Shim Fix: Replaces imports with `import dlt as dp` and maps `temporary_view` to `view`.
- Upstream Fix:
  - Modernize API: Update the generator to use the standard `import dlt` library.
  - Deprecate Custom Types: Remove LHP-specific types like `temporary_view` in favor of standard DLT view definitions.
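A minimal sketch of the shim's rewrite (illustrative, not the actual shim code). Keeping the `dp` alias via `import dlt as dp` means all existing `dp.*` call sites keep working without further edits:

```python
def modernize_dlt_api(source: str) -> str:
    """Swap the deprecated pipelines import for standard dlt (keeping the
    dp alias) and map the non-standard temporary_view decorator to view."""
    source = source.replace("from pyspark import pipelines as dp",
                            "import dlt as dp")
    return source.replace("@dp.temporary_view", "@dp.view")
```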
4. Data Quality Visibility (temporary=True)
- The Issue: LHP defaults validation tables to `temporary=True`. In DLT, temporary tables do not record expectation metrics to the event log, making data quality invisible in the DLT UI.
- The Shim Fix: Regex removes `temporary=True` from generated table definitions.
- Upstream Fix:
  - Configurable Default: Change the default for validation actions to `temporary=False`, or expose this as a configurable YAML property (`visible: true/false`).
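An illustrative sketch of the stripping step (the real shim's regex may differ). It handles the flag with a leading comma, a trailing comma, or on its own:

```python
import re

def drop_temporary_flag(source: str) -> str:
    """Remove temporary=True from generated @dlt.table(...) calls so
    expectation metrics are recorded in the event log."""
    return re.sub(r",\s*temporary=True|temporary=True\s*,?\s*", "", source)
```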
5. Spark Configuration Injection
- The Issue: The pipelines fail with `AnalysisException` without `pipelines.incompatibleViewCheck.enabled = "false"`. LHP provides no native way to inject top-level `spark.conf.set` calls into the generated Python file.
- The Shim Fix: Manually injects this line at the top of every file.
- Upstream Fix:
  - Schema Extension: Add a `spark_configuration:` section to the LHP YAML schema that generates corresponding `spark.conf.set()` calls in the Python output.
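A sketch of the injection our shim performs (the conf key is the one from this issue; the insertion heuristic, after the last top-level import, is our own assumption):

```python
CONF_LINE = 'spark.conf.set("pipelines.incompatibleViewCheck.enabled", "false")'

def inject_spark_conf(source: str) -> str:
    """Insert the spark.conf.set call after the last top-level import so it
    runs before any dataset function is defined."""
    lines = source.splitlines()
    last_import = max(
        (i for i, line in enumerate(lines)
         if line.startswith(("import ", "from "))),
        default=-1,
    )
    lines.insert(last_import + 1, CONF_LINE)
    return "\n".join(lines)
```

Upstream, the same result could come from a `spark_configuration:` mapping in the YAML whose entries the generator renders as `spark.conf.set()` calls.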
6. DLT Dependency Resolution
- The Issue: When using `spark.sql(...)` inside a DLT function, the lineage graph often fails to detect dependencies.
- The Shim Fix: Scans the SQL string for known table names and injects dummy `dlt.read("...")` calls to "hint" the dependency to DLT.
- Upstream Fix:
  - Explicit Reads: Instead of generating raw `spark.sql(...)`, LHP should generate `dlt.read("upstream_table")` calls or explicit `spark.readStream.table(...)` references, which DLT can track natively.
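An illustrative sketch of the lineage "hinting" (the `KNOWN_TABLES` set and output shape are assumptions, not the shim's actual code):

```python
import re

# Hypothetical registry of upstream tables the shim knows about.
KNOWN_TABLES = {"bronze_orders", "bronze_customers", "bronze_products"}

def dependency_hints(sql: str) -> list:
    """Scan a raw SQL string for known table names and emit dummy
    dlt.read(...) lines so DLT records the dependency edge."""
    referenced = [t for t in sorted(KNOWN_TABLES)
                  if re.search(rf"\b{re.escape(t)}\b", sql)]
    return [f'dlt.read("{t}")  # lineage hint only; result is discarded'
            for t in referenced]
```

Generating `dlt.read(...)` directly upstream would make these heuristic hints unnecessary.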