Skip to content

Conversation

@gergelj
Copy link

@gergelj gergelj commented Oct 27, 2025

Pull Request

Description

Terraform code for a simple Azure Databricks deployment with VNet injection.
There is one directory with code for deploying a workspace to an existing VNet and one that provisions a new VNet.

Category

  • core-platform
  • data-engineering
  • data-governance
  • data-warehousing
  • genai-ml
  • launch-accelerator
  • workspace-setup

Type of Change

  • New project
  • Bug fix
  • Enhancement
  • Documentation

Project Details

Project Name: Azure VNet injection Workspace Setup Guide (Terraform)
Purpose: Customer enablement
Technologies Used: Terraform, Databricks, Azure

Testing

  • Code runs without errors
  • Documentation is complete
  • Used only synthetic data

Security Compliance ✅

  • No customer data, PII, or proprietary information
  • No credentials or access tokens
  • Only synthetic data used
  • Third-party licenses acknowledged

By submitting this PR, I confirm I have followed the CONTRIBUTING.md guidelines and security requirements.

@gergelj
Copy link
Author

gergelj commented Oct 27, 2025

Hi all,
The code runs without errors; however, the workspace access is not configured properly. I can look into this to fix it, and that's why I'm starting this thread to open a discussion.

Here's what's happening. When this code deploys the workspace, the Azure identity is assigned as the workspace admin. This identity is typically the user.name_databricks.com#ext#@dbdevfieldeng... principal, which is different than the [email protected] one that is used for the SSO. Therefore, I'm unable to access the workspace. Did I misconfigure the Azure CLI, or is this expected behavior? Would this be the same behavior when a customer tries to deploy the workspace this way?

I wanted to keep the code as simple as possible, without additional user provisioning/managing. That's why I haven't yet explored possible solutions for this. Let me know what your thoughts are on this.

(cc @haleyyyblue)

@gergelj gergelj requested a review from haleyyyblue October 29, 2025 10:21
Copy link
Collaborator

@haleyyyblue haleyyyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gergelj ! Thanks for putting this together. The code is clean and well-organized. Here's my feedback:
🎉 What's working well
Great structure: The flat, purpose-specific file organization (azure.tf, databricks.tf, network.tf) aligns perfectly with our philosophy of keeping things simple and self-contained
Clear documentation: The README provides good step-by-step guidance for users
Consistent approach: Both scenarios follow the same patterns, making it easy to understand
🔧 Key suggestions

  1. NAT Gateway (Critical)
    Azure will remove default outbound connectivity next year. We should explicitly add NAT Gateway resources now to future-proof this. Both public and private subnets should be configured with no_public_ip = true and explicit NAT Gateway associations.
  2. Consider merging the two projects (High priority)
    Looking at our aws-byovpc example, it handles both "new VPC" and "existing VPC" scenarios in a single project using conditional logic (vpc_id == "" ? create : use_existing). This approach:
    Reduces duplication
    Gives users one place to start
    Makes maintenance easier
    Would you be open to combining azure-vnet-injection-existing-vnet and azure-vnet-injection-new-vnet into a single azure-vnet-injection project with a conditional variable?
  3. Remove CMK commented code
    The commented-out CMK variables and configuration should be removed entirely from this scenario. CMK would be better as a separate scenario (e.g., azure-vnet-injection-cmk) following our scenario-based philosophy. This keeps each example focused and easier to understand.
  4. Add versions.tf file
    The AWS example includes explicit provider version constraints for reproducibility. Let's add a versions.tf to the Azure projects as well.
  5. Additional improvements for consistency
    To match the completeness of the AWS example:
    More outputs: Add network details (subnet IDs, NSG ID, NAT Gateway ID) and resource group info
    Variable validation: Add storage account name format, and CIDR ranges
    Consider Unity Catalog: The AWS example includes metastore setup - might be worth adding here too
    📝 Action items
    Must have (blocking):
    [ ] Add NAT Gateway resources with explicit outbound configuration
    [ ] Add versions.tf with provider version constraints
    [ ] Remove all CMK-related commented code (variables, configuration)
    Should have (strongly recommended):
    [ ] Merge the two projects into one with conditional VNet creation
    [ ] Expand outputs to include network and resource details
    [ ] Add variable validation
    Nice to have:
    [ ] Unity Catalog metastore support
    [ ] Enhanced documentation with troubleshooting section
    Let me know if you'd like to discuss any of these suggestions! Happy to pair on the NAT Gateway implementation or the project consolidation if that would help. 😊

@gergelj
Copy link
Author

gergelj commented Nov 4, 2025

Completed the following action items so far:

Must have (blocking):

  • Add NAT Gateway resources with explicit outbound configuration
  • Add versions.tf with provider version constraints
  • Remove all CMK-related commented code (variables, configuration)

Should have (strongly recommended):

  • Merge the two projects into one with conditional VNet creation
  • Expand outputs to include network and resource details
  • Add variable validation

Nice to have:

  • Unity Catalog metastore support
  • Enhanced documentation with troubleshooting section

These changes are applied to the azure-vnet-injection-new-vnet folder. Deployed a workspace with a NAT gateway, and I tested a cluster, which seems to be working fine. Please let me know if I've missed anything in the NAT gateway configuration.

If the NAT gateway script is alright, I would like to consolidate the azure-vnet-injection-new-vnet and the azure-vnet-injection-existing-vnet directories into one.

(cc. @haleyyyblue)

P.S. I would like to know the correct Azure CLI authentication method. I still get the gergelj.kis_databricks.com#ext#@dbdevfieldeng.onmicrosoft.com user assigned as workspace admin upon deploying the workspace.

@haleyyyblue
Copy link
Collaborator

@gergelj
Great work — all the improvements you’ve made look excellent!
Really appreciate that you not only added the NAT Gateway configuration but also deployed and tested the cluster 👏

One small suggestion:
Instead of defining the NAT Gateway resources directly, how about trying to use the nat_gateway_name attribute in the azurerm_databricks_workspace resource?
Totally open to discussion — feel free to share your thoughts!

Regarding the Azure CLI authentication issue, it might be helpful to ask in the #field-infra channel, as they can probably guide you more quickly and accurately.

Also, consolidating the two directories sounds good to me once we finalize the NAT configuration.

Thanks again for the great updates — really appreciate the effort! 🚀
Let’s continue polishing this together 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants