Skip to content

[BUG] - Azure deployment fails after upgrade on 2025.2.1rc2Β #2964

@viniciusdc

Description

@viniciusdc

Describe the bug

Due to recent changes in the Azure provider version (#2812), a few inner attributes from the cluster networking configuration were deprecated and removed, while some were entirely replaced.

This does not affect any nebari deployment per sisince the actual apply command ccalls the tofu init method under the hood. However, before deploying to avoid misuse of specific attributes in our config, we ran check_immutable_file.d

def check_immutable_fields(self):
nebari_config_state = self.get_nebari_config_state()
if not nebari_config_state:
return

def get_nebari_config_state(self) -> dict:
directory = str(self.output_directory / self.stage_prefix)
tf_state = opentofu.show(directory)
nebari_config_state = None

which depends on tofu show --json to load the state data info that is used later in the checks. The main problem comes when there are provider version schema changes, which is the case for this release. In this situation, the open tofu docs suggests its users run tofu refresh to update the provider versions beforehand, as seen below:

If you've updated providers that contain new schema versions since the state was written, the state needs to be upgraded before it can be displayed with show -JSON. If you are viewing a plan, it must be created without -refresh=false. If you are viewing a state file, run tofu refresh first.

Based on a quick look at the code, we have two options:

  • Include a parsing/override logic into the upgrade command to fix this for this release by manually updating the affected fields in the state files ourselves;
  • Update the inner logic around the check_immutable_fields to properly refresh its state before attempting to run tofu show;

To me, addressing the root cause of the issue will not be the best in this case, but it would also address the issue in eventual provider updates without requiring us to maintain these patches in the upgrade command. However, a caveat is that since tofu show didn't depend on any input variable for it to run, that was not implemented considering this, while tofu refresh requires the values of those inputs to be passed down or else the results in an error with missing vars.

Expected behavior

Correct run of nebari's deployment

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

deploy an Azure deployment using the latest release (2024.12.1) and then, after running nebari upgrade with the latest RC, run nebari deploy

Command output

[tofu]: 
[tofu]: Initializing the backend...
[tofu]: Upgrading modules...
[tofu]: - terraform-state in modules/terraform-state
[tofu]: 
[tofu]: Initializing provider plugins...
[tofu]: - terraform.io/builtin/terraform is built in to OpenTofu
[tofu]: - Finding hashicorp/azurerm versions matching "4.7.0"...
[tofu]: - Installing hashicorp/azurerm v4.7.0...
[tofu]: - Installed hashicorp/azurerm v4.7.0 (signed, key ID 0C0AF313E5FD9F80)
[tofu]: 
[tofu]: Providers are signed by their developers.
[tofu]: If you'd like to know more about provider signing, you can read about it here:
[tofu]: https://opentofu.org/docs/cli/plugins/signing/
[tofu]: 
[tofu]: OpenTofu has made some changes to the provider dependency selections recorded
[tofu]: in the .terraform.lock.hcl file. Review those changes and commit them to your
[tofu]: version control system if they represent changes you intended to make.
[tofu]: 
[tofu]: OpenTofu has been successfully initialized!
[tofu]: 
[tofu]: You may now begin working with OpenTofu. Try running "tofu plan" to see
[tofu]: any changes that are required for your infrastructure. All OpenTofu commands
[tofu]: should now work.
[tofu]: 
[tofu]: If you ever set or change modules or backend configuration for OpenTofu,
[tofu]: rerun this command to reinitialize your working directory. If you forget, other
[tofu]: commands will detect it and remind you to do so if necessary.
[tofu]: Failed to marshal state to json: unsupported attribute "enable_https_traffic_only"

Versions and dependencies used.

No response

Compute environment

Azure

Integrations

No response

Anything else?

No response

Metadata

Metadata

Assignees

Type

No type

Projects

Status

Done πŸ’ͺ🏾

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions