Skip to content

Conversation

Copy link

Copilot AI commented Oct 16, 2025

Azure OpenAI Soft-Delete Purge Fix

This PR addresses issue #309 by implementing a solution to purge soft-deleted Azure OpenAI resources after azd down in CI/CD workflows.

✅ Completed Plan:

  • Create feature branch ianjensenisme/309-AzureOpenAI-Purge
  • Add Terraform outputs for OpenAI resource name
  • Add purge step to azure-dev.yml workflow after azd down
  • Add purge step to azure-dev-down.yml workflow after azd down
  • Validate Terraform configuration
  • Create comprehensive decision log documentation
  • Add troubleshooting documentation
  • Address PR feedback

🎯 Problem Solved

Azure OpenAI resources remain in a "soft-deleted" state for 48 hours after deletion, preventing name reuse during CI/CD testing. This caused subsequent PR deployments to fail when trying to create resources with the same names.

🔧 Solution Approach

Since Azure Developer CLI does not support postdown/predestroy hooks, the solution adds Azure CLI commands directly to GitHub Actions workflows to purge soft-deleted Cognitive Services (OpenAI) resources after running azd down.

📝 Changes Made

1. Terraform Output (infra/outputs.tf)

output "openai_resource_name" {
  description = "The name of the Azure OpenAI resource"
  value       = module.azure_open_ai.resource.name
}

2. GitHub Actions Workflows

Added "Purge Soft-Deleted Azure OpenAI Resources" step in:

  • .github/workflows/azure-dev.yml (after "Destroy Infrastructure")
  • .github/workflows/azure-dev-down.yml (after "Azd down")

The purge step:

  • Retrieves resource info from azd env get-values output
  • Executes: az cognitiveservices account purge --location <region> --resource-group <rg> --name <name>
  • Gracefully handles missing resources or non-soft-deleted state
  • Only runs during infrastructure destruction (PR builds or manual trigger)

3. Decision Log (decision-log/006-azure-openai-soft-delete-purge.md)

Comprehensive documentation covering:

  • Context and problem statement
  • Implementation details
  • Rationale and alternatives considered
  • Consequences and trade-offs

4. Troubleshooting Guide (docs/troubleshooting.md)

Added troubleshooting section explaining:

  • Symptoms of the soft-delete issue (ResourceNameAlreadyExists error)
  • Root cause (48-hour soft-delete retention period)
  • Solution for CI/CD (automated purge) and local development (manual purge)
  • Link to README for manual purge instructions

✅ Validation & Testing

  • ✅ Terraform configuration validated successfully
  • ✅ YAML workflow syntax validated
  • ✅ Purge logic tested with edge cases (missing variables, etc.)
  • ✅ Changes follow minimal modification principle
  • ✅ All tests passed

📊 Files Modified

  1. infra/outputs.tf (+5 lines)
  2. .github/workflows/azure-dev.yml (+24 lines)
  3. .github/workflows/azure-dev-down.yml (+22 lines)
  4. decision-log/006-azure-openai-soft-delete-purge.md (+105 lines)
  5. docs/troubleshooting.md (+15 lines)

Total: 5 files, 171 lines added

🎉 Impact

  • ✅ CI/CD deployments can now run repeatedly without manual intervention
  • ✅ No accumulation of soft-deleted resources in subscription
  • ✅ Clear audit trail of purge operations in workflow logs
  • ✅ Automated cleanup aligns with automated provisioning philosophy
  • ✅ Solves PR deployment testing workflow failures
  • ✅ Comprehensive troubleshooting documentation helps users resolve soft-delete issues

Fixes #309

Original prompt

This section details on the original issue you should resolve

<issue_title>[Bug] Azure OpenAI Resources Soft-Deleted</issue_title>
<issue_description>### Is there an existing issue?

  • I have searched the existing issues

Description

Currently, by default, Azure OpenAI resources remain in a "soft-deleted" state after azd down. These resources are not purged from the "recently deleted" list, and the underlying AZD module that manages them currently does not support specifying their hard deletion during a terraform destroy event.

This causes issues when PRs are deployment-tested, as the resource names do (should) not change between deployments, so the second+ deployment will fail due to soft-deleted AOAI resources remaining in the purge list.

Steps to Reproduce

1. Deploy the default architecture using the CI-deploy workflow, which should automatically destroy the architecture at the end of the checks.
2. Redeploy the default architecture using the same mechanism.
4. Observe errors related to AOAI resources by the same name already existing in the "soft delete" purge queue.

Expected Behavior

CI deployment tests should work regardless of how many iterations have been executed. Resource desctruction should clear any "soft deletes", specifically the soft delete of the AOAI resource.

Anything else?

Resolving this issue will likely require following the resource destruction steps in the GH workflow with an az resource delete command against the offending AOAI resource. Being able to do this accurately may also require adding a terraform output identifying the soft-deleted resource so the name can be used in the resource delete.</issue_description>

<agent_instructions>Name the feature branch ianjensenisme/309-AzureOpenAI-Purge</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #309


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@ianjensenisme
Copy link
Contributor

@copilot the new terraform output is currently described with "The name of the Azure OpenAI resource (for purging soft-deleted resources)", update to just say "The name of the Azure OpenAI resource".

Copilot AI changed the title [WIP] Fix handling of soft-deleted Azure OpenAI resources [Bug Fix] Purge soft-deleted Azure OpenAI resources after azd down in CI/CD Oct 16, 2025
Copilot AI requested a review from ianjensenisme October 16, 2025 04:21
Copilot finished work on behalf of ianjensenisme October 16, 2025 04:21
Copy link
Author

Copilot AI commented Oct 16, 2025

@copilot the new terraform output is currently described with "The name of the Azure OpenAI resource (for purging soft-deleted resources)", update to just say "The name of the Azure OpenAI resource".

Updated the output description to "The name of the Azure OpenAI resource" in commit f862ef9.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • checkpoint-api.hashicorp.com
    • Triggering command: terraform init -backend=false (dns block)
    • Triggering command: terraform validate (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot finished work on behalf of ianjensenisme October 16, 2025 04:24
@ianjensenisme
Copy link
Contributor

@copilot add an explainer to troubleshooting.md that explains the symptoms of the purge issue and links to the relevant note in the README on how to manually purge.

Copy link
Author

Copilot AI commented Oct 16, 2025

@copilot add an explainer to troubleshooting.md that explains the symptoms of the purge issue and links to the relevant note in the README on how to manually purge.

Added troubleshooting section to docs/troubleshooting.md explaining the symptoms, cause, and solution with a link to the README's Clean Up section for manual purge instructions in commit 8dfa1ce.

Copilot finished work on behalf of ianjensenisme October 16, 2025 05:03
@ianjensenisme ianjensenisme marked this pull request as ready for review October 16, 2025 05:14
@ianjensenisme ianjensenisme requested a review from a team as a code owner October 16, 2025 05:14
Copilot AI review requested due to automatic review settings October 16, 2025 05:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses Azure OpenAI resource name conflicts in CI/CD by implementing automated purging of soft-deleted resources after azd down operations. The solution ensures that repeated deployments with consistent resource naming work reliably without manual intervention.

Key changes implemented:

  • Added Terraform output for OpenAI resource name to enable purge commands
  • Integrated automated purge steps in GitHub Actions workflows after infrastructure destruction
  • Created comprehensive documentation including decision log and troubleshooting guidance

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
infra/outputs.tf Added openai_resource_name output to expose resource name for purge operations
.github/workflows/azure-dev.yml Added purge step after "Destroy Infrastructure" with proper error handling
.github/workflows/azure-dev-down.yml Added purge step after "Azd down" with graceful failure handling
decision-log/006-azure-openai-soft-delete-purge.md Comprehensive ADR documenting context, implementation, and rationale
docs/troubleshooting.md Added troubleshooting section for soft-delete resource name conflicts
README.md Added note about Azure OpenAI soft-delete behavior and manual purge instructions

@ianjensenisme ianjensenisme force-pushed the copilot/fix-soft-deleted-resources branch from 9244e68 to 6b44a6f Compare October 16, 2025 05:52
@ianjensenisme ianjensenisme force-pushed the copilot/fix-soft-deleted-resources branch from 6b44a6f to f714f43 Compare October 16, 2025 05:55
@mawasile mawasile merged commit 99aa222 into main Oct 16, 2025
12 checks passed
@mawasile mawasile deleted the copilot/fix-soft-deleted-resources branch October 16, 2025 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Azure OpenAI Resources Soft-Deleted

4 participants