Observability azure and gcp #142

Open: jshiwamV wants to merge 4 commits into main from observability-azure

@jshiwamV commented Jan 16, 2026

@jshiwamV requested a review from bobbyiliev January 16, 2026 16:38
@jshiwamV marked this pull request as ready for review January 16, 2026 16:38
@jshiwamV force-pushed the observability-azure branch 3 times, most recently from a39221a to 796e7d0, January 22, 2026 12:57

@bobbyiliev (Collaborator) left a comment:

Looks good, though I think we need to rebase. I've also added a few questions.

Comment on lines 77 to 92
# Observability outputs (only when enabled)
output "prometheus_url" {
  description = "Internal URL for Prometheus server"
  value       = var.enable_observability ? module.prometheus[0].prometheus_url : null
}

output "grafana_url" {
  description = "Internal URL for Grafana"
  value       = var.enable_observability ? module.grafana[0].grafana_url : null
}

output "grafana_admin_password" {
  description = "`admin` password for Grafana"
  value       = var.enable_observability ? module.grafana[0].admin_password : null
  sensitive   = true
}

@bobbyiliev (Collaborator):

Why are the outputs defined in variables.tf instead of outputs.tf? Should these be moved to a separate outputs.tf file or added to the existing one?

@jshiwamV (Collaborator, Author):

That was my bad. Fixed it.

}


module "prometheus" {

@bobbyiliev (Collaborator):

In the AWS example, the operator module gets helm_values configured to enable Prometheus scrape annotations when observability is enabled. Azure and GCP seem to be missing this. Won't Prometheus fail to scrape metrics without these annotations?

@jshiwamV (Collaborator, Author):

Huh, I remember adding this to all the operators; I may have done a bad rebase. Thanks for pointing it out.

@jshiwamV (Collaborator, Author):

Done.

]
}

module "prometheus" {

@bobbyiliev (Collaborator):

Same as Azure - missing the operator helm_values for enabling scrape annotations. Do you think that we need those?

]

# https://learn.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
storage_class = "managed-csi"

@bobbyiliev (Collaborator):

Does this require enabling the CSI driver addon first or is it available out of the box?

@jshiwamV (Collaborator, Author):

Done.
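
For context on the question above: the `managed-csi` class is served by the Azure Disk CSI driver. If the cluster were declared directly with `azurerm_kubernetes_cluster`, the driver could be toggled in its `storage_profile` block, fed by a variable such as the `disk_driver_enabled` one mentioned later in this review. This is only a sketch; the description, type, and default below are assumptions, not the PR's actual definition, and the PR's AKS module may wire this differently.

```hcl
# Assumed shape of the variable; only its name appears in the review.
variable "disk_driver_enabled" {
  description = "Whether the Azure Disk CSI driver is enabled on the AKS cluster" # assumed wording
  type        = bool
  default     = true # assumption
}

# Inside an azurerm_kubernetes_cluster resource (other required arguments omitted):
#
#   storage_profile {
#     disk_driver_enabled = var.disk_driver_enabled
#   }
```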

}
}]
})
storage_class = "standard-rwo" # default storage class in gcp

@bobbyiliev (Collaborator):

Is "standard-rwo" the default? I have a vague memory that it was "standard". Also, does GKE require any specific configuration to use CSI-based storage classes?

@jshiwamV (Collaborator, Author):

Yes, it is enabled by default; I will make it explicit though. You can read more about standard-rwo here: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver#create_a_storageclass

@jshiwamV (Collaborator, Author):

Done.
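
One way to make the driver explicit, if the cluster were managed directly with the `google_container_cluster` resource rather than through a module, is the `gce_persistent_disk_csi_driver_config` addon block. This is a minimal sketch: the resource name, cluster name, and location are placeholders, and the PR's GKE module may expose this setting under a different input.

```hcl
resource "google_container_cluster" "example" {
  name               = "example-cluster" # placeholder
  location           = "us-central1"     # placeholder
  initial_node_count = 1

  addons_config {
    # Explicitly enable the Compute Engine persistent disk CSI driver,
    # which backs CSI storage classes such as "standard-rwo".
    gce_persistent_disk_csi_driver_config {
      enabled = true
    }
  }
}
```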


depends_on = [
  module.operator,
  module.aks,

@bobbyiliev (Collaborator):

Should this also depend on the node groups to ensure nodes are ready? AWS has module.base_node_group in its depends_on.

@jshiwamV (Collaborator, Author):

In Azure, the base node group is part of the AKS cluster config.

storage_class = local.storage_class
depends_on = [
  module.operator,
  module.gke,

@bobbyiliev (Collaborator):

Same question as Azure - should this depend on node pools being ready? The AWS example has module.nodepool_generic in depends_on but GCP only depends on the cluster.

@jshiwamV (Collaborator, Author):

Yes. Since it depends on the operator, it already depends on nodepool_generic indirectly, but it's good to keep it consistent across cloud providers. Thanks for pointing it out.
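
To illustrate the consistency point, an explicit dependency on the node pool might look roughly like the fragment below. It is only a sketch: `module.nodepool_generic` is the name used in this thread and is assumed to refer to the GCP node pool module, and the module's `source` and other arguments are omitted because the excerpt does not show them.

```hcl
module "prometheus" {
  # source and other arguments omitted; not shown in the review excerpt

  storage_class = local.storage_class

  depends_on = [
    module.operator,
    module.gke,
    module.nodepool_generic, # assumed name, taken from the thread above
  ]
}
```

Listing the node pool explicitly is redundant when the operator already depends on it, but it documents the ordering without requiring readers to trace the transitive dependency.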

Comment on lines 334 to 335


@bobbyiliev (Collaborator):

nit: there's an extra blank line before the prometheus module declaration here.

@jshiwamV force-pushed the observability-azure branch from 796e7d0 to 83e23c1, February 1, 2026 09:13

@bobbyiliev (Collaborator) left a comment:

Overall, this looks very good. A few questions and nits below, none of which should be blockers.

# Enable Prometheus scrape annotations when observability is enabled
helm_values = var.enable_observability ? {
  observability = {
    enabled : true

@bobbyiliev (Collaborator):

nit: Should this use = instead of : for consistency with the rest of the codebase?
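
For context on this nit: HCL object constructor expressions accept either `=` or `:` between a key and its value, so the two spellings produce the same value and the choice is purely stylistic. A minimal sketch, using hypothetical local and output names that are not part of this PR:

```hcl
# Hypothetical locals for illustration only; not taken from the PR.
locals {
  with_colon  = { enabled : true }
  with_equals = { enabled = true }
}

# Both constructors produce the same object value, so this compares equal.
output "separators_equivalent" {
  value = local.with_colon == local.with_equals
}
```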

Comment on lines +210 to 211
}
variable "dns_service_ip" {

@bobbyiliev (Collaborator):

nit: missing blank line between disk_driver_enabled and dns_service_ip variables?

Comment on lines +69 to +71


variable "enable_observability" {

@bobbyiliev (Collaborator):

nit: extra blank line here, worth keeping consistent?

]
}

provider "registry.terraform.io/hashicorp/http" {

@bobbyiliev (Collaborator):

The hashicorp/http provider was added to both Azure and GCP lock files, but I don't see it explicitly used. Is it a transitive dependency from the prometheus/grafana modules?
