Skip to content

Migrate to data_interfaces v1 library#27

Open
mvlassis wants to merge 20 commits intomainfrom
pra-166-migrate-data-interfaces-v1
Open

Migrate to data_interfaces v1 library#27
mvlassis wants to merge 20 commits intomainfrom
pra-166-migrate-data-interfaces-v1

Conversation

@mvlassis
Copy link
Contributor

@mvlassis mvlassis commented Feb 11, 2026

This PR updates the azure_service_principal.py library to use the new v1 version of data_interfaces. The code for the library can be found in https://github.com/canonical/data-platform-libs/blob/main/lib/charms/data_platform_libs/v1/data_interfaces.py

The public interface for the requirer charm should stay the same.

Notes

  • Remove the unneeded decode_secret_with_retry function in src/events/base.py that caused intermittent errors in integration tests due to the events taking too long to execute.
  • Update AzureServicePrincipalInfo dataclass to always return strings instead of None fields. This makes it easier for the azure_service_principal.py library to handle missing credentials.
  • Rely on the resouce_requested events that data_intefaces.py to create the initial resource request, and continuously update the response based on any changed data.
  • Emit the same ServicePrincipalInfoChangedEvent and ServicePrincipalInfoGoneEvent that we already use
  • Update the version of the Rust toolchain to fix .
  • Remove empty event handler overrides in azure_service_principal.py
  • Remove test_relation_application_data, since that part should now be handled by the charm library.
  • Add useful logs in integration tests.

@mvlassis mvlassis marked this pull request as draft February 11, 2026 10:15
@mvlassis mvlassis marked this pull request as ready for review February 13, 2026 14:04
Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. I see that there is an integration test which is always failing (I didn't see any green CI in the PR). Can we look into this?

Also: has the interface protocol changed? could we just verify that the new refactoring works also when the requirer charm is still on the old library? Have we checked this compatiblity?

Copy link
Member

@theoctober19th theoctober19th left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial round of review.

I have left some comments and asked for a little more clarity at some places.

I do agree with the point Enrico has raised. Can we maybe check if the consumers that are using this lib will be able to continue to do so without changing anything in their part? (maybe this is not possible -- but the confirmation of that would help because we can then be better informed that we are not supporting existing consumers).

from utils.secrets import decode_secret_key


@retry(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: This logic was added here on purpose because of the race conditions the charm would face when deployed via Juju Terraform Provider (charm deployment vs secret grant event). Unless we have made sure that this problem is not relevant anymore and has been already fixed elsewhere, this block cannot be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't encountered that, do we have an issue for that race condition bug to at least link it in the code? How can we reproduce the issue?

The reason I removed this part was that it caused issues during integration testing, since multiple operations on secrets (as we do in

# Assert that configuring a secret that doesn't exist produces an error message
# Create a secret and immediately remove it
secret_uri = juju.add_secret("fake-secret", {"test-key": "test-value"})
juju.remove_secret(secret_uri)
juju.config(APP_NAME, {"credentials": secret_uri})
juju.wait(jubilant.all_agents_idle, delay=5.0)
status = juju.wait(
lambda status: jubilant.all_blocked(status, APP_NAME),
)
assert status.apps[APP_NAME].app_status.message == f"The secret '{secret_uri}' does not exist."
# Add a secret but don't grant permission, so status should stay blocked
secret_uri = juju.add_secret(SECRET_IDENTIFIER, {"client-id": CLIENT_ID_TEST_VALUE})
juju.wait(jubilant.all_agents_idle)
juju.config(APP_NAME, {"credentials": secret_uri})
juju.wait(jubilant.all_agents_idle)
status = juju.wait(lambda status: jubilant.all_blocked(status, APP_NAME))
assert (
status.apps[APP_NAME].app_status.message
== f"Permission for secret '{secret_uri}' has not been granted."
)
juju.remove_secret(secret_uri)
# Add a secret but don't provide all values for the secret, so status should stay blocked
secret_uri = juju.add_secret(SECRET_IDENTIFIER, {"client-id": CLIENT_ID_TEST_VALUE})
juju.grant_secret(secret_uri, APP_NAME)
juju.config(APP_NAME, {"credentials": secret_uri})
juju.wait(jubilant.all_agents_idle, delay=10.0)
status = juju.wait(lambda status: jubilant.all_blocked(status, APP_NAME))
assert (
status.apps[APP_NAME].app_status.message
== f"The key 'client-secret' was not found in secret '{secret_uri}'."
)
) would cause issues as the decode_secret would make the event handlers too long to finish. We can potentially go around this by increasing the delay interval to much higher values, but I would prefer to do that only if we know that this is needed

"""Data abstraction of the requirer side of Azure service principal relation."""

SECRET_FIELDS = ["client-id", "client-secret"]
subscription_id: str = Field(default="")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts: I don't think these attributes belong to RequirerModel. Is there a case where the requirer would request for client_id, client_secret, tenant_id, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Updated in e211b9c


return {
key: getattr(request, key.replace("-", "_"), None)
for key in AZURE_SERVICE_PRINCIPAL_REQUIRED_INFO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts: I think this utility function should return all fields from the databag, and not just the ones in AZURE_SERVICE_PRINCIPAL_REQUIRED_INFO because there might be optional fields shared by the provider as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here comes from the fact that the databag has values that the consumer is not actually interested in. Returning all values would result in something:

endpoints: None
read-only-endpoints: None
request-id: cefee465f2ba9da4
resource: azure-service-principal
salt: IHsZ34tL5WnnEJoK
secret-entity: None
secret-extra: secret://3d4a06e3-44da-4a74-8488-efbbe7f638f2/d6arecvmp25c7a1f61bg
secret-tls: None
secret-user: None
subscription-id: "6"
tenant-id: "2"
version: None

Of course, the consumer here would not be interested in the request-id, the None values (that exist in the data-interfaces parent class but are not used here), and the salt. We could:

  1. Get all values from the response, rrogrammatically remove those unwanted values, so as to keep any extra secrets
  2. Keep it as it is, and add any optional values in the library as needed (perhaps as a AZURE_SERVICE_PRINCIPAL_OPTIONAL_INFO list)


def _on_relation_changed_event(self, event: RelationChangedEvent) -> None:
"""Notify the charm about the presence of Azure service principal credentials."""
super()._on_relation_changed_event(event)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What is the behavior of super()._on_relation_changed here? If we override this method with the section of the code below, do we still need to make a call to super()'s implementation of this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would be overriding this method here, since AzureServicePrincipalRequirer inherits from ResourceRequirerEventHandler. This method subsequently calls _handle_event), which the part of the code that creates the initial Resource Request, as part of the data contract between the provider and the requirer.

We subsequently use this resource request in the provider, both in set_response and update_response to continuously update the response to the request. Overriding this function would skip the resource request process, and skip the main usage of data_interfaces v1 in our charm.


self.framework.observe(
self.charm.on[self.relation_name].relation_joined, self._on_relation_joined_event
def set_initial_response(self, event: ResourceRequestedEvent, data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why is this method necessary here, and how is it different from update_response down below? Is it possible perhaps to use the same logic / function for all use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we don't need to use set_reponse() from data_interfaces.py to properly populate the data bag, so I reformatted to keep only one function in e15a358

azure_credentials = get_application_data(juju, TEST_APP_NAME, RELATION_NAME)
app_data = get_application_data(juju, APP_NAME, RELATION_NAME)["data"]
azure_credentials = get_credentials(app_data)
logger.debug("**********AZURE CREDENTIALS**********")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: I guess we need to remove these logs before we merge, since this may expose the keys we use in CI as CI logs are public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The credentials used here are dummy ones though and defined inside the test themselves, so it isn't truly a security leak. They should always be the values defined here.

I found this logging statement useful when debugging, but I guess it isn't really useful when testing since we already assert the expected values, so I removed the statement in 5a6d501

assert state_out.unit_status == ActiveStatus()


def test_relation_application_data(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why is this test removed? I believe this test still makes much sense, because we are checking whether the provider shared the necessary data or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readded the test in 3fb170c, slightly modified since now the secret id is transmitted to the application data bag

Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvlassis Have you addressed the comment in my review above?

Also: has the interface protocol changed? could we just verify that the new refactoring works also when the requirer charm is still on the old library? Have we checked this compatiblity?

I believe this was also what @theoctober19th mentioned in his review

provider: microk8s
channel: 1.32-strict/stable
juju-channel: 3.6/stable
- name: Setup environment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise Good that we move to concierge and Canonical K8s!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments