Azure Managed Mongo Compatibility: Findings #1223

@OlliMartin


Hellou again.

As discussed in #1095, we wanted to share our findings running Mender against a managed Azure MongoDB (DocumentDB) instance.
I finally found some time to test the fix in DeviceAuth - and I'm happy to share:

It works!! (somewhat).

DocumentDB Setup

Setting            Value
Cluster tier       M10 tier, 1 vCores, 2 GiB RAM
Shard Count        1
High Availability  No
MongoDB Version    8.0

Problem

We were able to successfully* run Mender (with the new device_auth image 4.0.X) without modifications to any of the images.
However, as expected, we needed to adjust the two (sparse) indexes in device auth:

mongosh "mongodb+srv://$user@$mongoInstance.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"

# Enter the password when prompted; the remaining commands run inside mongosh

show dbs
use deviceauth

db.getCollectionNames()
db.auth_sets.getIndexes()

By default, Mender creates 3 indexes on auth_sets. Specifically, the last command returns:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
    name: 'tenant_id_id_data_sha256_pubkey',
    unique: true,
    storageEngine: { enableOrderedIndex: true }
  },
  {
    v: 2,
    key: { tenant_id: 1, device_id: 1 },
    name: 'tenant_id_device_id',
    storageEngine: { enableOrderedIndex: true }
  }
]
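
To make it easier to spot which indexes are affected, here is a minimal sketch that filters a getIndexes() result for a given field. The helper name indexesReferencing is hypothetical (it is not part of Mender or mongosh), and the index list is simply copied from the output above.

```javascript
// Index definitions copied from the getIndexes() output above.
const authSetIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
    name: "tenant_id_id_data_sha256_pubkey",
    unique: true,
    storageEngine: { enableOrderedIndex: true },
  },
  {
    v: 2,
    key: { tenant_id: 1, device_id: 1 },
    name: "tenant_id_device_id",
    storageEngine: { enableOrderedIndex: true },
  },
];

// Hypothetical helper: return the names of all indexes whose key
// contains the given field.
function indexesReferencing(indexes, field) {
  return indexes
    .filter((ix) => Object.prototype.hasOwnProperty.call(ix.key, field))
    .map((ix) => ix.name);
}

const affected = indexesReferencing(authSetIndexes, "tenant_id");
console.log(affected);
```

Inside mongosh the same function could presumably be fed the live result, i.e. indexesReferencing(db.auth_sets.getIndexes(), "tenant_id"), instead of the copied list.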

Since we are running the community version of Mender, we do not have any tenant_id, which makes device_auth crash.
I think (!) tenant_id will always be an empty string, and this does not work when sparse indexes are unavailable.

I tried to find a workaround by setting various environment variables (which are quite likely not intended to be used like this ^^),
but a mixture of trial and error and reading the relevant code makes me believe that tenant_id will always be an empty string in the community Helm chart, no matter what I do. (To be clear: I am not trying to cheat my way into non-community functionality; I was trying to make it work without patching the indices.)

Patching the Indices

db.auth_sets.dropIndexes("*")

db.auth_sets.createIndex({ id_data_sha256: 1, pubkey: 1 }, { unique: true, name: 'tenant_id_id_data_sha256_pubkey' })
db.auth_sets.createIndex({ device_id: 1 }, { unique: true, name: 'tenant_id_device_id' })

As shown above, we recreated the indices under their original names, just without the tenant_id field.
I was not sure whether the code references the indexes by name anywhere, so keeping the names seemed like the safe thing to do.
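
The same transformation can be sketched as a small function that derives the createIndex arguments from an existing index definition. This is only an illustration: withoutField is a hypothetical helper, not something Mender or mongosh provides, and it assumes only name and unique need to be carried over. Note one difference from the commands above: in the getIndexes() listing the tenant_id_device_id index is not marked unique, so the sketch leaves it non-unique, whereas the manual command creates it with unique: true.

```javascript
// The two tenant-scoped indexes as listed by getIndexes() earlier.
const originalIndexes = [
  {
    key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
    name: "tenant_id_id_data_sha256_pubkey",
    unique: true,
  },
  {
    key: { tenant_id: 1, device_id: 1 },
    name: "tenant_id_device_id",
  },
];

// Hypothetical helper: build createIndex arguments with one field removed
// from the key, keeping the original index name and uniqueness.
function withoutField(index, field) {
  const key = Object.fromEntries(
    Object.entries(index.key).filter(([name]) => name !== field)
  );
  const options = { name: index.name };
  if (index.unique) {
    options.unique = true;
  }
  return { key, options };
}

const patched = originalIndexes.map((ix) => withoutField(ix, "tenant_id"));
console.log(JSON.stringify(patched, null, 2));
```

In mongosh, each entry could then be applied with db.auth_sets.createIndex(spec.key, spec.options) after dropping the old indexes.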

Test Suite

While logging in worked even before the device_auth patch, our test suite would crash fatally because DocumentDB returns a different error message than native MongoDB. But now:

Image

All green! 💥

To give an overview of what the tests check on the Mender side, I dumped the requests and created a pivot table (I would be happy to share it privately in case more information is required).
On the left are the calls our API/production code makes against Mender; on the right are the calls made by the test setup. Our test setup mostly involves setting up (simulated) devices and uploading dummy artifacts to deploy later:

Image

@alfrunes In #1095 you mentioned that Mender utilizes sparse indexes. To be honest, I would have expected more problems.
However, needless to say, we are not using all of Mender's functionality, so I have no clue what was potentially missed.

To move forward:

  • Are there things you want to have specifically tested?
  • Is there any information missing that we should share?
  • Would it be possible to remove tenant_id from the indexes in the community version?
    I am not sure whether there is a plan to remove it altogether, given the recent removal of enterprise-related code from the public repo.

To summarize: The reference to tenant_id in the two auth_sets indexes seems to be the only problem in our use case, and we would be happy to support finding/implementing a solution other than deleting and recreating the indexes.

Thank you so much!!

Cheers & Happy Christmas
Olli

Edit: Pinged the wrong dev. I'm so sorry!

Edit2: For completeness, here are the descriptions of our test cases (I removed some uninteresting ones), but hopefully the callout list is sufficient.

Test Case Descriptors
Test (108 tests) Success
  RestApi (108 tests) Success
    RestApi (108 tests) Success
      ApiSurface (17 tests) Success
        ApiSurfaceTests (17 tests) Success
          DeploymentTests (2 tests) Success
            ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
            Retrieve Finished Deployment Success
          DeviceTests (5 tests) Success
            ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
            Deploy Device with Artifact Type=App Success
            Get Device by Resource Attributes (Multiple Devices) Success
            Get Device in Provisioning State=Accepted Success
            Get Software of Device with Artifact Type=App Success
          SanityCheck_VerifyConfiguredCorrectly Success
          SoftwareTests (9 tests) Success
            ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
            Create Software Version with Type=App Success
            Create Software Version with Type=Single-File Success
            Get Channels by Software Name MultipleChannels_MultiVersion Success
            Get Channels by Software Name MultipleChannels_SingleVersion Success
            Get Channels by Software Name SingleChannel Success
            Get Devices By Software with Type=App Success
            Get Software by Identifier with Type=App Success
            Verify Artifact with Type=App Success
      Framework.Tests (1 test) Success
        SimulatedClientTests (1 test) Success
          SimulatedClient_AfterAccept_ShouldPopulateInventory Success
      NonFunctional (5 tests) Success
        ResourceSynchronizationTests (5 tests) Success
          ShouldSynchronizeAccessToDeviceConfiguration_TableTests (4 tests) Success
            ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 1) Success
            ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 10) Success
            ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 2) Success
            ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 5) Success
          ShouldThinkOfABetterNameLater Success
      ArtifactControllerTests (12 tests) Success
        Negative (10 tests) Success
          DockerApp (10 tests) Success
            ShouldReturn400ForInvalidSoftwareChannel_TableTests (5 tests) Success
              ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "123456789012345678901234567890123") Success
              ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software_channel") Success
              ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software-channel") Success
              ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software#channel") Success
              ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software$channel") Success
            ShouldReturn400ForInvalidSoftwareName_TableTests (5 tests) Success
              ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "12345678901234567890123456789012345678901234567890"···) Success
              ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software;name") Success
              ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software:name") Success
              ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software#name") Success
              ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software$name") Success
        Positive (2 tests) Success
          CreateArtifact_Handles (2 tests) Success
            CreateArtifact_Handles(type: Docker) Success
            CreateArtifact_Handles(type: SingleFile) Success
      AuthControllerTests (3 tests) Success
        LoginAsync_InvalidPassword_ShouldReturnUnauthorized Success
        LoginAsync_UnknownUser_ShouldReturnUnauthorized Success
        LoginAsync_ValidUser_ShouldSucceed Success
      DeploymentControllerTests (5 tests) Success
        Negative (2 tests) Success
          GetDeployment_IdIsGuidEmpty_ShouldReturn400 Success
          GetDeployment_IdIsUnparseable_ShouldReturn400 Success
        Positive (3 tests) Success
          GetDeployment_FinishedDeployment_ShouldHaveTypeSoftware Success
          GetDeployment_FinishedDeployment_ShouldReturn200 Success
          GetDeployment_PendingDeployment_ShouldReturn200 Success
      DeviceAuthControllerTests (4 tests) Success
        ListDevicesWithQueryPending_ShouldShowPendingDevice_WhenDeviceExists Success
        SetProvision_ShouldFail_WhenDeviceNotFound Success
        SetProvision_ShouldProvisionDevice_WhenDeviceExists Success
        SetProvision_ShouldReturn400_WhenCustomerMissing Success
      DeviceControllerTests (14 tests) Success
        Configuration (4 tests) Success
          GetConfigurationStatusByIdentifier_ShouldReturnStatusEnum_WhenSoftwareConfirmed Success
          GetConfigurationStatusByIdentifier_ShouldReturnStatusEnum_WhenSoftwareNotConfirmedYet Success
          UpdateSoftwareConfigurationByDescriptor_ShouldReturnNoContent_WhenUpdateSuccessful Success
          UpdateSoftwareConfigurationByIdentifier_ShouldReturnNoContent_WhenUpdateSuccessful Success
        Deployment (1 test) Success
          DeploySoftwareByIdentifier_DeviceTypeMatched_Returns200 Success
        Devices (7 tests) Success
          GetAll_MenderServerIsAvailable_ShouldReturnOk Success
          GetDevice_ShouldReturnError_WhenDeviceNotExists Success
          GetDevice_ShouldShowDevice_WhenDeviceExists Success
          GetDevicesByInventorySearch_TwoDevicesOneApp_DifferentValue_ShouldIdentifySingleDevice Success
          GetDevicesByInventorySearch_TwoDevicesOneApp_SameValue_ShouldIdentifyBothDevices Success
          GetDevicesByInventorySearch_TwoDevicesTwoApps_SameValue_ShouldIdentifySingleDevice Success
          Provision_ShouldAddDevice_AndLinkToCustomer Success
        Software (2 tests) Success
          GetSoftware_ShouldReturnError_WhenDeviceNotExists Success
          GetSoftware_ShouldReturnSoftware_WhenDeviceExists Success
      FrameworkTests (1 test) Success
        Accept_ShouldAddDevice_AndLinkToCustomer Success
      SoftwareControllerTests (7 tests) Success
        CreateSoftwareVersion_DuplicatedDockerArtifact_ShouldReturnBadRequest Success
        CreateSoftwareVersion_ValidDockerArtifact_ShouldReturnOk Success
        GetChannelsBySoftwareName_MultipleChannels_ShouldReturnList Success
        GetChannelsBySoftwareName_SingleChannel_ShouldReturnList Success
        GetChannelsBySoftwareName_SoftwareDoesNotExists_ShouldReturn404 Success
        GetDevicesBySoftwareIdentifier_ShouldReturnDevices_WhenDevicesExist Success
        GetSoftwareBySoftwareIdentifier_ExistingSoftwareIdentifier_ShouldReturnOk Success

Note: To prevent confusion, I updated the first screenshot; I had not filtered out skipped/destructive tests, which cannot be executed if our API runs in K8S.
