Description
Hello again.
As discussed in #1095, we wanted to share our findings running Mender against a managed Azure MongoDB (DocumentDB) instance.
I finally found some time to test the fix in DeviceAuth - and I'm happy to share:
It works!! (somewhat).
DocumentDB Setup
| Setting | Value |
|---|---|
| Cluster tier | M10 tier, 1 vCore, 2 GiB RAM |
| Shard Count | 1 |
| High Availability | No |
| MongoDB Version | 8.0 |
Problem
We were able to successfully* run Mender (with the new device_auth image 4.0.X) without modifications to any of the images.
However, as expected, we needed to adjust the two (sparse) indexes in device auth:
mongosh "mongodb+srv://$user@$mongoInstance.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
# Provide Password when prompted
show dbs
use deviceauth
db.getCollectionNames()
db.auth_sets.getIndexes()
By default, Mender creates three indexes on auth_sets. The last command returns:
[
{ v: 2, key: { _id: 1 }, name: '_id_' },
{
v: 2,
key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
name: 'tenant_id_id_data_sha256_pubkey',
unique: true,
storageEngine: { enableOrderedIndex: true }
},
{
v: 2,
key: { tenant_id: 1, device_id: 1 },
name: 'tenant_id_device_id',
storageEngine: { enableOrderedIndex: true }
}
]
Since we are running the community version of Mender, we do not have any tenant_id, which makes device_auth crash.
I think (!) tenant_id will always be an empty string, and this does not work without sparse indices being available.
I tried to find a workaround by setting various environment variables (which are quite likely not intended to be used like this ^^),
but a mixture of trial and error and reading the relevant code makes me believe that tenant_id will always be an empty string in the community Helm chart, no matter what I do. (To be clear: I am not trying to cheat my way into non-community functionality; I was trying to make it work without patching the indices.)
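To make the affected indexes easy to spot, here is a minimal sketch that filters the getIndexes() output for a given field. The helper name `indexesReferencing` is hypothetical, and the sample array is copied from the output above (with the storageEngine entries left out for brevity).

```javascript
// Hypothetical helper: return the names of all indexes whose key includes `field`.
function indexesReferencing(indexes, field) {
  return indexes
    .filter((idx) => Object.prototype.hasOwnProperty.call(idx.key, field))
    .map((idx) => idx.name);
}

// Sample input, copied from the getIndexes() output shown above.
const authSetIndexes = [
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
    name: 'tenant_id_id_data_sha256_pubkey',
    unique: true,
  },
  { v: 2, key: { tenant_id: 1, device_id: 1 }, name: 'tenant_id_device_id' },
];

console.log(indexesReferencing(authSetIndexes, 'tenant_id'));
```

In mongosh the same function could presumably be called as `indexesReferencing(db.auth_sets.getIndexes(), 'tenant_id')`; I have only reasoned about it against the sample data above.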
Patching the Indices
db.auth_sets.dropIndexes("*")
db.auth_sets.createIndex({ id_data_sha256: 1, pubkey: 1 }, { unique: true, name: 'tenant_id_id_data_sha256_pubkey' })
db.auth_sets.createIndex({ device_id: 1 }, { unique: true, name: 'tenant_id_device_id' })
As indicated above, we recreated the indices with a matching name, just without including the tenant_id field.
I was not sure if there is any reference (by name) on the index in the code, so this seemed like the smart thing to do.
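The same patch could also be derived from the existing definitions instead of being typed by hand. The following is only a sketch under assumptions: `withoutField` is a hypothetical helper, and it carries over the `unique` flag as reported by getIndexes(), so unlike the manual commands above it would not make the device_id index unique.

```javascript
// Hypothetical helper: turn getIndexes() output into createIndex arguments
// with `field` removed from every key. Indexes that do not reference `field`
// (such as the default _id index) are skipped. Index names are kept unchanged.
function withoutField(indexes, field) {
  return indexes
    .filter((idx) => Object.prototype.hasOwnProperty.call(idx.key, field))
    .map((idx) => {
      const key = { ...idx.key };
      delete key[field];
      const options = { name: idx.name };
      if (idx.unique) options.unique = true;
      return { key, options };
    });
}

// Sample input, copied from the getIndexes() output shown earlier.
const originalIndexes = [
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { tenant_id: 1, id_data_sha256: 1, pubkey: 1 },
    name: 'tenant_id_id_data_sha256_pubkey',
    unique: true,
  },
  { v: 2, key: { tenant_id: 1, device_id: 1 }, name: 'tenant_id_device_id' },
];

const patched = withoutField(originalIndexes, 'tenant_id');
console.log(JSON.stringify(patched, null, 2));

// Possible use in mongosh, after the old indexes have been dropped
// (the names are reused, so the originals must be gone first):
//   withoutField(savedIndexes, 'tenant_id')
//     .forEach(({ key, options }) => db.auth_sets.createIndex(key, options));
// where `savedIndexes` is the getIndexes() result captured before dropping.
```

Capturing the getIndexes() result before calling dropIndexes matters here, because the definitions are gone once the indexes are dropped.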
Test Suite
While logging in worked even before the device_auth patch, our test suite would crash fatally because DocumentDB returns a different error message than native MongoDB. But now:
All green! 💥
To give an overview of what the tests check on the Mender side, I dumped the requests and created a pivot table (I would be happy to share it privately if more information is required).
On the left are the calls made against Mender by our API/production code; on the right are the test setup calls. Our test setup mostly involved setting up (simulated) devices and uploading dummy artifacts to deploy later:
@alfrunes In #1095 you mentioned that Mender utilizes sparse indexes - tbh, I would have expected more problems.
However, needless to say: we are not using all of Mender's functionality, so I have no clue what was potentially missed.
To move forward:
- Are there things you want to have specifically tested?
- Is there any information missing that we should share?
- Would it be possible to remove the tenant_id from the index in the community version?
I am not sure whether there is a plan to remove this altogether, given the recent removal of enterprise-related code from the public repo.
To summarize: the reference to tenant_id in the two auth_sets indexes seems to be the only problem in our use case, and we would be happy to help find or implement a solution other than deleting and recreating the indexes.
Thank you so much!!
Cheers & Happy Christmas
Olli
Edit: Pinged the wrong dev.. I'm so sorry!
Edit2: For completeness here are the descriptions of our test cases (I removed some uninteresting ones), but hopefully the callout list is sufficient
Test Case Descriptors
Test (108 tests) Success
RestApi (108 tests) Success
RestApi (108 tests) Success
ApiSurface (17 tests) Success
ApiSurfaceTests (17 tests) Success
DeploymentTests (2 tests) Success
ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
Retrieve Finished Deployment Success
DeviceTests (5 tests) Success
ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
Deploy Device with Artifact Type=App Success
Get Device by Resource Attributes (Multiple Devices) Success
Get Device in Provisioning State=Accepted Success
Get Software of Device with Artifact Type=App Success
SanityCheck_VerifyConfiguredCorrectly Success
SoftwareTests (9 tests) Success
ApiSurfaceTests.SanityCheck_VerifyConfiguredCorrectly Success
Create Software Version with Type=App Success
Create Software Version with Type=Single-File Success
Get Channels by Software Name MultipleChannels_MultiVersion Success
Get Channels by Software Name MultipleChannels_SingleVersion Success
Get Channels by Software Name SingleChannel Success
Get Devices By Software with Type=App Success
Get Software by Identifier with Type=App Success
Verify Artifact with Type=App Success
Framework.Tests (1 test) Success
SimulatedClientTests (1 test) Success
SimulatedClient_AfterAccept_ShouldPopulateInventory Success
NonFunctional (5 tests) Success
ResourceSynchronizationTests (5 tests) Success
ShouldSynchronizeAccessToDeviceConfiguration_TableTests (4 tests) Success
ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 1) Success
ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 10) Success
ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 2) Success
ShouldSynchronizeAccessToDeviceConfiguration_TableTests(applicationCount: 5) Success
ShouldThinkOfABetterNameLater Success
ArtifactControllerTests (12 tests) Success
Negative (10 tests) Success
DockerApp (10 tests) Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests (5 tests) Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "123456789012345678901234567890123") Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software_channel") Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software-channel") Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software#channel") Success
ShouldReturn400ForInvalidSoftwareChannel_TableTests(softwareChannel: "software$channel") Success
ShouldReturn400ForInvalidSoftwareName_TableTests (5 tests) Success
ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "12345678901234567890123456789012345678901234567890"···) Success
ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software;name") Success
ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software:name") Success
ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software#name") Success
ShouldReturn400ForInvalidSoftwareName_TableTests(softwareName: "software$name") Success
Positive (2 tests) Success
CreateArtifact_Handles (2 tests) Success
CreateArtifact_Handles(type: Docker) Success
CreateArtifact_Handles(type: SingleFile) Success
AuthControllerTests (3 tests) Success
LoginAsync_InvalidPassword_ShouldReturnUnauthorized Success
LoginAsync_UnknownUser_ShouldReturnUnauthorized Success
LoginAsync_ValidUser_ShouldSucceed Success
DeploymentControllerTests (5 tests) Success
Negative (2 tests) Success
GetDeployment_IdIsGuidEmpty_ShouldReturn400 Success
GetDeployment_IdIsUnparseable_ShouldReturn400 Success
Positive (3 tests) Success
GetDeployment_FinishedDeployment_ShouldHaveTypeSoftware Success
GetDeployment_FinishedDeployment_ShouldReturn200 Success
GetDeployment_PendingDeployment_ShouldReturn200 Success
DeviceAuthControllerTests (4 tests) Success
ListDevicesWithQueryPending_ShouldShowPendingDevice_WhenDeviceExists Success
SetProvision_ShouldFail_WhenDeviceNotFound Success
SetProvision_ShouldProvisionDevice_WhenDeviceExists Success
SetProvision_ShouldReturn400_WhenCustomerMissing Success
DeviceControllerTests (14 tests) Success
Configuration (4 tests) Success
GetConfigurationStatusByIdentifier_ShouldReturnStatusEnum_WhenSoftwareConfirmed Success
GetConfigurationStatusByIdentifier_ShouldReturnStatusEnum_WhenSoftwareNotConfirmedYet Success
UpdateSoftwareConfigurationByDescriptor_ShouldReturnNoContent_WhenUpdateSuccessful Success
UpdateSoftwareConfigurationByIdentifier_ShouldReturnNoContent_WhenUpdateSuccessful Success
Deployment (1 test) Success
DeploySoftwareByIdentifier_DeviceTypeMatched_Returns200 Success
Devices (7 tests) Success
GetAll_MenderServerIsAvailable_ShouldReturnOk Success
GetDevice_ShouldReturnError_WhenDeviceNotExists Success
GetDevice_ShouldShowDevice_WhenDeviceExists Success
GetDevicesByInventorySearch_TwoDevicesOneApp_DifferentValue_ShouldIdentifySingleDevice Success
GetDevicesByInventorySearch_TwoDevicesOneApp_SameValue_ShouldIdentifyBothDevices Success
GetDevicesByInventorySearch_TwoDevicesTwoApps_SameValue_ShouldIdentifySingleDevice Success
Provision_ShouldAddDevice_AndLinkToCustomer Success
Software (2 tests) Success
GetSoftware_ShouldReturnError_WhenDeviceNotExists Success
GetSoftware_ShouldReturnSoftware_WhenDeviceExists Success
FrameworkTests (1 test) Success
Accept_ShouldAddDevice_AndLinkToCustomer Success
SoftwareControllerTests (7 tests) Success
CreateSoftwareVersion_DuplicatedDockerArtifact_ShouldReturnBadRequest Success
CreateSoftwareVersion_ValidDockerArtifact_ShouldReturnOk Success
GetChannelsBySoftwareName_MultipleChannels_ShouldReturnList Success
GetChannelsBySoftwareName_SingleChannel_ShouldReturnList Success
GetChannelsBySoftwareName_SoftwareDoesNotExists_ShouldReturn404 Success
GetDevicesBySoftwareIdentifier_ShouldReturnDevices_WhenDevicesExist Success
GetSoftwareBySoftwareIdentifier_ExistingSoftwareIdentifier_ShouldReturnOk Success
Note: To prevent confusion I updated the first screenshot; I had not filtered out the skipped/destructive tests, which cannot be executed when our API runs in K8s.