
Commit f121e4d

Authored by: KSDaemon, MaggieZhang-02, MaggieZhang-01, paveltiunov
feat(databricks-driver): Enable Azure AD authentication via Client Secret (#9104)

* Generate credential by Azure service principal
* Fix lint issues
* Add unit test for new env variables
* Add unit test for DatabricksDriver
* Add test script for databricks driver
* Update azure identity to 3.2.3
* Add yarn lock
* Fix jest async issue
* Add azure prefix for related env variables
* Keep shared key credential as default
* Make SAS code more readable
* Fix typo in comments
* doc(@cubejs-backend/databricks-jdbc-driver): Add new env variables to doc
* Remove new Databricks variables doc changes
* doc(@cubejs-backend/databricks-jdbc-driver): Add new env variables to latest doc
* Keep dependencies version consistent with master
* Upgrade azure identity version
* Add the changes of yarn lock for upgrade
* Add yarn lock for azure identity
* Fix undefined azure key error with principal provided
* Add azure export bucket env variables
* Make unit tests not run in CI (as they require Java)

---------

Co-authored-by: Maggie Zhang <[email protected]>
Co-authored-by: Maggie <[email protected]>
Co-authored-by: Pavel Tiunov <[email protected]>
1 parent 924a17c commit f121e4d

File tree

10 files changed: +382 −25 lines

docs/pages/product/configuration/data-sources/databricks-jdbc.mdx

Lines changed: 15 additions & 0 deletions

@@ -134,6 +134,17 @@ CUBEJS_DB_EXPORT_BUCKET=wasbs://[email protected]
 CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>
 ```
+
+The access key provides full access to the configuration and data. For
+fine-grained control over access to storage resources, follow [the guide on
+authorizing with Azure Active Directory][authorize-with-azure-active-directory].
+
+[Create the service principal][azure-authentication-with-service-principal]
+and replace the access key as follows:
+
+```dotenv
+CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID=<AZURE_TENANT_ID>
+CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID=<AZURE_CLIENT_ID>
+CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET=<AZURE_CLIENT_SECRET>
+```
+
 ## SSL/TLS
 
 Cube does not require any additional configuration to enable SSL/TLS for
@@ -150,6 +161,10 @@ bucket][self-preaggs-export-bucket] **must be** configured.
 [azure-bs]: https://azure.microsoft.com/en-gb/services/storage/blobs/
 [azure-bs-docs-get-key]:
   https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal#view-account-access-keys
+[authorize-with-azure-active-directory]:
+  https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-azure-active-directory
+[azure-authentication-with-service-principal]:
+  https://learn.microsoft.com/en-us/azure/developer/java/sdk/identity-service-principal-auth
 [databricks]: https://databricks.com/
 [databricks-docs-dbfs]: https://docs.databricks.com/en/dbfs/mounts.html
 [databricks-docs-azure]:

docs/pages/reference/configuration/environment-variables.mdx

Lines changed: 60 additions & 0 deletions

@@ -457,6 +457,66 @@ with a data source][ref-config-multiple-ds-decorating-env].
 | -------------------------------------- | ---------------------- | --------------------- |
 | [A valid AWS region][aws-docs-regions] | N/A | N/A |
 
+## `CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY`
+
+The Azure Access Key to use for the export bucket.
+
+<InfoBox>
+
+When using multiple data sources, this environment variable can be [decorated
+with a data source][ref-config-multiple-ds-decorating-env].
+
+</InfoBox>
+
+| Possible Values          | Default in Development | Default in Production |
+| ------------------------ | ---------------------- | --------------------- |
+| A valid Azure Access Key | N/A                    | N/A                   |
+
+## `CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID`
+
+The Azure tenant ID to use for the export bucket.
+
+<InfoBox>
+
+When using multiple data sources, this environment variable can be [decorated
+with a data source][ref-config-multiple-ds-decorating-env].
+
+</InfoBox>
+
+| Possible Values         | Default in Development | Default in Production |
+| ----------------------- | ---------------------- | --------------------- |
+| A valid Azure Tenant ID | N/A                    | N/A                   |
+
+## `CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID`
+
+The Azure client ID to use for the export bucket.
+
+<InfoBox>
+
+When using multiple data sources, this environment variable can be [decorated
+with a data source][ref-config-multiple-ds-decorating-env].
+
+</InfoBox>
+
+| Possible Values         | Default in Development | Default in Production |
+| ----------------------- | ---------------------- | --------------------- |
+| A valid Azure Client ID | N/A                    | N/A                   |
+
+## `CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET`
+
+The Azure client secret to use for the export bucket.
+
+<InfoBox>
+
+When using multiple data sources, this environment variable can be [decorated
+with a data source][ref-config-multiple-ds-decorating-env].
+
+</InfoBox>
+
+| Possible Values             | Default in Development | Default in Production |
+| --------------------------- | ---------------------- | --------------------- |
+| A valid Azure Client Secret | N/A                    | N/A                   |
+
 ## `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR`
 
 The mount path to use for a [Databricks DBFS mount][databricks-docs-dbfs].
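The new variables follow the same per-data-source decoration pattern as the existing ones (the `CUBEJS_DS_POSTGRES_…` form is exercised in the tests below). A hypothetical setup with a second data source named `databricks` declared in `CUBEJS_DATASOURCES` might look like:

```dotenv
CUBEJS_DATASOURCES=default,databricks

# Applies to the `default` data source
CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID=<AZURE_TENANT_ID>
CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID=<AZURE_CLIENT_ID>
CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET=<AZURE_CLIENT_SECRET>

# Applies only to the `databricks` data source
CUBEJS_DS_DATABRICKS_DB_EXPORT_BUCKET_AZURE_TENANT_ID=<OTHER_TENANT_ID>
CUBEJS_DS_DATABRICKS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID=<OTHER_CLIENT_ID>
CUBEJS_DS_DATABRICKS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET=<OTHER_CLIENT_SECRET>
```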

packages/cubejs-backend-shared/src/env.ts

Lines changed: 13 additions & 0 deletions

@@ -795,6 +795,19 @@ const variables: Record<string, (...args: any) => any> = {
     ]
   ),
 
+  /**
+   * Client Secret for the Azure based export bucket storage.
+   */
+  dbExportBucketAzureClientSecret: ({
+    dataSource,
+  }: {
+    dataSource: string,
+  }) => (
+    process.env[
+      keyByDataSource('CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET', dataSource)
+    ]
+  ),
+
   /**
    * Azure Federated Token File Path for the Azure based export bucket storage.
    */
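The resolver above delegates variable-name construction to `keyByDataSource`. As a rough sketch (a simplified re-implementation for illustration, not the actual Cube helper, which also validates the data source against `CUBEJS_DATASOURCES`), the name derivation and lookup behave like this:

```typescript
// Sketch of how a decorated env var name is derived from the origin key.
// The `default` data source reads the undecorated variable; any other
// data source reads a `CUBEJS_DS_<NAME>_...` variant.
function keyByDataSourceSketch(origin: string, dataSource: string): string {
  if (dataSource === 'default') {
    return origin;
  }
  return origin.replace(/^CUBEJS_/, `CUBEJS_DS_${dataSource.toUpperCase()}_`);
}

// The resolver then simply reads the derived key from process.env:
const dbExportBucketAzureClientSecretSketch = (dataSource: string) =>
  process.env[
    keyByDataSourceSketch('CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET', dataSource)
  ];
```

For example, `dataSource: 'postgres'` resolves to `CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET`, matching the variable names used in the unit tests below.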

packages/cubejs-backend-shared/test/db_env_multi.test.ts

Lines changed: 87 additions & 0 deletions

@@ -956,6 +956,93 @@ describe('Multiple datasources', () => {
     );
   });
 
+  test('getEnv("dbExportBucketAzureTenantId")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default1';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'postgres1';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'wrong1';
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('postgres1');
+    expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default2';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'postgres2';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'wrong2';
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('postgres2');
+    expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
+    delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
+    delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toBeUndefined();
+    expect(() => getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+  });
+
+  test('getEnv("dbExportBucketAzureClientId")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default1';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'postgres1';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'wrong1';
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('postgres1');
+    expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default2';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'postgres2';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'wrong2';
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('postgres2');
+    expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
+    delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
+    delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toBeUndefined();
+    expect(() => getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+  });
+
+  test('getEnv("dbExportBucketAzureClientSecret")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default1';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'postgres1';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'wrong1';
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('postgres1');
+    expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default2';
+    process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'postgres2';
+    process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'wrong2';
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('postgres2');
+    expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
+    delete process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
+    delete process.env.CUBEJS_DS_WRONG_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toBeUndefined();
+    expect(() => getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toThrow(
+      'The wrong data source is missing in the declared CUBEJS_DATASOURCES.'
+    );
+  });
+
   test('getEnv("dbExportIntegration")', () => {
     process.env.CUBEJS_DB_EXPORT_INTEGRATION = 'default1';
     process.env.CUBEJS_DS_POSTGRES_DB_EXPORT_INTEGRATION = 'postgres1';
packages/cubejs-backend-shared/test/db_env_single.test.ts

Lines changed: 51 additions & 0 deletions

@@ -618,6 +618,57 @@ describe('Single datasources', () => {
     expect(getEnv('dbExportBucketAzureKey', { dataSource: 'wrong' })).toBeUndefined();
   });
 
+  test('getEnv("dbExportBucketAzureTenantId")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default1';
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toEqual('default1');
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID = 'default2';
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toEqual('default2');
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_TENANT_ID;
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'postgres' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureTenantId', { dataSource: 'wrong' })).toBeUndefined();
+  });
+
+  test('getEnv("dbExportBucketAzureClientId")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default1';
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toEqual('default1');
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID = 'default2';
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toEqual('default2');
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_ID;
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'postgres' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientId', { dataSource: 'wrong' })).toBeUndefined();
+  });
+
+  test('getEnv("dbExportBucketAzureClientSecret")', () => {
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default1';
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('default1');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toEqual('default1');
+
+    process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET = 'default2';
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toEqual('default2');
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toEqual('default2');
+
+    delete process.env.CUBEJS_DB_EXPORT_BUCKET_AZURE_CLIENT_SECRET;
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'default' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'postgres' })).toBeUndefined();
+    expect(getEnv('dbExportBucketAzureClientSecret', { dataSource: 'wrong' })).toBeUndefined();
+  });
+
   test('getEnv("dbExportIntegration")', () => {
     process.env.CUBEJS_DB_EXPORT_INTEGRATION = 'default1';
     expect(getEnv('dbExportIntegration', { dataSource: 'default' })).toEqual('default1');

packages/cubejs-base-driver/src/BaseDriver.ts

Lines changed: 33 additions & 1 deletion

@@ -27,6 +27,7 @@ import {
 } from '@azure/storage-blob';
 import {
   DefaultAzureCredential,
+  ClientSecretCredential,
 } from '@azure/identity';
 
 import { cancelCombinator } from './utils';
@@ -73,6 +74,15 @@ export type AzureStorageClientConfig = {
    * the Azure library will try to use the AZURE_TENANT_ID env
    */
   tenantId?: string,
+  /**
+   * Azure service principal client secret.
+   * Enables authentication to Microsoft Entra ID using a client secret that was generated
+   * for an App Registration. More information on how to configure a client secret can be found here:
+   * https://learn.microsoft.com/entra/identity-platform/quickstart-configure-app-access-web-apis#add-credentials-to-your-web-application
+   * In case of DefaultAzureCredential flow, if it is omitted
+   * the Azure library will try to use the AZURE_CLIENT_SECRET env
+   */
+  clientSecret?: string,
   /**
    * The path to a file containing a Kubernetes service account token that authenticates the identity.
    * In case of DefaultAzureCredential flow if it is omitted
@@ -760,7 +770,7 @@ export abstract class BaseDriver implements DriverInterface {
     const parts = bucketName.split(splitter);
     const account = parts[0];
     const container = parts[1].split('/')[0];
-    let credential: StorageSharedKeyCredential | DefaultAzureCredential;
+    let credential: StorageSharedKeyCredential | ClientSecretCredential | DefaultAzureCredential;
     let blobServiceClient: BlobServiceClient;
     let getSas;
 
@@ -778,6 +788,28 @@
         },
         credential as StorageSharedKeyCredential
       ).toString();
+    } else if (azureConfig.clientSecret && azureConfig.tenantId && azureConfig.clientId) {
+      credential = new ClientSecretCredential(
+        azureConfig.tenantId,
+        azureConfig.clientId,
+        azureConfig.clientSecret,
+      );
+      getSas = async (name: string, startsOn: Date, expiresOn: Date) => {
+        const userDelegationKey = await blobServiceClient.getUserDelegationKey(startsOn, expiresOn);
+        return generateBlobSASQueryParameters(
+          {
+            containerName: container,
+            blobName: name,
+            permissions: ContainerSASPermissions.parse('r'),
+            startsOn,
+            expiresOn,
+            protocol: SASProtocol.Https,
+            version: '2020-08-04',
+          },
+          userDelegationKey,
+          account
+        ).toString();
+      };
     } else {
       const opts = {
         tenantId: azureConfig.tenantId,
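The branch order in the change above determines which credential class is used. As a simplified, self-contained sketch of that selection logic (names are illustrative; the real code constructs the actual `@azure/identity` and `@azure/storage-blob` credential objects rather than returning labels):

```typescript
// Minimal sketch of the credential selection introduced by this change:
// a shared storage account key still takes precedence, a complete service
// principal (tenant + client + secret) comes second, and anything else
// falls through to DefaultAzureCredential.
type AzureConfigSketch = {
  azureKey?: string;
  tenantId?: string;
  clientId?: string;
  clientSecret?: string;
};

type CredentialKind =
  | 'StorageSharedKeyCredential'
  | 'ClientSecretCredential'
  | 'DefaultAzureCredential';

function pickAzureCredential(cfg: AzureConfigSketch): CredentialKind {
  if (cfg.azureKey) {
    // Shared key credential is kept as the default, per the commit message.
    return 'StorageSharedKeyCredential';
  }
  if (cfg.clientSecret && cfg.tenantId && cfg.clientId) {
    // All three service principal values must be present.
    return 'ClientSecretCredential';
  }
  return 'DefaultAzureCredential';
}
```

Note that in the service-principal branch the SAS token is signed with a user delegation key obtained via `getUserDelegationKey`, rather than with the storage account key, which is what allows fine-grained Azure AD access control.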

packages/cubejs-databricks-jdbc-driver/README.md

Lines changed: 2 additions & 0 deletions

@@ -20,6 +20,8 @@ $ yarn
 $ yarn test
 ```
 
+Note: Unit tests require Java to be installed.
+
 ### License
 
 Cube.js Databricks Database Driver is [Apache 2.0 licensed](./LICENSE).

packages/cubejs-databricks-jdbc-driver/package.json

Lines changed: 2 additions & 0 deletions

@@ -18,6 +18,8 @@
   "build": "rm -rf dist && npm run tsc",
   "tsc": "tsc",
   "watch": "tsc -w",
+  "test": "npm run unit-tests",
+  "unit-tests": "jest dist/test --forceExit",
   "lint": "eslint src/* --ext .ts",
   "lint:fix": "eslint --fix src/* --ext .ts",
   "postinstall": "node bin/post-install"
