# feat(databricks-jdbc-driver): Add export bucket support for Google Cloud Storage #9407
Changes from all commits: fc5097b, a789514, 180093d, 7c0503f, 7df6bac
```diff
@@ -26,6 +26,8 @@ import {
 import { DatabricksQuery } from './DatabricksQuery';
 import { resolveJDBCDriver, extractUidFromJdbcUrl } from './helpers';

+const SUPPORTED_BUCKET_TYPES = ['s3', 'gcs', 'azure'];
+
 export type DatabricksDriverConfiguration = JDBCDriverConfiguration &
   {
     /**
```
```diff
@@ -103,6 +105,11 @@ export type DatabricksDriverConfiguration = JDBCDriverConfiguration &
      * Azure service principal client secret
      */
     azureClientSecret?: string,
+
+    /**
+     * GCS credentials JSON content
+     */
+    gcsCredentials?: string,
   };

 type ShowTableRow = {
```
```diff
@@ -209,7 +216,7 @@ export class DatabricksDriver extends JDBCDriver {
       // common export bucket config
       bucketType:
         conf?.bucketType ||
-        getEnv('dbExportBucketType', { supported: ['s3', 'azure'], dataSource }),
+        getEnv('dbExportBucketType', { supported: SUPPORTED_BUCKET_TYPES, dataSource }),
       exportBucket:
         conf?.exportBucket ||
         getEnv('dbExportBucket', { dataSource }),
```
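The `conf?.x || getEnv(...)` pattern in this hunk resolves each option from explicit driver configuration first and falls back to the environment. A minimal sketch of that precedence, where `readEnv` is a hypothetical stand-in for Cube's `getEnv` helper and the env variable name is illustrative:

```typescript
// Hypothetical sketch of the config-over-environment fallback used above.
// `readEnv` stands in for Cube's real `getEnv` helper.
function readEnv(name: string): string | undefined {
  return process.env[name];
}

interface ExportBucketConfig {
  bucketType?: string;
}

function resolveBucketType(conf: ExportBucketConfig): string | undefined {
  // Explicit driver configuration wins; the environment is the fallback.
  return conf?.bucketType || readEnv('CUBEJS_DB_EXPORT_BUCKET_TYPE');
}

process.env.CUBEJS_DB_EXPORT_BUCKET_TYPE = 'gcs';
console.log(resolveBucketType({}));                   // falls back to the env var
console.log(resolveBucketType({ bucketType: 's3' })); // explicit config wins
```

Centralizing the allowed values in `SUPPORTED_BUCKET_TYPES`, as the diff does, keeps this resolution and the later `unload()` guard from drifting apart.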
```diff
@@ -246,6 +253,10 @@ export class DatabricksDriver extends JDBCDriver {
       azureClientSecret:
         conf?.azureClientSecret ||
         getEnv('dbExportBucketAzureClientSecret', { dataSource }),
+      // GCS credentials
+      gcsCredentials:
+        conf?.gcsCredentials ||
+        getEnv('dbExportGCSCredentials', { dataSource }),
     };
     if (config.readOnly === undefined) {
       // we can set readonly to true if there is no bucket config provided
```
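The new `gcsCredentials` option carries the service-account key JSON as a string; a consumer would typically parse and sanity-check it before handing it to a storage client. A hypothetical sketch (the parsing function and validation are not part of the PR; the field names follow the standard GCP service-account key format):

```typescript
// Hypothetical sketch: parsing the service-account JSON string carried
// by the gcsCredentials option. Not part of the PR itself.
interface ServiceAccountKey {
  type: string;
  project_id: string;
  client_email: string;
  private_key: string;
}

function parseGcsCredentials(raw: string): ServiceAccountKey {
  const parsed = JSON.parse(raw) as ServiceAccountKey;
  if (parsed.type !== 'service_account') {
    throw new Error('gcsCredentials must be a service-account key JSON');
  }
  return parsed;
}

// Illustrative key material only.
const sample = JSON.stringify({
  type: 'service_account',
  project_id: 'demo-project',
  client_email: 'exporter@demo-project.iam.gserviceaccount.com',
  private_key: '-----BEGIN PRIVATE KEY-----\n...',
});
console.log(parseGcsCredentials(sample).project_id); // demo-project
```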
```diff
@@ -643,7 +654,7 @@ export class DatabricksDriver extends JDBCDriver {
    * export bucket data.
    */
   public async unload(tableName: string, options: UnloadOptions) {
-    if (!['azure', 's3'].includes(this.config.bucketType as string)) {
+    if (!SUPPORTED_BUCKET_TYPES.includes(this.config.bucketType as string)) {
       throw new Error(`Unsupported export bucket type: ${
         this.config.bucketType
       }`);
```
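The guard above fails fast on any bucket type outside the shared list, so adding `gcs` to `SUPPORTED_BUCKET_TYPES` is enough to let GCS through. A minimal standalone sketch of that behavior (the helper name is hypothetical):

```typescript
// Sketch of the unload() guard: unknown bucket types are rejected
// before any export work starts. Mirrors SUPPORTED_BUCKET_TYPES above.
const SUPPORTED_BUCKET_TYPES = ['s3', 'gcs', 'azure'];

function assertBucketType(bucketType?: string): void {
  if (!SUPPORTED_BUCKET_TYPES.includes(bucketType as string)) {
    throw new Error(`Unsupported export bucket type: ${bucketType}`);
  }
}

assertBucketType('gcs'); // passes after this PR
```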
```diff
@@ -733,6 +744,15 @@ export class DatabricksDriver extends JDBCDriver {
         url.host,
         objectSearchPrefix,
       );
+    } else if (this.config.bucketType === 'gcs') {
+      return this.extractFilesFromGCS(
+        { credentials: this.config.gcsCredentials },
+        url.host,
+        objectSearchPrefix+".csv",
+      ).then(files => files.filter(file =>
+        decodeURIComponent(new URL(file).pathname).endsWith('.csv') ||
+        decodeURIComponent(new URL(file).pathname).endsWith('.csv.gz')
+      ));
     } else {
       throw new Error(`Unsupported export bucket type: ${
         this.config.bucketType
```

**Review comments on lines +753 to +754:**

**Member:** The answers to the questions above might affect these lines. Are they needed?

**Contributor (author):** Yes, they are needed, since those files will also be returned from […]

**Member:** I think this should not be a problem. Anyway, if we want to do some filtering, it should be done in […]
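The suffix filter added in this hunk can be exercised on its own. A minimal sketch, extracted from the `.then(...)` callback above (the object URLs are made up; in the driver they come from `extractFilesFromGCS`):

```typescript
// Sketch of the .csv / .csv.gz filter from the hunk above.
// The URL strings are illustrative stand-ins for signed GCS object URLs.
function keepCsvObjects(files: string[]): string[] {
  return files.filter(file => {
    // Decode the path so percent-encoded names still match the suffixes.
    const path = decodeURIComponent(new URL(file).pathname);
    return path.endsWith('.csv') || path.endsWith('.csv.gz');
  });
}

const files = [
  'https://storage.googleapis.com/bucket/table.csv/part-0000.csv',
  'https://storage.googleapis.com/bucket/table.csv/part-0001.csv.gz',
  'https://storage.googleapis.com/bucket/table.csv/_SUCCESS',
];
console.log(keepCsvObjects(files).length); // 2
```

This is what the review thread below is debating: whether the `.csv` suffix on the search prefix and this post-filter are both needed, given marker files like `_SUCCESS` that can appear alongside the exported parts.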
```diff
@@ -769,6 +789,9 @@ export class DatabricksDriver extends JDBCDriver {
    *
    * `fs.s3a.access.key <aws-access-key>`
    * `fs.s3a.secret.key <aws-secret-key>`
+   * For Google cloud storage you can configure storage credentials and create an external location to access it
+   * (https://docs.databricks.com/gcp/en/connect/unity-catalog/cloud-storage/storage-credentials
+   * https://docs.databricks.com/gcp/en/connect/unity-catalog/cloud-storage/external-locations)
    */
   private async createExternalTableFromSql(tableFullName: string, sql: string, params: unknown[], columns: ColumnInfo[]) {
     let select = sql;
```
**Review discussion on the `.csv` suffix:**

- Is it important to add `.csv` here? Are there any other unrelated files that might be captured by `objectSearchPrefix` that should be excluded?
- Having this, does Databricks create a folder for a table with a `csv` suffix, like `my_exported_table.csv/` with a bunch of real CSV files inside?
- It looks like this (screenshot omitted): it's kind of weird actually, since the folder name has a `.csv` suffix in it, like `table-name.csv/`.
- Maybe we don't need the `csv` suffix in the `LOCATION`? But I don't have a databricks-aws env to test if it's the same behavior as exporting to GCS.
- Yeah, I think you're right. We should remove the `csv` postfix from the location.