Failure to run query that requires kms key query setting #1415

@JFvdm

DataProc: 2.1
bigquery-connector: spark-bigquery-with-dependencies_2.12-0.42.2.jar

Running a query in a project that requires a KMS key to be specified fails.

Succeeds

spark_session.conf.set('destinationTableKmsKeyName', destination_table_kms_key)
df = (
    spark_session.read.format('bigquery')
    .option('table', '<project>:<dataset>.my_table')
    .load()
    .select('*')
)

Fails

spark_session.conf.set('viewsEnabled', 'true')
spark_session.conf.set('destinationTableKmsKeyName', destination_table_kms_key)
query='SELECT * FROM `<project>.<dataset>.my_table`'
df = (
    spark_session.read.format('bigquery')
    .option('query', query)
    .load()
)

Caused by: com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Error creating destination table using the following query: [SELECT * FROM `<project>.<dataset>.my_table`]
at com.google.cloud.bigquery.connector.common.BigQueryClient.materializeTable(BigQueryClient.java:743)        
at com.google.cloud.bigquery.connector.common.BigQueryClient.materializeQueryToTable(BigQueryClient.java:673)   
at com.google.cloud.bigquery.connector.common.BigQueryClient.getReadTable(BigQueryClient.java:441)    
at com.google.cloud.spark.bigquery.v2.context.BigQueryDataSourceReaderModule.provideDataSourceReaderContext(BigQueryDataSourceReaderModule.java:58)     
at com.google.cloud.spark.bigquery.v2.context.BigQueryDataSourceReaderModule$$FastClassByGuice$$3955632.GUICE$TRAMPOLINE(<generated>) 
at com.google.cloud.spark.bigquery.v2.context.BigQueryDataSourceReaderModule$$FastClassByGuice$$3955632.apply(<generated>)      
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:260)  
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171)  
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)      
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)  
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)       
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:169)  
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)       
at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1101)
... 101 more
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.common.util.concurrent.UncheckedExecutionException: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Your administrator requires that you specify an encryption key for queries in project `<project>`. See https://cloud.google.com/bigquery/docs/customer-managed-encryption#services_constraint for more info.
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2086)    
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache.get(LocalCache.java:4017)      
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4898)   
at com.google.cloud.bigquery.connector.common.BigQueryClient.materializeTable(BigQueryClient.java:732)
... 114 more
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Your administrator requires that you specify an encryption key for queries in project `<project>`. See https://cloud.google.com/bigquery/docs/customer-managed-encryption#services_constraint for more info.
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:116)       
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:764)      
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl$36.call(BigQueryImpl.java:1504)    
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl$36.call(BigQueryImpl.java:1499)  
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:102)       
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryRetryHelper.run(BigQueryRetryHelper.java:86)  
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryRetryHelper.runWithRetries(BigQueryRetryHelper.java:49) 
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1498)  
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1482)    
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job$1.call(Job.java:390)      
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job$1.call(Job.java:387)        
at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:102)     
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryRetryHelper.run(BigQueryRetryHelper.java:86)    
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryRetryHelper.runWithRetries(BigQueryRetryHelper.java:49)      
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:386)
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.waitForInternal(Job.java:281)        
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.Job.waitFor(Job.java:202)       
at com.google.cloud.bigquery.connector.common.BigQueryClient.waitForJob(BigQueryClient.java:158)      
at com.google.cloud.bigquery.connector.common.BigQueryClient$TempTableBuilder.createTableFromQuery(BigQueryClient.java:1026)    
at com.google.cloud.bigquery.connector.common.BigQueryClient$TempTableBuilder.call(BigQueryClient.java:1014)  
at com.google.cloud.bigquery.connector.common.BigQueryClient$TempTableBuilder.call(BigQueryClient.java:992)     
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4903)        
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574) 
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)       
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
at com.google.cloud.spark.bigquery.repackaged.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
... 117 more
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException: 412 Precondition Failed
GET https://bigquery.googleapis.com/bigquery/v2/projects/<project>/queries/6765018c-fefa-47dc-af96-d3dded084881?location=europe-west3&maxResults=0&prettyPrint=false
{
  "code": 412,
  "errors": [
    {
      "domain": "global",
      "location": "If-Match",
      "locationType": "header",
      "message": "Your administrator requires that you specify an encryption key for queries in project `<project>`. See https://cloud.google.com/bigquery/docs/customer-managed-encryption#services_constraint for more info.",
      "reason": "conditionNotMet"
    }
  ],
  "message": "Your administrator requires that you specify an encryption key for queries in project `<project>`. See https://cloud.google.com/bigquery/docs/customer-managed-encryption#services_constraint for more info.",
  "status": "FAILED_PRECONDITION"
}
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$3.interceptResponse(AbstractGoogleClientRequest.java:479)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:565)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:506)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:616)
        at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:762)
        ... 141 more

Looking at the bigquery-connector repo, it seems that destinationTableKmsKeyName is not applied to
the query job configuration.
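
For reference, the BigQuery REST API accepts a key on the query job itself, via `configuration.query.destinationEncryptionConfiguration.kmsKeyName`. The sketch below builds such a request body with plain dictionaries to show where the key would need to go. The helper name `build_query_job_body` and the placeholder key path are made up for illustration; this is not the connector's code.

```python
import json


def build_query_job_body(query, kms_key_name=None):
    """Build a jobs.insert-style request body for a query job (illustrative only)."""
    # Backtick-quoted table names require standard SQL.
    query_config = {"query": query, "useLegacySql": False}
    if kms_key_name:
        # This is the field the org policy expects to be set on query jobs.
        query_config["destinationEncryptionConfiguration"] = {
            "kmsKeyName": kms_key_name
        }
    return {"configuration": {"query": query_config}}


# Placeholder values, same style as the snippets above.
body = build_query_job_body(
    "SELECT * FROM `<project>.<dataset>.my_table`",
    kms_key_name="projects/<project>/locations/<location>/keyRings/<ring>/cryptoKeys/<key>",
)
print(json.dumps(body, indent=2))
```

If the connector built its query job without this field, that would be consistent with the 412 response above, though I have not confirmed that this is the exact code path.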

Expectation

Using either of the approaches to read data from BigQuery should succeed when a KMS key is required and provided.

Possible Solution

Use destinationTableKmsKeyName for the query job configuration or add an extra option to specify
a kms key for the query job.

The ability to pass any of the config in QueryJobConfiguration may be useful.
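
One way the pass-through idea could look, purely as a sketch: options carrying a dedicated prefix are separated from the ordinary reader options and forwarded to the query job. The `queryJobConfig.` prefix and the helper `split_query_job_options` are hypothetical; to my knowledge the connector has no such option today.

```python
# Hypothetical prefix; this is NOT an existing connector option.
QUERY_JOB_PREFIX = "queryJobConfig."


def split_query_job_options(options):
    """Separate prefixed query-job settings from ordinary reader options."""
    query_job, reader = {}, {}
    for key, value in options.items():
        if key.startswith(QUERY_JOB_PREFIX):
            # Strip the prefix so the remainder names the query job field.
            query_job[key[len(QUERY_JOB_PREFIX):]] = value
        else:
            reader[key] = value
    return query_job, reader


query_job, reader = split_query_job_options({
    "viewsEnabled": "true",
    "queryJobConfig.destinationEncryptionConfiguration.kmsKeyName": "<kms-key-name>",
})
print(query_job)
print(reader)
```

A prefix keeps the option namespace open-ended, so new query job settings would not each need their own connector option.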
