Description
Soda Core uses databricks.sql.connect for authentication, which offers many options, as documented for the databricks-sql-connector.
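For example, the connector documentation describes, among other flows, personal access token and browser-based OAuth authentication along these lines (a sketch only; hostnames, HTTP paths, and token values are placeholders):

from databricks import sql

# Personal access token (the only option Soda currently supports):
connection = sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapi...",
)

# OAuth user-to-machine (browser-based) authentication:
connection = sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    auth_type="databricks-oauth",
)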
Unfortunately, the way this is implemented by soda.data_sources.spark_data_source.databricks_connection_function limits it to personal access tokens:
soda-core/soda/spark/soda/data_sources/spark_data_source.py
Lines 136 to 149 in 09262b0
def databricks_connection_function(host: str, http_path: str, token: str, database: str, schema: str, **kwargs):
    from databricks import sql

    user_agent_entry = f"soda-core-spark/{SODA_CORE_VERSION} (Databricks)"
    logging.getLogger("databricks.sql").setLevel(logging.INFO)
    connection = sql.connect(
        server_hostname=host,
        catalog=database,
        schema=schema,
        http_path=http_path,
        access_token=token,
        _user_agent_entry=user_agent_entry,
    )
    return connection
Likewise in SparkDataSource:
soda-core/soda/spark/soda/data_sources/spark_data_source.py
Lines 474 to 491 in 09262b0
connection = connection_function(
    username=self.username,
    password=self.password,
    host=self.host,
    port=self.port,
    database=self.database,
    auth_method=self.auth_method,
    kerberos_service_name=self.kerberos_service_name,
    driver=self.driver,
    token=self.token,
    schema=self.schema,
    http_path=self.http_path,
    organization=self.organization,
    cluster=self.cluster,
    server_side_parameters=self.server_side_parameters,
    configuration=self.configuration,
    scheme=self.scheme,
)
A solution could be to extend the signature of databricks_connection_function to match databricks.sql.connect, for example:
def databricks_connection_function(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Literal["databricks-oauth"] | None = None,
    token: str | None = None,
    username: str | None = None,
    password: str | None = None,
    client_id: str | None = None,
    client_secret: str | None = None,
):
    ...

These could then be passed through to databricks.sql.connect (with the exception of client_id and client_secret, which require the creation of a credentials provider if defined).
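For the client_id/client_secret case, a minimal sketch of how the function body could build such a credentials provider, following the machine-to-machine OAuth example from the connector documentation and assuming the databricks-sdk package is available; everything else would be forwarded much as it is today:

from typing import Literal

def databricks_connection_function(
    host: str,
    http_path: str,
    database: str,
    schema: str,
    auth_type: Literal["databricks-oauth"] | None = None,
    token: str | None = None,
    client_id: str | None = None,
    client_secret: str | None = None,
    **kwargs,
):
    from databricks import sql

    connect_kwargs = {}
    if client_id and client_secret:
        # Machine-to-machine OAuth: wrap the service principal credentials in a
        # credentials provider, as shown in the connector documentation.
        from databricks.sdk.core import Config, oauth_service_principal

        config = Config(host=f"https://{host}", client_id=client_id, client_secret=client_secret)
        connect_kwargs["credentials_provider"] = lambda: oauth_service_principal(config)
    elif auth_type is not None:
        # Browser-based (user-to-machine) OAuth.
        connect_kwargs["auth_type"] = auth_type
    else:
        # Current behaviour: personal access token.
        connect_kwargs["access_token"] = token

    return sql.connect(
        server_hostname=host,
        http_path=http_path,
        catalog=database,
        schema=schema,
        **connect_kwargs,
    )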
Adding these options (in particular OAuth) would allow much more secure and robust connection alternatives!
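For completeness, the call site in SparkDataSource could then forward the new options alongside the existing ones. In this sketch, auth_type, client_id, and client_secret are hypothetical new data source properties, not existing attributes:

connection = connection_function(
    host=self.host,
    http_path=self.http_path,
    database=self.database,
    schema=self.schema,
    token=self.token,
    # Hypothetical new properties, read from the data source configuration:
    auth_type=self.auth_type,
    client_id=self.client_id,
    client_secret=self.client_secret,
    # ...plus the remaining existing keyword arguments shown above.
)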