When Connecting to Databricks the First Request Fails #156

@marcelo-g-simas

Because a SQL Warehouse takes time to start, the first connection attempt can time out. For example, when I tried to create a table() instance to use in a chain, I got this error on a cold start:

julia> table = dt(con, "all_years_data")
ERROR: KeyError: key :manifest not found
Stacktrace:
 [1] getindex(h::Dict{Symbol, Int64}, key::Symbol)
   @ Base ./dict.jl:477
 [2] get(obj::JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}}, key::Symbol)
   @ JSON3 ~/.julia/packages/JSON3/rT1w2/src/JSON3.jl:87
 [3] getproperty(obj::JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}}, prop::Symbol)
   @ JSON3 ~/.julia/packages/JSON3/rT1w2/src/JSON3.jl:127
 [4] execute_databricks(conn::TidierDB.DatabricksConnection, query::String)
   @ TidierDB ~/.julia/packages/TidierDB/SKJ5D/src/parsing_databricks.jl:38
 [5] get_table_metadata(conn::TidierDB.DatabricksConnection, table_name::String)
   @ TidierDB ~/.julia/packages/TidierDB/SKJ5D/src/parsing_databricks.jl:59
 [6] 
   @ TidierDB ~/.julia/packages/TidierDB/SKJ5D/src/TidierDB.jl:198
 [7] db_table
   @ ~/.julia/packages/TidierDB/SKJ5D/src/TidierDB.jl:156 [inlined]
 [8] db_table(db::TidierDB.DatabricksConnection, table::String)
   @ TidierDB ~/.julia/packages/TidierDB/SKJ5D/src/TidierDB.jl:156
 [9] top-level scope
   @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

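This happens because Databricks returns a status of PENDING if it does not get a response from the SQL Warehouse within its wait timeout (https://api-reference.cloud.databricks.com/workspace/statementexecution/executestatement#wait_timeout). A PENDING response does not yet include a manifest, so the :manifest lookup in execute_databricks fails with the KeyError shown above.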
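
One way to handle the PENDING case is to poll the statement status until it reaches a terminal state before reading the manifest. The following is a minimal sketch, not the code in TidierDB or in my fork: `wait_for_statement` and `fake_fetch` are hypothetical names, and the fetch function stands in for a GET on /api/2.0/sql/statements/{statement_id} parsed into a Dict. The warehouse is simulated here so the snippet runs without credentials.

```julia
# Hypothetical helper: `fetch` is any zero-argument function that returns the
# latest statement response as a Dict (assumed shape: response["status"]["state"]).
function wait_for_statement(fetch; poll_interval=1.0, max_wait=300.0)
    deadline = time() + max_wait
    while true
        response = fetch()
        state = response["status"]["state"]
        # PENDING and RUNNING mean the result is not ready yet.
        if state != "PENDING" && state != "RUNNING"
            state == "SUCCEEDED" || error("statement ended in state $state")
            return response
        end
        time() < deadline || error("statement still $state after $max_wait seconds")
        sleep(poll_interval)
    end
end

# Simulated cold start: PENDING twice, then SUCCEEDED with a manifest.
responses = [
    Dict{String,Any}("status" => Dict{String,Any}("state" => "PENDING")),
    Dict{String,Any}("status" => Dict{String,Any}("state" => "PENDING")),
    Dict{String,Any}(
        "status"   => Dict{String,Any}("state" => "SUCCEEDED"),
        "manifest" => Dict{String,Any}("total_chunk_count" => 1),
    ),
]
calls = Ref(0)

function fake_fetch()
    calls[] += 1
    return responses[min(calls[], length(responses))]
end

result = wait_for_statement(fake_fetch; poll_interval=0.01)
```

With a loop like this, the manifest is only read once the state is SUCCEEDED, so a cold warehouse causes a delay instead of a KeyError.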

I've also noticed that the current implementation does not explicitly request INLINE results, and that it does not support downloading multi-chunk results or the EXTERNAL_LINKS data return method, both of which are used for larger result sets.
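
To illustrate the INLINE and multi-chunk points, here is a rough sketch of a request body and a chunk-following loop. It assumes the field names from the Statement Execution API reference (`disposition`, `format`, `wait_timeout`, `on_wait_timeout`, `data_array`, `next_chunk_index`). The helper `collect_rows`, the warehouse ID placeholder, and the fake chunks are hypothetical and only there to make the example runnable. EXTERNAL_LINKS is not covered here.

```julia
# Request body fields for POST /api/2.0/sql/statements.
# The warehouse ID is a placeholder.
request_body = Dict{String,Any}(
    "warehouse_id"    => "<warehouse-id>",
    "statement"       => "SELECT * FROM all_years_data",
    "disposition"     => "INLINE",      # ask for rows in the response body
    "format"          => "JSON_ARRAY",
    "wait_timeout"    => "30s",
    "on_wait_timeout" => "CONTINUE",    # keep running if the wait expires
)

# Hypothetical helper: `fetch_chunk(i)` returns chunk `i` as a Dict, e.g. from
# GET /api/2.0/sql/statements/{statement_id}/result/chunks/{i}.
function collect_rows(first_chunk, fetch_chunk)
    rows = Vector{Any}()
    chunk = first_chunk
    while true
        append!(rows, get(chunk, "data_array", []))
        next_index = get(chunk, "next_chunk_index", nothing)
        next_index === nothing && break   # last chunk has no successor
        chunk = fetch_chunk(next_index)
    end
    return rows
end

# Simulated three-chunk result; chunk indices start at 0.
chunks = Dict{Int,Any}(
    0 => Dict{String,Any}("data_array" => [[1, "a"]], "next_chunk_index" => 1),
    1 => Dict{String,Any}("data_array" => [[2, "b"]], "next_chunk_index" => 2),
    2 => Dict{String,Any}("data_array" => [[3, "c"]]),
)
rows = collect_rows(chunks[0], i -> chunks[i])
```

Passing the chunk fetcher as a function keeps the loop independent of the HTTP client, so the same logic can be exercised against fake data as above.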

I have these features working on a fork and will submit a PR with those changes. Please note that I did use ChatGPT and Claude Haiku to help me get the code together.
