Description
What Happens?
The * wildcards that I use in a glob search ('ABFSS/*/*.json') have stopped working.
Since Tue 3rd Feb 2026, I have run into the following error:
I am running duckdb 1.4.4 within a Microsoft Fabric Python (3.11) Notebook. I am trying to read Delta tables using the pattern described by @djouallah in Attach_Lakehouse_v2.ipynb.
The process has worked successfully for several months. I started using 1.4.4 between 27 Jan and 2 Feb. Here is the code that is failing:
## The following code is derived from Mim Djouallah (https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/Attach_LH/Attach_Lakehouse_v2.ipynb)
import time

import duckdb

def duckdb_attach_lakehouse(token, wks, lh, path):
    con = duckdb.connect(f'temp_{time.time_ns()}.duckdb')
    con.sql('SET enable_object_cache=true')
    con.sql(f""" CREATE or replace SECRET onelake ( TYPE AZURE, PROVIDER ACCESS_TOKEN, ACCESS_TOKEN '{token}') """)
    sql_schema = set()
    sql_statements = set()
    # modified to select only the ListOperatons Table
    # (debugging step: the result of this listing is not used below)
    con.sql(f""" SELECT * FROM glob ("{path}/*") """).df()['file'].tolist()
    # one entry per table: everything before '_delta_log' in each log file path
    list_tables = con.sql(f""" SELECT distinct(split_part(file, '_delta_log', 1)) as tables FROM glob ("{path}*/_delta_log/*.json") """).df()['tables'].tolist()
    for table_path in list_tables:
        # the last two path segments are the schema and the table name
        parts = table_path.strip("/").split("/")
        schema = parts[-2]
        table = parts[-1]
        sql_schema.add(f"CREATE SCHEMA IF NOT EXISTS {schema};")
        sql_statements.add(f"""CREATE OR REPLACE view {schema}.{table} AS SELECT *
            FROM delta_scan('abfss://{wks}@onelake.dfs.fabric.microsoft.com/{lh}/Tables/{schema}/{table}');""")
    con.sql(" ".join(sql_schema))
    con.sql(" ".join(sql_statements))
    con.sql("SHOW ALL TABLES").show(max_width=150)
    con.sql('CHECKPOINT')
    return con
The list_tables step is now returning an empty list [] rather than the list of tables.
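For reference, here is a small sketch of what I expect the list_tables step to produce and how the loop turns each entry into a schema and table name. It runs without DuckDB: the file names are made up (the host, workspace and table names are placeholders, not my real paths), and split_part_first is a hypothetical helper that only imitates split_part(file, '_delta_log', 1).

```python
def split_part_first(text: str, sep: str) -> str:
    """Imitate SQL split_part(text, sep, 1): everything before the first sep."""
    return text.split(sep, 1)[0]


def schema_and_table(table_path: str) -> tuple[str, str]:
    """Same parsing as the loop in the post: take the last two path segments."""
    parts = table_path.strip("/").split("/")
    return parts[-2], parts[-1]


# Hypothetical file names, shaped like what glob() is expected to return.
files = [
    "abfss://host/ws/lh/Tables/Bronze/Orders/_delta_log/00000000000000000000.json",
    "abfss://host/ws/lh/Tables/Bronze/Orders/_delta_log/00000000000000000001.json",
    "abfss://host/ws/lh/Tables/Bronze/Customers/_delta_log/00000000000000000000.json",
]

# DISTINCT in the SQL corresponds to the set here; sorted only for stable output.
list_tables = sorted({split_part_first(f, "_delta_log") for f in files})
pairs = [schema_and_table(t) for t in list_tables]
print(pairs)
```

With an empty list_tables, the loop body never runs, so no schemas or views get created.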
Fixes attempted
The previous step uses the following syntax, which also returns an empty list:
con.sql(f""" SELECT * FROM glob ("{path}/*") """).df()['file'].tolist()
When the wildcard is replaced with **, it works:
con.sql(f""" SELECT * FROM glob ("{path}**") """).df()['file'].tolist()
If I replace this path with the path to a single table, it works:
abfss://onelake.dfs.fabric.microsoft.com/{wks_id}/{wks_id}/{lkh_id}/Tables/Bronze/{Table Name}/_delta_log/*.json
But the same pattern with a * in place of the table name returns nothing:
abfss://onelake.dfs.fabric.microsoft.com/{wks_id}/{wks_id}/{lkh_id}/Tables/Bronze/*/_delta_log/*.json
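As a possible stopgap, since the ** listing still returns files for me, the single-level pattern could be applied in Python after listing. This is only a sketch under that assumption and does not explain the regression. single_level_pattern and filter_single_level are hypothetical helpers of my own, and the file names below are placeholders standing in for the result of the ** query.

```python
import re


def single_level_pattern(pattern: str) -> re.Pattern:
    """Compile a glob in which '*' matches within one path segment only."""
    pieces = [re.escape(p) for p in pattern.split("*")]
    return re.compile("[^/]*".join(pieces) + r"\Z")


def filter_single_level(files, pattern):
    """Keep the file names that match the single-level glob pattern."""
    rx = single_level_pattern(pattern)
    return [f for f in files if rx.match(f)]


# Hypothetical listing, standing in for the output of the '**' glob query.
base = "abfss://host/ws/lh/Tables/Bronze/"
files = [
    base + "Orders/_delta_log/00000000000000000000.json",
    base + "Orders/_delta_log/_last_checkpoint",
    base + "Orders/part-0000.parquet",
    base + "Customers/_delta_log/00000000000000000000.json",
]

logs = filter_single_level(files, base + "*/_delta_log/*.json")
print(logs)
```

In the notebook, files would come from the ** query shown earlier instead of the hard-coded list. Listing everything under the path may be slow on a large lakehouse, so I would rather not keep this in place long term.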
Next Steps
I can refactor the code, but I would like to understand why it suddenly stopped working.