ABFSS Select From glob not recognising * wildcards #278

@daryl-lynch-bzy

What Happens?

The * wildcards that I use in a glob search ('ABFSS/*/*.json') have stopped working.

Since Tue 3rd Feb 2026, I have run into the following error:

Image

I am running duckdb 1.4.4 within a Microsoft Fabric Python (3.11) Notebook. I am trying to read the files in Delta using the pattern described by @djouallah in Attach_Lakehouse_v2.ipynb.

The process has worked successfully for several months, and it started using 1.4.4 sometime between 27-Jan and 2-Feb. Here is the code that is failing:

## The following code is derived from Mim Djouallah (https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/Attach_LH/Attach_Lakehouse_v2.ipynb)

import time

import duckdb

def duckdb_attach_lakehouse(token, wks, lh, path):
    con = duckdb.connect(f'temp_{time.time_ns()}.duckdb')
    con.sql('SET enable_object_cache=true')
    con.sql(f""" CREATE or replace SECRET onelake ( TYPE AZURE, PROVIDER ACCESS_TOKEN, ACCESS_TOKEN '{token}')   """)
    
    sql_schema     = set()
    sql_statements = set()
    
    # modified to select only the ListOperatons Table
    con.sql(f""" SELECT  * FROM glob ("{path}/*") """).df()['file'].tolist()
    list_tables = con.sql(f""" SELECT  distinct(split_part(file, '_delta_log', 1)) as tables FROM glob ("{path}*/_delta_log/*.json") """).df()['tables'].tolist()
    for table_path in list_tables:
            parts = table_path.strip("/").split("/")
            schema = parts[-2]
            table = parts[-1]
            sql_schema.add(f"CREATE SCHEMA IF NOT EXISTS {schema};")
            sql_statements.add(f"""CREATE OR REPLACE view {schema}.{table} AS SELECT * 
                                FROM delta_scan('abfss://{wks}@onelake.dfs.fabric.microsoft.com/{lh}/Tables/{schema}/{table}');""")
    con.sql(" ".join(sql_schema))
    con.sql(" ".join(sql_statements))
    con.sql("SHOW ALL TABLES").show(max_width=150)
    con.sql('CHECKPOINT')
    return con

The list_tables step is now returning an empty list [] rather than the list of tables.

Fixes attempted

The previous step uses the following syntax, which also returns an empty list:

con.sql(f""" SELECT * FROM glob ("{path}/*") """).df()['file'].tolist()

When the wildcard is replaced with **, it works:

con.sql(f""" SELECT * FROM glob ("{path}**") """).df()['file'].tolist()

If I replace the table wildcard in the path with a single table name, it also works:

abfss://onelake.dfs.fabric.microsoft.com/{wks_id}/{wks_id}/{lkh_id}/Tables/Bronze/{Table Name}/_delta_log/*.json

So I can't run the following:

abfss://onelake.dfs.fabric.microsoft.com/{wks_id}/{wks_id}/{lkh_id}/Tables/Bronze/*/_delta_log/*.json

Next Steps

I can refactor the code, but I was hoping to understand why it might have suddenly stopped working.
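
A possible refactor, as a minimal sketch only: since the ** glob still returns files, the table roots could be derived in Python instead of relying on nested * wildcards. The helper name `delta_table_roots` and the sample paths are hypothetical and not part of the original notebook, and the sketch assumes that `glob ("{path}**")` returns the full path of every file under `{path}`.

```python
def delta_table_roots(files):
    """Return the distinct table roots for paths that are Delta log JSON files.

    Mirrors split_part(file, '_delta_log', 1) from the failing query: the
    returned root keeps its trailing slash.
    """
    roots = set()
    for file_path in files:
        head, sep, tail = file_path.partition("/_delta_log/")
        # Keep only direct children of _delta_log that end in .json,
        # which is what the pattern */_delta_log/*.json was selecting.
        if sep and "/" not in tail and tail.endswith(".json"):
            roots.add(head + "/")
    return sorted(roots)


# Hypothetical sample paths, standing in for the output of:
#   con.sql(f""" SELECT * FROM glob ("{path}**") """).df()['file'].tolist()
sample = [
    "abfss://example/Tables/Bronze/orders/_delta_log/00000000000000000000.json",
    "abfss://example/Tables/Bronze/orders/_delta_log/00000000000000000001.json",
    "abfss://example/Tables/Bronze/orders/part-0000.parquet",
    "abfss://example/Tables/Bronze/customers/_delta_log/00000000000000000000.json",
    "abfss://example/Tables/Bronze/customers/_delta_log/_last_checkpoint",
]

print(delta_table_roots(sample))
```

The result could replace list_tables in the function above without changing the loop that builds the schemas and views. One trade-off is that ** lists every file under the path, including data files, so it may be slower than the original pattern on a large lakehouse.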
