Conversation

NickCrews
Contributor

@NickCrews NickCrews commented May 9, 2025

The sqlite_attach() approach is deprecated per
https://duckdb.org/docs/stable/extensions/sqlite.html,
and the newer syntax opens up a read-only flag, specifying a name for the attachment, and not overwriting tables that already exist.

This is breaking:

  • some params are now keyword-only
  • the semantics for attach_sqlite(overwrite: bool) have changed, now it is on_exists: Literal["ignore", "replace", "error"] = "error"

This is prep for work I want to do to add attach_postgres(), but I want to get the API up to date so that the two APIs can be consistent. In fact, after this PR I don't even need another attach_postgres() method, since now the plain attach() is flexible enough to support this.
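
As a rough illustration of the ATTACH-based approach described above, a statement builder might look like the following. This is a minimal sketch, not the code in this PR: `build_attach_sql` and its parameters are hypothetical names, it does no quoting or escaping of the path or name, and it assumes DuckDB's `ATTACH [OR REPLACE | IF NOT EXISTS] 'path' [AS name] (options)` form. Per the discussion further down, `OR REPLACE` appears to be available only in newer DuckDB versions.

```python
from __future__ import annotations

from typing import Literal


def build_attach_sql(
    path: str,
    *,
    name: str | None = None,
    on_exists: Literal["error", "ignore", "replace"] = "error",
    read_only: bool = False,
    type: str | None = None,
) -> str:
    """Hypothetical helper: render a DuckDB ATTACH statement as a string."""
    # Map the on_exists policy onto the corresponding SQL modifier.
    modifiers = {"error": "", "ignore": "IF NOT EXISTS", "replace": "OR REPLACE"}
    if on_exists not in modifiers:
        raise ValueError(f"invalid on_exists: {on_exists!r}")

    options = []
    if type is not None:
        options.append(f"TYPE {type}")
    if read_only:
        options.append("READ_ONLY")

    parts = ["ATTACH", modifiers[on_exists], f"'{path}'"]
    if name is not None:
        parts.append(f"AS {name}")
    if options:
        # Only emit the parenthesized list when there is something in it.
        parts.append(f"({', '.join(options)})")
    # Drop empty pieces so the statement has single spaces throughout.
    return " ".join(p for p in parts if p)


print(build_attach_sql("test.db", name="s", type="sqlite", read_only=True))
# ATTACH 'test.db' AS s (TYPE sqlite, READ_ONLY)
```

Keeping the policy-to-modifier mapping in one dict means a future `"replace"` value would not require changing the function signature.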

@NickCrews NickCrews added the "breaking change" label May 9, 2025
@github-actions github-actions bot added the "tests" and "duckdb" labels and removed the "breaking change" label May 9, 2025
@NickCrews NickCrews added the "feature", "breaking change", "sqlite", and "duckdb" labels and removed the "tests" and "duckdb" labels May 9, 2025
@NickCrews NickCrews force-pushed the duckdb-attach-sqlite branch from 9cc5741 to 80bf360 Compare May 9, 2025 02:19
@github-actions github-actions bot added the "tests" label May 9, 2025
The name to attach the database as.
If `None`, use the default behavior of DuckDB
(as of duckdb==1.2.0 this is the basename of the path).
skip_if_exists
Contributor Author

very explicitly choosing this language of "skip_if_exists" instead of "exist_ok" because I don't want users to think this has "CREATE OR REPLACE" semantics.

Contributor Author

Could also do on_exists: Literal["ignore", "replace", "error"]="error", and then this would be more in line with how I want to change the create_table() and create_view() APIs in the future.

Contributor Author

I went with on_exists: Literal["ignore", "error"] = "error" so that it would be consistent with on_missing: Literal["ignore", "error"]="error" of detach()

Contributor Author

Also if duckdb ever supports ATTACH OR REPLACE, then we are in a good position to switch from on_exists: Literal["ignore", "error"] to on_exists: Literal["ignore", "error", "replace"] and not break anyone and keep the same API.

Contributor Author

@NickCrews NickCrews Jun 26, 2025

but only in duckdb>=1.3, which we don't support yet because of a bug. So I only tested this if `duckdb.version >= "1.3.0"`, which is maybe a bit sloppy. Maybe we should see if 1.3.1 has the fix, bump our requirements, and then remove this version conditional?
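
If the version conditional stays for now, one option is to compare parsed version tuples rather than raw strings, since a plain string comparison would order "1.10.0" before "1.3.0". A minimal sketch, where `version_tuple` and `supports_attach_or_replace` are hypothetical helpers and the 1.3 threshold is taken from the comment above:

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Hypothetical helper: leading numeric components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def supports_attach_or_replace(version: str) -> bool:
    # Threshold assumed from the discussion: ATTACH OR REPLACE needs duckdb>=1.3.
    return version_tuple(version) >= (1, 3)
```

Suffixes such as `.dev12` are ignored by this sketch, so a pre-release of 1.3.0 would be treated the same as 1.3.0 itself.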

@NickCrews NickCrews force-pushed the duckdb-attach-sqlite branch from 80bf360 to a1345c2 Compare June 6, 2025 14:27
@NickCrews NickCrews changed the title from "feat(duckdb): switch to newer ATTACH syntax for attach_sqlite()" to "feat(duckdb): improve attach(), detach(), and attach_sqlite()" Jun 6, 2025
cur.execute(
    f"CALL sqlite_attach('{path}', overwrite={overwrite})"
).fetchall()
self._safe_raw_sql(f"SET GLOBAL sqlite_all_varchar={all_varchar}")
Contributor Author

I don't THINK it's needed to use this as a context manager here (or in fact most places within the duckdb backend), but double check me.

@NickCrews NickCrews force-pushed the duckdb-attach-sqlite branch 2 times, most recently from 100149e to 0c1d64d Compare June 9, 2025 19:09
@NickCrews NickCrews force-pushed the duckdb-attach-sqlite branch 2 times, most recently from bd837f2 to 9151d46 Compare June 27, 2025 18:12
@NickCrews
Contributor Author

The logic that detects the name of the added db is moderately complicated. I THINK it's worth it, but I'd be open to removing it, making the function always return None, and making the user specify a db name if they want access to that.

return final


for inp, expected in [
Contributor Author

This was me avoiding importing this private function into the test module. We could also just not test this at all, and call it an implementation detail/undefined behavior. Or I could put it in the test module. Or we could adjust the API of the .attach() method to not return the attached name and have duckdb name it whatever it wants opaquely to the user, which would allow us to just remove all this code.

NickCrews added 5 commits July 8, 2025 17:43
the sqlite_attach() approach is deprecated per
https://duckdb.org/docs/stable/extensions/sqlite.html,
and the newer syntax opens up a read-only flag, specifying a name for the attachment, and not overwriting tables that already exist.
This is breaking in that some params are now keyword-only.
@NickCrews NickCrews force-pushed the duckdb-attach-sqlite branch from 102bb4e to e08e744 Compare July 8, 2025 23:43
@NickCrews
Contributor Author

@cpcloud would you want to hop on a Zulip live chat to work through this? There are just a few semantics where I'd like to walk you through my thought process; it might be easier live.

@cpcloud
Member

cpcloud commented Jul 28, 2025

@NickCrews Hit me up on Zulip when you're around!

    ("duckdb:///myddb.duckdb", "myddb"),
    ("https://example.com/myddb.duckdb", "myddb"),
]:
    assert _attach_name(inp) == expected
Member

This should be in a test file. It's fine to import a private function in a test file.

Member

One reason is that we shouldn't need to run this code every time someone imports this module (indirectly or otherwise).

Contributor Author

My reasoning here was that I was trying to avoid importing a private function. I figured the execution cost (it only runs during the first import, and is CPU-bound with no IO needed) would be so minimal that it was just easier to keep it here. Are you sure you still want it in a test file?

Member

Yes.

Let's say this assertion breaks for some reason that is out of ibis's control (a bug in DuckDB perhaps) and that the bug doesn't affect most Ibis users. Except now it does because we've baked this assertion into the module import.

My desire to move this to a test doesn't stem from a concern about performance, but from usability.

Contributor Author

Ah, I see, that makes sense, thanks for the explanation

on_exists: Literal["error", "ignore", "replace"] = "error",
read_only: bool = False,
type: Literal["duckdb", "sqlite", "postgres", "mysql"] | None = None,
more_options: Iterable[str] = [],
Member

Why not use a mapping here? That seems like the most natural fit for a bag of options?

What purpose is `more` serving here in the API? It's at best redundant IMO (we know that the next argument is "more" arguments). Let's call this just `options`.

And this definitely cannot default to anything mutable like a list. The empty tuple is fine though.
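
To make the two points above concrete, here is a small sketch. `attach_bad` shows why a mutable default is risky (the same list is shared across calls), and `render_options` shows one possible mapping-based shape. Both function names, and the rule that `True` renders as a bare flag, are assumptions for illustration rather than the API in this PR.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def attach_bad(option: str, options: list[str] = []) -> list[str]:
    # The default list is created once, at function definition time,
    # so every call that omits `options` mutates the same object.
    options.append(option)
    return options


def render_options(options: Mapping[str, Any] | None = None) -> str:
    """Hypothetical helper: render a mapping as 'KEY value, FLAG, ...'."""
    rendered = []
    for key, value in (options or {}).items():
        if value is True:
            rendered.append(key.upper())  # bare flag, e.g. READ_ONLY
        elif value is False or value is None:
            continue  # omitted entirely
        else:
            rendered.append(f"{key.upper()} {value}")
    return ", ".join(rendered)


print(attach_bad("a"))  # ['a']
print(attach_bad("b"))  # ['a', 'b'], state leaked from the first call
print(render_options({"type": "sqlite", "read_only": True}))  # TYPE sqlite, READ_ONLY
```

Defaulting to `None` (or an empty tuple, as suggested) avoids the shared-state problem because nothing mutable is created at definition time.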


sql = f"ATTACH {on_exists_string} '{path_or_url}' {as_name} ({option_string})"
databases_before = set(self.list_catalogs())
self.con.execute(sql).fetchall()
Member

Is the fetchall call doing anything here?

return None
if on_exists == "replace":
return name
raise AssertionError(
Member

This looks like a code path that a user can hit. Why raise an AssertionError here instead of something more bespoke?
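
One possible shape for this, sketched under assumptions: compare the catalog listing before and after the ATTACH, and raise a dedicated exception when the new name cannot be determined. `AttachNameError` and `detect_attached_name` are hypothetical names; the real backend would presumably use one of Ibis's existing exception classes.

```python
from __future__ import annotations


class AttachNameError(RuntimeError):
    """Hypothetical error: the attached database name could not be determined."""


def detect_attached_name(
    before: set[str],
    after: set[str],
    *,
    name: str | None,
    on_exists: str,
) -> str | None:
    new = after - before
    if len(new) == 1:
        # Exactly one new catalog appeared: that is the attached database.
        return next(iter(new))
    if not new and on_exists == "ignore":
        # Nothing was attached because the name already existed.
        return None
    if not new and on_exists == "replace" and name is not None:
        # An existing catalog was replaced under the same name.
        return name
    raise AttachNameError(
        f"could not determine attached database name "
        f"(new catalogs: {sorted(new)}, on_exists={on_exists!r})"
    )
```

A dedicated exception type lets callers catch this specific failure, whereas an `AssertionError` reads as an internal invariant violation.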

path = tmp_path / "test.db"
scon = sqlite3.connect(str(path))
try:
    with scon:
Member

If you use the context manager here you don't need to explicitly call .close().
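
One caveat that may be worth double-checking here: in the standard library's `sqlite3` module, `with conn:` wraps a transaction (commit on success, rollback on error) but does not close the connection. If the goal is to drop the explicit `.close()`, `contextlib.closing` can be layered on top. A minimal sketch using an in-memory database (the table name is made up for illustration):

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as scon:
    with scon:  # commits on success, rolls back on exception
        scon.execute("CREATE TABLE t (x INTEGER)")
        scon.execute("INSERT INTO t VALUES (1)")
    # The transaction block has exited, but the connection is still open.
    rows = scon.execute("SELECT x FROM t").fetchall()

# Only closing() actually closed the connection.
try:
    scon.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True
```

With this layout the `try`/`finally` around the connection is no longer needed, since `closing` calls `.close()` even if the body raises.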

Labels: breaking change, duckdb, feature, sqlite, tests