feat(duckdb): improve attach(), detach(), and attach_sqlite() #11198
ibis/backends/duckdb/__init__.py
Outdated
The name to attach the database as.
If `None`, use the default behavior of DuckDB
(as of duckdb==1.2.0 this is the basename of the path).
skip_if_exists
very explicitly choosing this language of "skip_if_exists" instead of "exist_ok" because I don't want users to think this has "CREATE OR REPLACE" semantics.
Could also do `on_exists: Literal["ignore", "replace", "error"] = "error"`, and then this would be more in line with how I want to change the create_table() and create_view() APIs in the future.
I went with `on_exists: Literal["ignore", "error"] = "error"` so that it would be consistent with the `on_missing: Literal["ignore", "error"] = "error"` of detach().
Also, if duckdb ever supports ATTACH OR REPLACE, then we are in a good position to switch from `on_exists: Literal["ignore", "error"]` to `on_exists: Literal["ignore", "error", "replace"]` without breaking anyone, keeping the same API.
but only in duckdb>=1.3, which we don't support yet because of a bug. So I only tested this if `duckdb.version >= "1.3.0"`, which is maybe a bit sloppy. Maybe we should see if 1.3.1 has the fix, then bump our requirements, and then remove this version conditional?
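A minimal sketch of how the `on_exists` values discussed above could map onto the ATTACH statement prefix. The helper names `parse_version` and `attach_keyword` are hypothetical and not part of the PR; the version gate for `"replace"` follows the duckdb>=1.3 remark above. Comparing parsed tuples rather than raw strings avoids the case where `"1.10.0"` sorts before `"1.3.0"` lexically.

```python
from __future__ import annotations

from itertools import takewhile
from typing import Literal

OnExists = Literal["error", "ignore", "replace"]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so comparisons are numeric, not lexical."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component (e.g. '0rc1' -> '0').
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def attach_keyword(on_exists: OnExists, duckdb_version: str) -> str:
    """Hypothetical helper: choose the ATTACH prefix for an on_exists value."""
    if on_exists == "error":
        return "ATTACH"
    if on_exists == "ignore":
        return "ATTACH IF NOT EXISTS"
    if on_exists == "replace":
        # Assumption taken from the thread: OR REPLACE needs duckdb>=1.3.
        if parse_version(duckdb_version) < (1, 3, 0):
            raise NotImplementedError(
                "on_exists='replace' requires duckdb>=1.3.0, "
                f"got {duckdb_version}"
            )
        return "ATTACH OR REPLACE"
    raise ValueError(f"unknown on_exists value: {on_exists!r}")
```

Adding `"replace"` later only widens the accepted literal set, so existing callers that pass `"ignore"` or `"error"` keep working unchanged.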
ibis/backends/duckdb/__init__.py
Outdated
cur.execute(
f"CALL sqlite_attach('{path}', overwrite={overwrite})"
).fetchall()
self._safe_raw_sql(f"SET GLOBAL sqlite_all_varchar={all_varchar}")
I don't THINK it's needed to use this as a context manager here (or in fact most places within the duckdb backend), but double check me.
The logic that detects the name of the added db is moderately complicated. I THINK it's worth it, but I'd be open to removing it, making the function always return None, and making the user specify a db name if they want access to that.
return final


for inp, expected in [
This was me avoiding importing this private function into the test module. We could also just not test this at all, and call it an implementation detail/undefined behavior. Or I could put it in the test module. Or we could adjust the API of the .attach() method to not return the attached name and have duckdb name it whatever it wants opaquely to the user, which would allow us to just remove all this code.
The sqlite_attach() approach is deprecated per https://duckdb.org/docs/stable/extensions/sqlite.html, and the newer syntax opens up a read-only flag, specifying a name for the attachment, and not overwriting tables that already exist.
This is breaking in that some params are now keyword-only.
@cpcloud would you want to hop on a Zulip live chat to work through this? There are just a few semantics where I can walk you through my thought process; it might be easier live.
@NickCrews Hit me up on Zulip when you're around!
("duckdb:///myddb.duckdb", "myddb"),
("https://example.com/myddb.duckdb", "myddb"),
]:
assert _attach_name(inp) == expected
This should be in a test file. It's fine to import a private function in a test file.
One reason is that we shouldn't need to run this code every time someone imports this module (indirectly or otherwise).
My reasoning here was that I wanted to avoid importing a private function. I figured the execution cost (it only runs during the first import, and is CPU-bound with no IO needed) would be so minimal that it was just easier to keep it here. Are you sure you still want it in a test file?
Yes.
Let's say this assertion breaks for some reason that is out of ibis's control (a bug in DuckDB perhaps) and that the bug doesn't affect most Ibis users. Except now it does because we've baked this assertion into the module import.
My desire to move this to a test doesn't stem from a concern about performance, but from usability.
Ah, I see, that makes sense, thanks for the explanation
on_exists: Literal["error", "ignore", "replace"] = "error",
read_only: bool = False,
type: Literal["duckdb", "sqlite", "postgres", "mysql"] | None = None,
more_options: Iterable[str] = [],
Why not use a mapping here? That seems like the most natural fit for a bag of options.
What purpose is `more` serving here in the API? It's at best redundant IMO (we know that the next argument is "more" arguments). Let's call this just `options`.
And this definitely cannot default to anything mutable like a `list`. The empty `tuple` is fine though.
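A rough sketch of what a mapping-based `options` parameter with an immutable default might look like. `render_options` is a hypothetical helper, and the rendering rules (a `True` value becomes a bare flag, `False` is omitted, anything else becomes `KEY value`) are assumptions for illustration rather than the PR's actual behavior. Values are interpolated as-is, so this is not safe for untrusted input.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def render_options(options: Mapping[str, Any] | None = None) -> str:
    """Hypothetical helper: render a bag of options as '(A, B value)'."""
    # None as the default sidesteps the shared-mutable-default pitfall.
    if not options:
        return ""
    rendered = []
    for key, value in options.items():
        if value is True:
            rendered.append(key.upper())  # bare flag, e.g. READ_ONLY
        elif value is False or value is None:
            continue  # assumption: falsy flags are simply left out
        else:
            rendered.append(f"{key.upper()} {value}")
    return f"({', '.join(rendered)})" if rendered else ""
```

For example, `render_options({"read_only": True, "type": "sqlite"})` yields `(READ_ONLY, TYPE sqlite)`.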
sql = f"ATTACH {on_exists_string} '{path_or_url}' {as_name} ({option_string})"
databases_before = set(self.list_catalogs())
self.con.execute(sql).fetchall()
Is the `fetchall` call doing anything here?
return None
if on_exists == "replace":
return name
raise AssertionError(
This looks like a code path that a user can hit. Why raise an `AssertionError` here instead of something more bespoke?
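One possible shape for a more specific error, sketched under assumptions: `AttachError` and `newly_attached_name` are hypothetical names, and the before/after set comparison is only loosely modeled on the `databases_before = set(self.list_catalogs())` line in the diff, not on the PR's actual logic. A dedicated exception can be caught by callers, and unlike an `assert` statement it is not stripped when Python runs with `-O`.

```python
from __future__ import annotations


class AttachError(Exception):
    """Hypothetical error for an attach whose resulting name is ambiguous."""


def newly_attached_name(before: set[str], after: set[str]) -> str | None:
    """Return the single catalog present in `after` but not in `before`."""
    added = after - before
    if not added:
        # Nothing new appeared, e.g. the database was already attached.
        return None
    if len(added) == 1:
        return next(iter(added))
    raise AttachError(
        f"expected at most one new catalog after ATTACH, found {sorted(added)}"
    )
```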
path = tmp_path / "test.db"
scon = sqlite3.connect(str(path))
try:
with scon:
If you use the context manager here you don't need to explicitly call `.close()`.
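One detail worth double-checking here: in the standard library, a `sqlite3.Connection` used as a context manager commits or rolls back the open transaction on exit but does not close the connection. A small sketch, using an in-memory database instead of the test's `tmp_path` file, of how `contextlib.closing` can take care of the close without an explicit `try`/`finally`:

```python
import sqlite3
from contextlib import closing

# closing() guarantees scon.close() on exit; the inner `with scon:` only
# manages the transaction (commit on success, rollback on error).
with closing(sqlite3.connect(":memory:")) as scon:
    with scon:
        scon.execute("CREATE TABLE t (x INTEGER)")
        scon.execute("INSERT INTO t VALUES (1)")
    # The connection is still usable after the inner block.
    rows = scon.execute("SELECT x FROM t").fetchall()

# After the outer block the connection is closed, so further use fails.
try:
    scon.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
```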
The sqlite_attach() approach is deprecated per
https://duckdb.org/docs/stable/extensions/sqlite.html,
and the newer syntax opens up a read-only flag, specifying a name for the attachment, and not overwriting tables that already exist.
This is breaking:
on_exists: Literal["ignore", "replace", "error"] = "error"
This is prep for work I want to do to add attach_postgres(), but I want to get the API up to date so that the two APIs can be consistent. In fact, after this PR I don't even need another attach_postgres() method, since now the plain attach() is flexible enough to support this.