-
Notifications
You must be signed in to change notification settings - Fork 167
feature: add duckdb support #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feature: add duckdb support #365
Conversation
gregfurman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow thanks for the PR! Have left some comments.
I was giving the original thread and linked issues a read. It appears that duckdb v1.3.1 will be fixing this. See marcboeker/go-duckdb#454 (comment)
Should we perhaps just wait until this has been fixed and see if that'll solve the CGO dependency? Would be great to have this in the pure Go bento build 😄
internal/impl/sql/conn_fields.go
Outdated
| } | ||
|
|
||
| func sqlOpenWithReworks(ctx context.Context, logger *service.Logger, driver, dsn string, connSettings *connSettings) (*sql.DB, error) { | ||
| if !slices.Contains(sql.Drivers(), driver) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this approach! I think, however, that we should leverage build tags to dynamically set how we should be validating this. For example, see the driverField variable above at
bento/internal/impl/sql/conn_fields.go
Lines 17 to 18 in 9fc0fcd
| var driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner", "duckdb"). | |
| Description("A database [driver](#drivers) to use.") |
What if instead we added a LintRule that (by default) returned this error message you've added below except, we only set this when the x_bento_extra tag has NOT been used.
So we add a conn_fields_pure.go where we overwrite this driverField to raise a lint error when not in CGO/bento_extra:
//go:build !x_bento_extra
package sql
func init() {
driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner").
Description("A database [driver](#drivers) to use.").LintRule(`root = if this == "duckdb" { [ "Cannot use DuckDB driver outside of CGO build. See ..." ] }`)
}
Due to C++ deps, CGO builds are needed for duckdb. But there are a few strategies to reduce overall image and/or binary size. |
|
Can you run a |
| func init() { | ||
| driverField = service.NewStringEnumField("driver", "mysql", "postgres", "clickhouse", "mssql", "sqlite", "oracle", "snowflake", "trino", "gocosmos", "spanner"). | ||
| Description("A database [driver](#drivers) to use.").LintRule(`root = if this == "duckdb" { [ "Cannot use DuckDB driver outside of CGO build. Use a binary built with tag x_bento_extra, or the CGO Docker image" ] }`) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this locally and all looks good!
| @@ -1,36 +1,72 @@ | |||
| FROM golang:1.23 AS build | |||
| FROM ubuntu:24.04 AS build | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not stay on an official golang image when building?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be able to now with the duckdb update, I had a PR there that downgraded the C++ ABI for better compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK sick. Have these upstream changes been merged? Or should we address in a follow up
|
|
||
| # Pack | ||
| FROM debian:latest | ||
| FROM ubuntu:24.04 AS final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question as above: why the move over to ubuntu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed a newer C++ runtime environment for the duckdb dependencies since it's not a fully static build
| # Copy the full source and build | ||
| COPY . ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont this also copy the entire website/, .git/, resources/ etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, has the same behavior as before but uses the relative path.
|
|
||
| # Install minimal runtime dependencies | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| libzmq5 ca-certificates \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why install ca-certificates both in builder and again in final?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a holdover from trying to use slimmer images, can just copy it over again
James-Gilbert-
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, is the goal of the build process to produce a fully static binary? Or is it to create an image with the extra CGO runtime dependencies?
| # Copy the full source and build | ||
| COPY . ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, has the same behavior as before but uses the relative path.
|
|
||
| # Pack | ||
| FROM debian:latest | ||
| FROM ubuntu:24.04 AS final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed a newer C++ runtime environment for the duckdb dependencies since it's not a fully static build
|
|
||
| # Install minimal runtime dependencies | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| libzmq5 ca-certificates \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a holdover from trying to use slimmer images, can just copy it over again
| @@ -1,36 +1,72 @@ | |||
| FROM golang:1.23 AS build | |||
| FROM ubuntu:24.04 AS build | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be able to now with the duckdb update, I had a PR there that downgraded the C++ ABI for better compatibility.
@James-Gilbert- I suppose that since we don't release the CGO/x_bento_extra builds as a static binary, the goal would be to create an image with runtime dependencies baked in.
Curious, would something like this be possible? Perhaps using |
|
Also, lmk if you need me to re-build the new SQL documentation since it's failing in your CI 😄 |
I think it'd be possible to statically-cross compile, but with significant effort (probably some upstream effort w/DuckDB). Having a final layer with runtime dependencies is much less of a change. |
Gotcha! Once you run a make docs and commit the generated files, it should unblock the CI 🟢 Otherwise, lmk re: your capacity to address the remainder of image comments. I'm happy to also follow up if you'd prefer 🙂 |
|
It seems that because of the build tag |
|
would really love to have this! lmk if i can help at all |
Changes
x_bento_extratagTesting
go-duckdbdocs, JSON + Parquet support should be included in the binaries.Notes
Due to some ABI compatibility issues with the pre-built static binaries, a newer base image and changes to the Dockerfile.cgo were needed. It is possible to build each of the dependencies from source (either dynamically or statically).
From experiments, it's possible to get a statically linked binary down to ~200MB building from source, with build time and licensing requirements being the primary trade-off.