feat(r+py): Generic datasources #28
...instead of requiring explicit DataSource subclass creation
…provements

Plus some improvements:
- Cleaner .md file reading code in example apps
- Use GPT-4.1 by default, not GPT-4 😬
- Make sqlalchemy required
…-datasource-improvements
fix: No longer need to manually call session$ns() with shinychat (#1…
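The first commit above replaces explicit DataSource subclasses with generic data sources. A minimal sketch of how that can look in R is S3 dispatch on whatever the user passes in. `as_querychat_source()` and its methods are hypothetical names, not querychat's actual API. The sketch assumes local data frames get copied into an in-memory SQLite database, while existing DBI connections are used as-is.

```r
library(DBI)
library(dplyr)

# Hypothetical generic: turn whatever the user passes into a lazy tbl().
as_querychat_source <- function(x, table_name, ...) {
  UseMethod("as_querychat_source")
}

# Local data frames: copy into an in-memory SQLite database (assumption).
as_querychat_source.data.frame <- function(x, table_name, ...) {
  conn <- dbConnect(RSQLite::SQLite(), ":memory:")
  dbWriteTable(conn, table_name, x)
  tbl(conn, table_name)
}

# Existing DBI connections: the table must already exist in the database.
# S3 dispatch follows the S4 class hierarchy, so SQLiteConnection,
# duckdb_connection, etc. all land here.
as_querychat_source.DBIConnection <- function(x, table_name, ...) {
  tbl(x, table_name)
}

local_src <- as_querychat_source(data.frame(id = 1:3), "my_df")

db_conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db_conn, "test_table", data.frame(id = 1:5))
remote_src <- as_querychat_source(db_conn, "test_table")
```

The payoff is that callers never name a subclass: adding support for a new backend means adding one more method.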
Alright, I think this is ready to merge, @jcheng5.
Previously, examples/app-database.R would show an error on startup because the initial query was "", which was then sent as a SQL query to RSQLite. The get_lazy_data code path already accounted for the "" query, so we decided to make the eager code path just call the lazy code path and then collect(). Also fixed a formatting issue with the table.
It seems like dbplyr tables-as-queries can be a bit... temperamental. This should fix that by explicitly declaring sql always.
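A minimal sketch of the two fixes described in the commits above, assuming a DBI connection. `get_lazy_data` is named in the commit message, but its signature here is an assumption, and `get_data()` is a hypothetical name for the eager path. The lazy path treats an empty query as "the whole table" and always wraps real queries in `sql()`. The eager path just calls the lazy one and `collect()`s.

```r
library(DBI)
library(dplyr)

# Lazy accessor (signature assumed): an empty query means "no filter",
# so return the whole table instead of sending "" to the database.
get_lazy_data <- function(conn, table_name, query = "") {
  if (!nzchar(trimws(query))) {
    return(tbl(conn, table_name))
  }
  # Always mark the query as literal SQL so the string is never
  # interpreted as a table name.
  tbl(conn, sql(query))
}

# Hypothetical eager accessor: reuse the lazy path, then pull results
# into R, so the "" case is handled in exactly one place.
get_data <- function(conn, table_name, query = "") {
  collect(get_lazy_data(conn, table_name, query))
}

conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "test_table", data.frame(id = 1:5, value = c(10, 20, 30, 40, 50)))
all_rows <- get_data(conn, "test_table")
filtered <- get_data(conn, "test_table", "SELECT * FROM test_table WHERE value > 20")
dbDisconnect(conn)
```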
Shooot, there is one problem left. We're now using I'll ask Hadley what the right way to do this is (if there is one).

After thinking about this further, I don't think having Besides the challenge above of @npelikan I know you're off this week, and I'm off next week, but if you agree with my assessment and have Daniel review your code, you guys can merge in my absence.
@jcheng5 I generally agree -- but my takeaway is a slightly different approach. In general, my goal with returning a We've already had great luck with this with Querychat Python+Ibis on a VERY large dataset (example). I've had some time to dig into the
Another potential solution to Item 2 could be to implement the defensive programming ourselves and have it feed sensible errors back to the LLM tools when necessary. I did some reprex-ing in preparation for an issue submission and found that the problem seems to be limited to trailing comments on single-line queries and trailing semicolons. It seems that we could solve this with a few simple

```r
library(DBI)
library(dplyr)
library(RSQLite)

test_df <- data.frame(
  id = 1:5,
  value = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
temp_db <- tempfile(fileext = ".db")
conn <- dbConnect(RSQLite::SQLite(), temp_db)
dbWriteTable(conn, "test_table", test_df, overwrite = TRUE)

# Simple select statement. Works.
simple_select <- tbl(conn, sql("SELECT * FROM test_table WHERE value > 20"))

# Multiline select statement. Works.
multiline_select <- sql("
SELECT *
FROM test_table
WHERE value > 20
")
multiline_select_tbl <- tbl(conn, multiline_select)

# Multiline select statement with trailing inline comment. Works.
multiline_comment <- sql("
SELECT *
FROM test_table
WHERE value > 20 -- this is a filter")
multiline_comment_tbl <- tbl(conn, multiline_comment)

# Trailing semicolon statement. Fails.
trailing_semicolon <- sql("
SELECT *
FROM test_table
WHERE value > 20;")
trailing_semicolon_tbl <- tbl(conn, trailing_semicolon)
# Error in `db_query_fields.DBIConnection()`:
# ! Can't query fields.
# ℹ Using SQL: SELECT * FROM (
# SELECT * FROM test_table WHERE value > 20 ; ) AS `q01` WHERE (0 =
# 1)
# Caused by error:
# ! near ";": syntax error

# Single-line select statement with trailing inline comment. Fails.
singleline_comment <- sql("SELECT * from test_table WHERE value > 20 --this is a filter")
singleline_comment_tbl <- tbl(conn, singleline_comment)
# Error in `db_query_fields.DBIConnection()`:
# ! Can't query fields.
# ℹ Using SQL: SELECT * FROM (SELECT * from test_table WHERE value >
# 20 --this is a filter) AS `q02` WHERE (0 = 1)
# Caused by error:
# ! near "WHERE": syntax error

# Single-line select statement with midline inline comment. Expected to fail, fails.
singleline_midline_comment <- sql("SELECT * from test_table -- breakme WHERE value > 20")
singleline_midline_comment_tbl <- tbl(conn, singleline_midline_comment)
# Error in `db_query_fields.DBIConnection()`:
# ! Can't query fields.
# ℹ Using SQL: SELECT * FROM (SELECT * from test_table -- breakme
# WHERE value > 20) AS `q03` WHERE (0 = 1)
# Caused by error:
# ! incomplete input
```
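One way to do the defensive cleanup described above, as a minimal sketch. `clean_llm_sql()` and `wrap_like_dbplyr()` are hypothetical helpers, and the wrapper only mimics the `SELECT * FROM (<query>) AS q01 WHERE (0 = 1)` shape visible in the error messages rather than calling dbplyr itself. The idea: strip trailing semicolons, and end the query on a newline so a trailing `--` comment can no longer swallow the text appended after it.

```r
library(DBI)

# Hypothetical helper: normalize LLM-generated SQL so it survives being
# wrapped in a subquery.
clean_llm_sql <- function(query) {
  # Strip trailing whitespace and trailing semicolons (";", "; ;", etc.).
  query <- sub("[[:space:];]+$", "", query)
  # End on a newline so a trailing "--" comment stops before any text
  # appended after the query, such as ") AS q01 WHERE (0 = 1)".
  paste0(query, "\n")
}

# Mimic the subquery wrapper seen in the dbplyr errors (assumed shape).
wrap_like_dbplyr <- function(query) {
  paste0("SELECT * FROM (", query, ") AS q WHERE (0 = 1)")
}

conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "test_table", data.frame(id = 1:5, value = c(10, 20, 30, 40, 50)))

# The two failing cases from the reprex.
problem_queries <- c(
  "SELECT * FROM test_table WHERE value > 20;",
  "SELECT * from test_table WHERE value > 20 --this is a filter"
)

# Each cleaned query now parses; (0 = 1) means only column names come back.
fields <- lapply(problem_queries, function(q) {
  names(dbGetQuery(conn, wrap_like_dbplyr(clean_llm_sql(q))))
})
dbDisconnect(conn)
```

This does not rescue the midline-comment case, which is broken SQL to begin with, or a query like `...; -- comment` where the semicolon sits before the comment. Those would still need to surface as errors to the LLM.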
I think returning a

```sql
CREATE OR REPLACE VIEW querychat_results_{table} AS
-- LLM-generated SQL statement
SELECT * FROM {table} WHERE llm_conditions;
```

that you execute with
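A minimal sketch of the view idea against SQLite, using a hypothetical `create_results_view()` helper. SQLite has no `CREATE OR REPLACE VIEW`, so the sketch drops and recreates the view instead. Because nothing is appended after the LLM's SQL, a trailing `--` comment (one of the cases that broke `tbl(conn, sql(...))` in the reprex) is harmless here.

```r
library(DBI)
library(dplyr)

# Hypothetical helper: store the LLM's query as a view, then return a
# lazy tbl() that refers to the view by name.
create_results_view <- function(conn, table, llm_sql) {
  view_name <- paste0("querychat_results_", table)
  quoted <- dbQuoteIdentifier(conn, view_name)
  # SQLite lacks CREATE OR REPLACE VIEW, so drop and recreate.
  dbExecute(conn, paste0("DROP VIEW IF EXISTS ", quoted))
  dbExecute(conn, paste0("CREATE VIEW ", quoted, " AS\n", llm_sql))
  tbl(conn, view_name)
}

conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "test_table", data.frame(id = 1:5, value = c(10, 20, 30, 40, 50)))

# The single-line trailing-comment query that failed in the reprex.
res <- create_results_view(
  conn, "test_table",
  "SELECT * FROM test_table WHERE value > 20 --this is a filter"
)
n_rows <- nrow(collect(res))
dbDisconnect(conn)
```

As the reply below points out, this requires the database user to have permission to create views.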
@gadenbuie great idea! The one issue I see with that, though, is that the database user under which querychat is accessing the data would need
This is a WIP but seems to basically work.
One question -- I retained the basic functionality for local data.frames (that is, df -> querychat -> df), whereas remote data sources return a dbplyr lazy tbl(), meant for chaining. Is this too confusing a behavior split? Should local data.frames also return a tbl(), just now connected to duckdb?

A few immediate TODOs:
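If local data frames were to return a `tbl()` too, a minimal sketch could register the data frame with an in-memory DuckDB connection. `local_df_as_tbl()` is a hypothetical name, and the sketch assumes the duckdb R package is installed.

```r
library(DBI)
library(dplyr)

# Hypothetical: expose a local data frame through an in-memory DuckDB
# database so it comes back as a lazy tbl(), just like remote sources.
local_df_as_tbl <- function(df, table_name) {
  conn <- dbConnect(duckdb::duckdb())
  # duckdb_register() exposes the data frame as a virtual table without copying it.
  duckdb::duckdb_register(conn, table_name, df)
  tbl(conn, table_name)
}

lazy_cars <- local_df_as_tbl(mtcars, "mtcars")

# Same chaining as a remote source: build the query lazily, then collect().
heavy <- lazy_cars |>
  filter(wt > 4) |>
  collect()

dbDisconnect(dbplyr::remote_con(lazy_cars))
```

The upside is one return type for every source. The cost is that users who passed in a data.frame now have to `collect()` to get one back, and the DuckDB connection has to stay open for as long as the lazy table is in use.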