@cpsievert cpsievert commented Dec 1, 2025

Migrate R Data Source API to R6 Classes

Overview

This PR migrates the R package's data source API from an S3-based approach to an R6 class-based approach, aligning it with the Python package's design. This is a developer-facing breaking change that provides a cleaner, more maintainable API.

Changes

New R6 Class Hierarchy

Base Class: DataSource

  • Abstract R6 class defining the interface for all data sources
  • Methods:
    • get_db_type() - Returns database type string
    • get_schema(categorical_threshold = 20) - Returns schema information
    • execute_query(query) - Executes SQL and returns data frame
    • test_query(query) - Tests query by fetching one row
    • get_data() - Returns all unfiltered data
    • cleanup() - Cleans up resources
  • Property:
    • table_name - Name of the table

Concrete Classes:

  • DataFrameSource - Wraps data frames using DuckDB
  • DBISource - Wraps DBI database connections
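As a rough illustration of the "abstract base class" idea, here is a minimal sketch using R6 directly. `AbstractSource` is a hypothetical stand-in (not the package's actual `DataSource` code), and only a few of the methods listed above are shown; the point is simply that unimplemented methods can fail loudly until a subclass overrides them.

```r
library(R6)

# Hypothetical stand-in for an abstract base class; not the package's source.
AbstractSource <- R6Class(
  "AbstractSource",
  public = list(
    # Name of the table this source exposes
    table_name = NULL,

    # "Abstract" methods: error until a subclass provides an implementation
    get_db_type = function() {
      stop("get_db_type() must be implemented by a subclass", call. = FALSE)
    },
    execute_query = function(query) {
      stop("execute_query() must be implemented by a subclass", call. = FALSE)
    },

    # Default cleanup is a no-op; subclasses holding connections would override it
    cleanup = function() {
      invisible(NULL)
    }
  )
)

# Calling an abstract method on the base class raises an error
base <- AbstractSource$new()
msg <- tryCatch(base$get_db_type(), error = function(e) conditionMessage(e))
```

R6 has no built-in notion of abstract methods, so raising an error in the base method is one common convention; whether the package does exactly this is an assumption here.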

Breaking Changes

Old API (Removed)

# S3 generic functions
source <- as_querychat_data_source(mtcars, "mtcars")
result <- execute_query(source, "SELECT * FROM mtcars WHERE mpg > 25")
schema <- get_schema(source)
db_type <- get_db_type(source)
cleanup_source(source)

New API

# Direct R6 instantiation
source <- DataFrameSource$new(mtcars, "mtcars")
result <- source$execute_query("SELECT * FROM mtcars WHERE mpg > 25")
schema <- source$get_schema()
db_type <- source$get_db_type()
source$cleanup()

# Or with DBI connection
source <- DBISource$new(conn, "table_name")
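For downstream code that still uses the old calling style, one possible stopgap is a set of thin forwarding wrappers. The sketch below is not part of this PR: `FakeSource` is a hypothetical stand-in object that merely exposes the new method names, and the wrapper functions are illustrative only.

```r
library(R6)

# Hypothetical stand-in exposing the method names from the new API
FakeSource <- R6Class(
  "FakeSource",
  public = list(
    get_db_type = function() "FakeDB",
    execute_query = function(query) data.frame(query = query),
    cleanup = function() invisible(NULL)
  )
)

# Illustrative forwarding wrappers (not provided by the package):
# old function-style calls delegate to the new R6 methods.
get_db_type <- function(source) source$get_db_type()
execute_query <- function(source, query) source$execute_query(query)
cleanup_source <- function(source) source$cleanup()

src <- FakeSource$new()
db_type <- get_db_type(src)
result <- execute_query(src, "SELECT 1")
cleanup_source(src)
```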

Internal Changes

  1. data_source.R - Complete rewrite with R6 classes

    • Removed all S3 generics
    • Added DataSource, DataFrameSource, DBISource classes
    • Helper functions remain: get_schema_impl(), r_class_to_sql_type()
    • Renamed create_system_prompt() to get_system_prompt() (internal)
  2. QueryChat.R

    • Updated normalize_data_source() to auto-convert to R6 classes
    • Updated cleanup() to call data_source$cleanup() method
    • $data_source is now an active binding (field), not a method
  3. querychat_tools.R

    • Updated tool implementations to use $ method syntax
    • Changed from get_db_type(source) to source$get_db_type()
    • Changed from execute_query(source, query) to source$execute_query(query)
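The QueryChat.R changes above (auto-conversion, delegated cleanup, and an active binding) could look roughly like the following. All names here (`ToySource`, `ToyFrameSource`, `ToyChat`, `normalize_source`) are hypothetical stand-ins written for this sketch, and the real classes use DuckDB or DBI rather than storing a plain data frame.

```r
library(R6)

# Hypothetical stand-ins for the data source classes
ToySource <- R6Class("ToySource", public = list(
  table_name = NULL,
  cleaned = FALSE,
  cleanup = function() {
    self$cleaned <- TRUE
    invisible(self)
  }
))

ToyFrameSource <- R6Class("ToyFrameSource", inherit = ToySource, public = list(
  data = NULL,
  initialize = function(df, table_name) {
    self$data <- df
    self$table_name <- table_name
  }
))

# Pass existing sources through; wrap data frames; reject anything else
normalize_source <- function(x, table_name) {
  if (inherits(x, "ToySource")) return(x)
  if (is.data.frame(x)) return(ToyFrameSource$new(x, table_name))
  stop("Unsupported data source: ", class(x)[1], call. = FALSE)
}

ToyChat <- R6Class("ToyChat",
  public = list(
    initialize = function(data, table_name) {
      private$.source <- normalize_source(data, table_name)
    },
    # Cleanup delegates to the data source's own method
    cleanup = function() {
      private$.source$cleanup()
      invisible(NULL)
    }
  ),
  active = list(
    # Read-only active binding: accessed as chat$data_source, no parentheses
    data_source = function(value) {
      if (!missing(value)) stop("data_source is read-only", call. = FALSE)
      private$.source
    }
  ),
  private = list(.source = NULL)
)

chat <- ToyChat$new(mtcars, "mtcars")
src <- chat$data_source
chat$cleanup()
```

Making the binding read-only is a design choice of this sketch, not something the PR description states.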

Benefits

  1. Consistency: R and Python packages now share the same design pattern
  2. Better Encapsulation: Connection objects are stored in the R6 private list, so they are truly private
  3. Easier Extensibility: Users can inherit from DataSource for custom implementations instead of defining a handful of (poorly defined) S3 methods.

Example: Extending DataSource

Users can now easily create custom data sources:

CustomSource <- R6::R6Class(
  "CustomSource",
  inherit = DataSource,
  public = list(
    initialize = function(data, table_name) {
      self$table_name <- table_name
      # Custom initialization
    },

    get_db_type = function() {
      "CustomDB"
    },

    get_schema = function(categorical_threshold = 20) {
      # Custom schema logic
    },

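    execute_query = function(query) {
      # Custom query execution
    }

    # ... implement other required methods
    # (separate each method with a comma; a trailing comma after the last
    # element makes list() fail with an "argument is empty" error)
  )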
)

@cpsievert cpsievert marked this pull request as ready for review December 2, 2025 18:06
@cpsievert cpsievert force-pushed the feat/r-data-source-api branch from 14ee321 to f40c505 Compare December 3, 2025 16:42
@cpsievert cpsievert force-pushed the feat/r-data-source-api branch from f40c505 to 8350cbe Compare December 3, 2025 16:44
Co-authored-by: Garrick Aden-Buie <[email protected]>
@gadenbuie gadenbuie left a comment

Really like this approach, it's nice, clean and easy to follow!

@cpsievert cpsievert merged commit 1d87120 into main Dec 3, 2025
16 checks passed
@cpsievert cpsievert deleted the feat/r-data-source-api branch December 3, 2025 17:38
3 participants