This repository was archived by the owner on Jan 26, 2026. It is now read-only.

FileDataSource.get_dataframe(args_dict) should also provide simple equality-based selection #96

@schuderer

Description

Right now, DBMS-style DataSources allow for querying using parameters provided through args_dict. The most common case is probably providing a value to check equality with (e.g. an ID for an item to fetch).

Although DataSources claim to be isomorphic towards model code (that is, interchangeable from the model's point of view), a FileDataSource currently requires source-specific code to select the desired record from the loaded CSV.

To make this claim somewhat more true (and usage more consistent between kinds of DataSources, at least for the equality case), I propose extending FileDataSource.get_dataframe to use the args_dict parameter for selection. args_dict would be a dictionary mapping column keys to the values to test for equality, and get_dataframe would return the corresponding subset of the originally loaded DataFrame.
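A minimal sketch of what the selection step might look like. The helper name `select_equal` is hypothetical and not part of the existing code base, and combining multiple keys with a logical AND is an assumption, since the proposal above does not specify it.

```python
import pandas as pd


def select_equal(df, args_dict=None):
    """Return the rows of df in which every column named in args_dict
    equals the associated value.

    Hypothetical helper: multiple keys are combined with AND (assumption).
    """
    if not args_dict:
        # No selection criteria: hand back the DataFrame unchanged.
        return df

    missing = set(args_dict) - set(df.columns)
    if missing:
        raise KeyError(f"Unknown column(s) in args_dict: {sorted(missing)}")

    # Start with an all-True mask and narrow it down one column at a time.
    mask = pd.Series(True, index=df.index)
    for column, value in args_dict.items():
        mask &= df[column] == value
    return df[mask]
```

FileDataSource.get_dataframe could then apply something like `select_equal(loaded_df, args_dict)` just before returning, so that callers passing no args_dict see the same behaviour as before.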

Other comments

If there is a clean, reasonably fast, pandas-supported way to do this on CSVs without loading them fully into memory first, that would be preferable to loading all the data and then filtering. This may be relevant: https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function
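One possible approach, sketched here under assumptions, is to read the CSV in chunks with the `chunksize` parameter of `pandas.read_csv` and filter each chunk before concatenating. The function name `read_csv_filtered` and its default chunk size are hypothetical. Only the matching rows are kept in memory, though the whole file is still parsed.

```python
import pandas as pd


def read_csv_filtered(path_or_buffer, args_dict=None, chunksize=10_000, **read_csv_kwargs):
    """Read a CSV in chunks, keeping only rows that match args_dict.

    Hypothetical helper: keys in args_dict are combined with AND (assumption).
    """
    # With chunksize set, read_csv returns an iterator of DataFrames.
    chunks = pd.read_csv(path_or_buffer, chunksize=chunksize, **read_csv_kwargs)
    if not args_dict:
        return pd.concat(chunks, ignore_index=True)

    filtered = []
    for chunk in chunks:
        mask = pd.Series(True, index=chunk.index)
        for column, value in args_dict.items():
            mask &= chunk[column] == value
        # Keep only the matching rows of this chunk.
        filtered.append(chunk[mask])
    return pd.concat(filtered, ignore_index=True)
```

Because column types are inferred per chunk, passing an explicit `dtype` through `read_csv_kwargs` may be needed to keep the comparison consistent across chunks.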

Metadata

Labels: enhancement (New feature or request)