Detect and translate ADQL that is only gathering statistics #20

@gitosaurus

Feature request

The ADQL:

SELECT MIN(ra) AS ra_min, MAX(ra) AS ra_max, MIN(dec) AS dec_min, MAX(dec) AS dec_max FROM ppdb.DiaObject

can be translated to LSDB as:

ppdb.aggregate_column_statistics(include_columns=["ra","dec"])[["min_value", "max_value"]]

This takes careful pattern matching, because MIN and MAX are ordinary ADQL functions and may appear inside other arithmetic for unrelated purposes. But gathering statistics on a subset of columns is common enough to be worth detecting specifically. The translator will need to determine:

  1. That the SELECT clause contains nothing but statistical functions
  2. That every one of those functions is available in .aggregate_column_statistics
  3. The underlying original column name for each function call
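
The three checks above can be sketched roughly as follows. This is only a minimal illustration using regular expressions, not a proposal for the real translator, which would presumably work on a parsed ADQL tree. The function name `match_statistics_query` and the `SUPPORTED_STATS` mapping are hypothetical; the mapping lists only MIN and MAX because those are the only statistics confirmed by the example in this issue.

```python
import re
from typing import Optional

# Hypothetical mapping from ADQL aggregate names to statistics columns.
# Only MIN and MAX are taken from the example in this issue; anything
# else would need to be checked against .aggregate_column_statistics.
SUPPORTED_STATS = {"MIN": "min_value", "MAX": "max_value"}

# A bare "SELECT <items> FROM <table>" with no WHERE, GROUP BY, or joins.
_QUERY = re.compile(
    r"^\s*SELECT\s+(?P<items>.+?)\s+FROM\s+(?P<table>[A-Za-z_][\w.]*)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)

# One SELECT item of the exact form FUNC(column) [AS alias].
_ITEM = re.compile(
    r"^\s*(?P<func>[A-Za-z_]+)\s*\(\s*(?P<col>[A-Za-z_][A-Za-z0-9_]*)\s*\)"
    r"(?:\s+AS\s+[A-Za-z_][A-Za-z0-9_]*)?\s*$",
    re.IGNORECASE,
)


def match_statistics_query(adql: str) -> Optional[dict]:
    """Return table, columns, and statistics if the query only gathers
    supported statistics on plain columns; otherwise return None."""
    query = _QUERY.match(adql)
    if query is None:
        return None

    columns, stats = [], []
    for item in query.group("items").split(","):
        found = _ITEM.match(item)
        if found is None:
            return None  # check 1: something other than FUNC(column)
        func = found.group("func").upper()
        if func not in SUPPORTED_STATS:
            return None  # check 2: function has no known statistic
        col = found.group("col")  # check 3: the underlying column name
        if col not in columns:
            columns.append(col)
        stat = SUPPORTED_STATS[func]
        if stat not in stats:
            stats.append(stat)

    return {"table": query.group("table"), "columns": columns, "stats": stats}
```

For the query in this issue, the sketch yields the columns `["ra", "dec"]` and the statistics `["min_value", "max_value"]`, which are the two lists used in the LSDB call above. Anything with extra arithmetic, an unsupported function, or a WHERE clause returns None and would fall through to the general translation path.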

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
