
feat: lark parser #28

Open
camriddell wants to merge 10 commits into williambdean:main from camriddell:feat/lark-parser


@camriddell (Contributor) commented Aug 13, 2025

Uses lark to construct a grammar and parser for the query strings.

  • Adds a dependency on lark
  • Adds a grammar to parse the current query features
  • Defines entry points in search.py:
    • def parse: returns a graph composed of parse.BinaryOps and SearchNodes
    • def search: a convenience function that parses a query, converts the result to a narwhals.Expr, and uses it to filter the passed-in data
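To make the parse/search split concrete, here is a dependency-free sketch. It swaps lark for a small hand-written recursive-descent parser, supports only `key:value` terms joined by `AND`/`OR` with parentheses, and filters a list of dicts instead of building a `narwhals.Expr`. The `SearchNode` and `BinaryOp` classes below are hypothetical stand-ins for the PR's types, not their actual definitions.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class SearchNode:
    """Leaf: a single key:value term (hypothetical stand-in)."""
    column: str
    value: str


@dataclass
class BinaryOp:
    """Interior node joining two sub-queries with AND / OR."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[SearchNode, BinaryOp]


def parse(query: str) -> Node:
    """Recursive descent: OR binds loosest, then AND, then parentheses."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr() -> Node:
        node = and_expr()
        while peek() == "OR":
            advance()
            node = BinaryOp("OR", node, and_expr())
        return node

    def and_expr() -> Node:
        node = atom()
        while peek() == "AND":
            advance()
            node = BinaryOp("AND", node, atom())
        return node

    def atom() -> Node:
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        column, sep, value = tok.partition(":")
        if not sep or not column:
            raise SyntaxError(f"expected key:value, got {tok!r}")
        return SearchNode(column, value)

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(node: Node, row: dict) -> bool:
    """Walk the tree; leaves do a case-insensitive substring match."""
    if isinstance(node, SearchNode):
        cell = row.get(node.column)
        return cell is not None and node.value.lower() in str(cell).lower()
    left, right = evaluate(node.left, row), evaluate(node.right, row)
    return (left and right) if node.op == "AND" else (left or right)


def search(query: str, rows: list) -> list:
    """Convenience wrapper: parse once, then filter the rows."""
    tree = parse(query)
    return [row for row in rows if evaluate(tree, row)]


rows = [{"name": "Bob", "state": "California"}, {"name": "Alice", "state": "Texas"}]
print(search("name:alice OR state:cal", rows))  # both rows
```

Keeping `parse` separate from `search` means the tree can be inspected, tested, or translated to another backend (such as a narwhals expression) without re-parsing the query string.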

To Do

  • Add proper exceptions
  • Write proper tests
    • thoroughly test grammar features
  • Add comments to the grammar file?
    • create a human-readable version of the grammar for documentation
  • Implement a fallback comparison when no terms are found in a query string (not currently supported at the grammar level)
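For the "Add proper exceptions" item, one option is a small hierarchy rooted in `ValueError` (so existing `except ValueError` handlers keep working), with a syntax error that records where in the query the problem is. This is a hypothetical sketch: the class names and the parenthesis check are illustrative, not from the PR. When parsing with lark, failures surface as subclasses of `lark.exceptions.UnexpectedInput` (such as `UnexpectedCharacters` and `UnexpectedToken`), which carry `line` and `column` attributes that a wrapper could translate into an error like this.

```python
class SearchQueryError(ValueError):
    """Base class for every error raised while handling a query string."""


class QuerySyntaxError(SearchQueryError):
    """A malformed query, with the offending character position."""

    def __init__(self, query: str, position: int, message: str):
        self.query = query
        self.position = position
        pointer = " " * position + "^"  # caret under the bad character
        super().__init__(f"{message}\n  {query}\n  {pointer}")


def check_parentheses(query: str) -> None:
    """Raise QuerySyntaxError at an unbalanced parenthesis."""
    open_positions = []
    for i, ch in enumerate(query):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise QuerySyntaxError(query, i, "unmatched ')'")
            open_positions.pop()
    if open_positions:
        # report the innermost parenthesis that was never closed
        raise QuerySyntaxError(query, open_positions[-1], "unclosed '('")


try:
    check_parentheses("(name:bob AND state:ca")
except SearchQueryError as err:
    print(err)
```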

@williambdean (Owner) left a comment

Some comments on how the previous functionality worked:

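```python
import operator
from collections import defaultdict
VALUE_DEPENDENT_COMPARATORS = defaultdict(
    lambda: operator.eq,  # default comparator for value types not listed
    {
        str: lambda col, value: col.str.contains(value),  # excerpt ends here in the review
    },
)
```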
@williambdean (Owner):

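Previously this used to_lowercase on the column and lower() on the string value, which made searching more forgiving. For example, given a name column with the values below, the query would match all three:

```python
["Bob", "Bobby", "Billy Bob"]
search("name:bob")  # would match all three
```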
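A dependency-free sketch of that type-dispatched, case-insensitive behavior, reusing the `defaultdict` shape from the excerpt above but operating on plain Python values rather than narwhals columns (an assumption for illustration). On a narwhals column, the string branch would correspond roughly to `col.str.to_lowercase().str.contains(value.lower(), literal=True)`, where `literal=True` keeps regex metacharacters in user input from being interpreted.

```python
import operator
from collections import defaultdict

# Same shape as VALUE_DEPENDENT_COMPARATORS, but over plain Python values.
COMPARATORS = defaultdict(
    lambda: operator.eq,  # fallback: exact equality for non-string values
    {
        # case-insensitive substring match: lowercase both sides
        str: lambda cell, value: value.lower() in str(cell).lower(),
    },
)


def matches(cell, value) -> bool:
    # dispatch on the type of the *query* value, not the cell
    return COMPARATORS[type(value)](cell, value)


names = ["Bob", "Bobby", "Billy Bob", "Alice"]
print([n for n in names if matches(n, "bob")])  # ['Bob', 'Bobby', 'Billy Bob']
```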

Comment on lines +130 to +131
```python
def BARE_KEY(self, token: Token) -> nw.Expr:
    return Column(token.value)
```
@williambdean (Owner):

The previous behavior treated these not as a Column but as a value to match against a default column. For example, with state as the default column, users could simply write search("California") to find matches. The default column and column aliases might need to be passed to this Transformer's __init__.
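A dependency-free sketch of how that could be wired in: a Transformer-like class takes `default_column` and `column_aliases` in `__init__`, and a bare token becomes a match against the default column rather than a `Column`. The class and method names are hypothetical, and it builds plain predicates over dict rows instead of `nw.Expr` objects. In a real `lark.Transformer` subclass, a terminal callback such as `BARE_KEY(self, token)` receives a `Token` whose `.value` is the matched text, and a custom `__init__` that stores these settings would typically also call `super().__init__()`.

```python
from typing import Callable, Optional

Predicate = Callable[[dict], bool]


class SearchTransformer:
    """Hypothetical stand-in for the PR's Transformer: it receives the
    default column and column aliases at construction time."""

    def __init__(self, default_column: Optional[str] = None,
                 column_aliases: Optional[dict[str, str]] = None):
        self.default_column = default_column
        self.column_aliases = column_aliases or {}

    def column(self, name: str) -> str:
        # map a user-facing alias to the real column name (identity if unmapped)
        return self.column_aliases.get(name, name)

    def term(self, column: str, value: str) -> Predicate:
        # key:value -> case-insensitive substring match on the resolved column
        col = self.column(column)
        return lambda row: value.lower() in str(row.get(col, "")).lower()

    def bare_key(self, value: str) -> Predicate:
        # a lone token is a *value* looked up in the default column,
        # not a column reference
        if self.default_column is None:
            raise ValueError(f"bare term {value!r} requires a default column")
        return self.term(self.default_column, value)


rows = [{"state": "California", "name": "Bob"}, {"state": "Texas", "name": "Alice"}]
t = SearchTransformer(default_column="state", column_aliases={"who": "name"})
print([r["name"] for r in rows if t.bare_key("California")(r)])  # ['Bob']
print([r["state"] for r in rows if t.term("who", "ali")(r)])      # ['Texas']
```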

@williambdean (Owner):

This might be the wrong spot to point this out, though.

@williambdean added the enhancement (New feature or request) label Aug 13, 2025
The new grammar introduces standard comparators and
allows the colon comparator to use value-specific comparisons.
The rules are as follows:
- The left side of any comparator is always parsed as a column name.
- The right side of any comparator can be a value or a composition of
  values.
- Numeric comparisons can be performed via the standard comparators
  (built from the characters <, >, =, !).
- The ":" comparator is value-dependent:
  - "x,y,z" (isin) checks against multiple values
  - 2..6 (range) supports numeric/date range comparisons
  - "abe" (escaped strings) are matched after lowercasing
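The rules above can be sketched as a small per-term evaluator. This is a dependency-free illustration over a plain dict row, not the PR's lark grammar: the term regex, the exact comparator set (`<`, `<=`, `>`, `>=`, `=`, `!=`), inclusive range bounds, ISO-format dates, and the handling of unquoted text are all assumptions.

```python
import operator
import re
from datetime import date

# Standard comparators; the exact operator set is an assumption.
STANDARD = {
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
}
# Column name, then the first comparator (two-character ops tried first), then the value.
TERM = re.compile(r"^(?P<col>[^<>=!:]+)(?P<op><=|>=|!=|<|>|=|:)(?P<val>.+)$")


def parse_scalar(text: str):
    """Numbers first, then ISO dates (YYYY-MM-DD), otherwise a plain string."""
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def colon_match(cell, raw: str) -> bool:
    """Value-dependent ':' comparison following the rules above."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        # "abe": quoted string, lowercased substring match
        return raw[1:-1].lower() in str(cell).lower()
    if ".." in raw:
        # 2..6: inclusive range over numbers or ISO dates
        lo, hi = raw.split("..", 1)
        return parse_scalar(lo) <= cell <= parse_scalar(hi)
    if "," in raw:
        # x,y,z: membership test (isin)
        return cell in [parse_scalar(v) for v in raw.split(",")]
    value = parse_scalar(raw)
    if isinstance(value, str):
        # unquoted text: case-insensitive substring match (assumption)
        return value.lower() in str(cell).lower()
    return cell == value


def evaluate_term(term: str, row: dict) -> bool:
    match = TERM.match(term)
    if match is None:
        raise ValueError(f"not a comparison: {term!r}")
    cell = row[match["col"]]  # left side is always a column name
    op, raw = match["op"], match["val"]
    if op == ":":
        return colon_match(cell, raw)
    return STANDARD[op](cell, parse_scalar(raw))


row = {"age": 30, "name": "Abe Lincoln", "state": "CA"}
print(evaluate_term("age:25..35", row), evaluate_term("state:CA,NY", row))
```

Dispatching on the shape of the right-hand value keeps the grammar small: only `:` needs value-specific logic, while the other comparators map directly onto `operator` functions.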

@camriddell (Contributor, Author):

Made some updates to flesh out the grammar a bit more. Two major features still need to be integrated:

  1. default_column: allow queries that consist of a single value to be evaluated against a default column.
  2. column_mapping: allow column names to be remapped (aliased) so that writing query strings is more convenient.
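One possible shape for these two features, sketched as a hypothetical `SearchConfig` that validates column names up front and rewrites a comparator-free query into a quoted term on the default column before parsing. Doing the rewrite on the string, rather than in the grammar, is just one option that sidesteps the grammar-level gap noted in the To Do list; the class, its fields, and the quoting convention are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Any comparator character means the query already names a column.
COMPARATOR = re.compile(r"[<>=!:]")


@dataclass
class SearchConfig:
    """Hypothetical settings object for default_column and column_mapping."""
    columns: tuple[str, ...]  # columns actually present in the data
    default_column: Optional[str] = None
    column_mapping: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # fail early if the mapping or default column names a missing column
        for target in self.column_mapping.values():
            if target not in self.columns:
                raise ValueError(f"column_mapping target {target!r} not in {self.columns}")
        if self.default_column is not None:
            self.resolve(self.default_column)  # raises if unknown

    def resolve(self, name: str) -> str:
        # user-facing alias -> real column name; unknown names are an error
        column = self.column_mapping.get(name, name)
        if column not in self.columns:
            raise ValueError(f"unknown column {name!r}; available: {list(self.columns)}")
        return column

    def prepare(self, query: str) -> str:
        # A query with no comparator is a value to look up in the default
        # column, e.g. 'California' -> 'state:"California"'.
        if COMPARATOR.search(query) is None:
            if self.default_column is None:
                raise ValueError(f"{query!r} has no comparator and no default column is set")
            return f'{self.resolve(self.default_column)}:"{query}"'
        return query


cfg = SearchConfig(columns=("state", "full_name"), default_column="state",
                   column_mapping={"name": "full_name"})
print(cfg.prepare("California"), cfg.resolve("name"))
```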

Let me know if you can find or think of anything else!


Labels

enhancement New feature or request

Projects

None yet


2 participants