Description
Is your feature request related to a problem?
I don't understand the context. If by "problem" you mean "a bug", then no. If you mean "a problem that I currently have", then see the Additional context section.
Describe the solution you'd like
I would like to be able to pass a factory function or a matching instance to either the `engine` or the `parser` argument of `pd.eval` and `pd.DataFrame.query`. For example, I would like to be able to run:

```python
df.query(..., parser=MyParser)
```
where `MyParser` is a custom parser type (or a factory function) that accepts the same parameters as any class derived from `BaseExprVisitor` (i.e. `env`, `engine`, `parser`, `preparser`).
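The following is a minimal sketch of how the `parser` argument could be resolved under this proposal. It accepts a registered name (the current behaviour), a class or factory function, or an already-built instance. `resolve_parser`, `ToyVisitor` and `REGISTRY` are hypothetical stand-ins. The plain dict plays the role of pandas' internal name-to-class mapping, so this is not pandas' actual code.

```python
from typing import Any, Callable, Dict


class ToyVisitor:
    """Hypothetical stand-in for a BaseExprVisitor subclass."""

    def __init__(self, env, engine, parser, preparser=None):
        self.env = env
        self.engine = engine
        self.parser = parser
        self.preparser = preparser


class MyParser(ToyVisitor):
    """A user-defined parser type, as in the feature request."""


# Hypothetical stand-in for the internal name -> parser-class registry.
REGISTRY: Dict[str, Callable[..., Any]] = {"toy": ToyVisitor}


def resolve_parser(parser: Any, env, engine, preparser=None):
    """Return a parser instance for a name, a class/factory, or an instance."""
    if isinstance(parser, str):
        # Current behaviour: only registered names are accepted.
        if parser not in REGISTRY:
            raise KeyError(f"Invalid parser {parser!r} passed")
        factory = REGISTRY[parser]
    elif callable(parser):
        # Proposed: any class or factory taking (env, engine, parser, preparser).
        factory = parser
    else:
        # Proposed: an already-built instance is used as-is.
        return parser
    # The original argument is forwarded as the `parser` parameter.
    return factory(env, engine, parser, preparser)
```

With this shape, `resolve_parser(MyParser, env, "python")` and `resolve_parser("toy", env, "python")` go through the same code path. The only new behaviour is skipping the registry lookup for non-string arguments.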
API breaking implications
- `Expr.parser` and `Expr.engine` should be instantiated outside the initialization of the `Expr` class. Note that this fits better with the current (newer) multi-expression implementation of the function.
- Maybe consider moving `pandas.core.computation.expr._parsers` and `pandas.core.computation.engines._engine` to `pandas.core.computation.eval` or similar, and instantiating them in `pd.core.computation.eval.eval`.
- Also consider "breaking" `pd.core.computation.eval.eval` into the regular `pd.core.computation.eval.eval` and a new `pd.core.computation.eval.eval_single_expression` that evaluates a single `Expr` class instance (basically line 353 and below), to allow more customizable evaluation behavior.
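Below is a toy sketch of the proposed split. A driver builds the parser once and hands it to a single-expression evaluator, instead of each `Expr` constructing its own parser from a name. Everything here is hypothetical and far simpler than pandas' real `Expr`/`eval`. That includes `ToyVisitor`, the toy `Expr`, `eval_single_expression`, `eval_lines`, and the regex-based assignment detection. Python's `ast` module and built-in `eval` stand in for the real parsing and engines.

```python
import ast
import re
from typing import Any, Callable, Dict, Optional

# Matches "name = rhs" but not comparisons such as "a == b" or "a <= b".
_ASSIGN = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$")


class ToyVisitor:
    """Hypothetical parser: turns an expression string into a Python AST."""

    def __init__(self, env: Dict[str, Any]):
        self.env = env

    def visit(self, source: str) -> ast.Expression:
        return ast.parse(source, mode="eval")


class Expr:
    """Hypothetical Expr that receives an already-built parser instance."""

    def __init__(self, source: str, parser: ToyVisitor):
        self.source = source
        self.parser = parser  # injected by the caller, not built from a name
        self.terms = parser.visit(source)


def eval_single_expression(expr: Expr) -> Any:
    """Evaluate exactly one Expr against its parser's environment."""
    code = compile(expr.terms, "<expr>", "eval")
    return eval(code, {"__builtins__": {}}, expr.parser.env)


def eval_lines(
    source: str,
    env: Dict[str, Any],
    parser_factory: Callable[[Dict[str, Any]], ToyVisitor] = ToyVisitor,
) -> Optional[Any]:
    """Multi-line driver: builds the parser once, then delegates each line."""
    parser = parser_factory(env)
    result = None
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _ASSIGN.match(line)
        target, rhs = (match.group(1), match.group(2)) if match else (None, line)
        result = eval_single_expression(Expr(rhs, parser))
        if target is not None:
            env[target] = result  # later lines see the new name
    return result
```

Because the parser is a plain argument, swapping in a custom one is just a different `parser_factory`, with no registry involved.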
From what I see, these changes shouldn't be a big deal at all, but I'm no expert.
Describe alternatives you've considered
As an alternative, what I currently do is:

```python
from pandas.core.computation.expr import PARSERS, PandasExprVisitor

class MyParser(PandasExprVisitor):
    pass

PARSERS['my_parser'] = MyParser  # now usable as parser='my_parser'
```

which is, of course, hacky and undocumented.
Additional context
I have a dataclass that contains `pd.Series` objects, and I would like to implement a `query` method for it, like the one `pd.DataFrame` has. The idea I came up with is as follows:
Suppose I get the call:

```python
my_cls_instance.query('((a * 2) == 1) & (b == 2)')
```

where `a` and `b` are Series contained in my class.
I figure that to evaluate this using pandas (with as little disruption as possible), I need to:

1. separate the "unary" expressions (here `((a * 2) == 1)` and `(b == 2)`) from the "binary" expressions (here the `&`),
2. evaluate each "unary" expression individually using `pd.eval` or similar,
3. perform one of the following (I'm unsure which is the best course of action):
   3.1. replace the evaluated results in the string (e.g. `'__processed_1__ & __processed_2__'`) and rerun `pd.eval`; or
   3.2. change the contents of the `Expr` and evaluate it.
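Steps 1, 2 and 3.1 can be sketched with Python's standard `ast` module. This is a toy under stated assumptions: plain scalars stand in for the `Series`, the built-in `eval` stands in for `pd.eval`, and only a top-level chain of `&` is split. `split_and` and `evaluate_by_parts` are hypothetical helpers.

```python
import ast
from typing import Any, Dict, List, Tuple


def split_and(node: ast.expr) -> List[ast.expr]:
    """Step 1: flatten a top-level chain of '&' into its operands."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitAnd):
        return split_and(node.left) + split_and(node.right)
    return [node]


def evaluate_by_parts(source: str, env: Dict[str, Any]) -> Tuple[str, Any]:
    """Evaluate each '&' operand separately, then combine via placeholders."""
    tree = ast.parse(source, mode="eval")
    processed: Dict[str, Any] = {}
    for i, part in enumerate(split_and(tree.body), start=1):
        # Step 2: evaluate one operand on its own (pd.eval in the real case).
        code = compile(ast.Expression(body=part), "<part>", "eval")
        processed[f"__processed_{i}__"] = eval(code, {"__builtins__": {}}, dict(env))
    # Step 3.1: rewrite the expression with placeholders and evaluate it again.
    rewritten = " & ".join(processed)
    result = eval(rewritten, {"__builtins__": {}}, processed)
    return rewritten, result
```

For example, `evaluate_by_parts('((a * 2) == 1) & (b == 2)', {'a': 0.5, 'b': 2})` returns the rewritten string `'__processed_1__ & __processed_2__'` together with `True`. Since `&` is associative, re-joining the operands with `&` preserves the meaning of the original chain.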
So, looking at the code, I found that what I want is only possible if I can use my own parser and/or engine.