Description
In the joint tour frequency and composition component, we have (for example):
| util_constant_for_children_party_shopping_tour,Constant for Children Party/ Shopping Tour,@(df.purpose1==5)*(df.party1==2)+(df.purpose2==5)*(df.party2==2),coef_constant_for_children_party_shopping_tour |
This expression can be summarized as (bool * bool) + (bool * bool). Each of the two parenthetical terms resolves neatly and correctly to a binary value, regardless of whether the operands are treated as literal boolean values or as their (0,1) numerical equivalents. However, the + operator is not so clean; if both operands are True, we could arrive at different results:
1. Interpret as numeric, so 1 + 1 = 2, or
2. Interpret as boolean, so True + True = True.
The numexpr engine of pandas.eval will (with arguably good reason) punt on solving this, throwing a NotImplementedError. Pandas can fall back to numpy logic, which evaluates the expression using logic (2) and gets True. Sharrow converts the booleans to numbers, using logic (1).
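To make the difference concrete, here is a minimal plain-Python sketch of the two readings. The helper names `numeric_reading` and `boolean_reading` are hypothetical and are not part of ActivitySim, pandas, or sharrow; the codes 5 and 2 are taken from the expression above. It only mimics the two interpretations for a single row and does not call either engine.

```python
# Hypothetical helpers illustrating the two readings of
# (bool * bool) + (bool * bool) for a single row of data.

def numeric_reading(purpose1, party1, purpose2, party2):
    # Logic (1): each comparison becomes 0 or 1, so + counts matching tours.
    first = int(purpose1 == 5) * int(party1 == 2)
    second = int(purpose2 == 5) * int(party2 == 2)
    return first + second


def boolean_reading(purpose1, party1, purpose2, party2):
    # Logic (2): * acts as AND and + acts as OR, so the result stays boolean.
    first = (purpose1 == 5) and (party1 == 2)
    second = (purpose2 == 5) and (party2 == 2)
    return first or second


# Both tours match the condition, versus only the first tour matching.
both = dict(purpose1=5, party1=2, purpose2=5, party2=2)
one = dict(purpose1=5, party1=2, purpose2=4, party2=2)

print(numeric_reading(**both), boolean_reading(**both))  # 2 True
print(numeric_reading(**one), boolean_reading(**one))    # 1 True
```

The two readings agree whenever at most one tour matches. They diverge only when both tours match, in which case the coefficient is applied twice under logic (1) and once under logic (2).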
It would be better to write expressions so they are less ambiguous, and (obviously) so they resolve the same way with or without sharrow. Based on context clues from the rest of the spec, it appears these expressions are intended to follow logic (1). @dhensle can you (or whoever at RSG crafted this spec) confirm the preferred interpretation?