Ambiguous spec expressions #14

@jpn--

In the joint tour frequency and composition component, we have (for example):

util_constant_for_children_party_shopping_tour,Constant for Children Party/ Shopping Tour,@(df.purpose1==5)*(df.party1==2)+(df.purpose2==5)*(df.party2==2),coef_constant_for_children_party_shopping_tour

This expression has the form (bool * bool) + (bool * bool). Each of the two parenthetical terms resolves cleanly to a binary value, regardless of whether the operands are treated as literal booleans or as their (0, 1) numerical equivalents. The + operator, however, is not so clean: if both operands are True, we could arrive at different results:

  1. Interpret as numeric, so 1 + 1 = 2, or
  2. Interpret as boolean, so True + True = True.

The numexpr engine of pandas.eval will (with arguably good reason) punt on solving this, throwing a NotImplementedError. Pandas can fall back to numpy logic, which evaluates the expression using logic (2) and returns True. Sharrow converts the booleans to numbers, using logic (1).
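
To make the two interpretations concrete, here is a minimal sketch using plain numpy arrays. The arrays `a` and `b` are made up for illustration and stand in for the two parenthetical terms; this is not taken from the spec or from sharrow's internals.

```python
import numpy as np

# Hypothetical stand-ins for the two (bool * bool) terms.
a = np.array([True, True, False])
b = np.array([True, False, False])

# Logic (2): numpy treats + on boolean arrays as logical OR,
# so the result stays boolean and True + True is True.
as_bool = a + b

# Logic (1): casting to integers first makes + an arithmetic sum,
# so True + True becomes 2.
as_int = a.astype(int) + b.astype(int)

print(as_bool.tolist())  # [True, True, False]
print(as_int.tolist())   # [2, 1, 0]
```

The two results differ only in the first row, where both terms are True.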

It would be better to write expressions so they are less ambiguous and (obviously) so they resolve the same way with or without sharrow. Based on context clues from the rest of the spec, the intention of these expressions appears to be logic (1). @dhensle, can you (or whoever at RSG crafted this spec) confirm the preferred interpretation?
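
As a rough sketch of what a less ambiguous form could look like, either interpretation can be spelled out explicitly in pandas. The DataFrame values below are invented for illustration, and which of the two forms matches the spec's intent is still the open question above.

```python
import pandas as pd

# Made-up data using the column names from the spec expression.
df = pd.DataFrame({
    "purpose1": [5, 5, 1],
    "party1": [2, 2, 2],
    "purpose2": [5, 3, 5],
    "party2": [2, 2, 1],
})

first = (df.purpose1 == 5) & (df.party1 == 2)
second = (df.purpose2 == 5) & (df.party2 == 2)

# Logic (1): an explicit integer cast makes + count the matching tours.
count_matches = first.astype(int) + second.astype(int)

# Logic (2): an explicit | states "at least one tour matches".
any_match = first | second

print(count_matches.tolist())  # [2, 1, 0]
print(any_match.tolist())      # [True, True, False]
```

Written either way, the result no longer depends on how the evaluation engine chooses to handle + on booleans.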
