PrettyPFA Simple Example

bwengals edited this page Jul 13, 2015 · 6 revisions

Here is a scoring engine that applies the quadratic formula to its input:

>>> import titus.prettypfa as prettypfa
>>> pfa = prettypfa.json('''
input: record(a: double, b: double, c: double)
output: union(null,
              record(Output,
                     solution1: double,
                     solution2: double))
action:
  var a = input.a, b = input.b, c = input.c;

  var discriminant = b**2 - 4*a*c;
  if (discriminant >= 0.0) {
    // if there are any real solutions, return them
    var x1 = (-b + m.sqrt(discriminant))/(2*a);
    var x2 = (-b - m.sqrt(discriminant))/(2*a);
    new(Output, solution1: x1, solution2: x2)
  }
  else
    // otherwise, return null (N/A)
    null
''')

Note that binary operators such as + and - are expressed with familiar infix notation (between the things they add or subtract) and the if statement looks like something from a C program.

The corresponding PFA is harder to read (especially if it isn't pretty-printed):

>>> print pfa
{"@": "PrettyPFA document", "name": "Engine_1", "input": {"fields": [{"type": "double", "name": "a"}, {"type": "double", "name": "b"}, {"type": "double", "name": "c"}], "type": "record", "name": "Record_63"}, "output": ["null", {"fields": [{"type": "double", "name": "solution1"}, {"type": "double", "name": "solution2"}], "type": "record", "name": "Output"}], "method": "map", "action": [{"@": "PrettyPFA line 8", "let": {"a": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "a"}]}, "c": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "c"}]}, "b": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "b"}]}}}, {"@": "PrettyPFA line 10", "let": {"discriminant": {"@": "PrettyPFA line 10", "-": [{"@": "PrettyPFA line 10", "**": ["b", 2]}, {"@": "PrettyPFA line 10", "*": [{"@": "PrettyPFA line 10", "*": [4, "a"]}, "c"]}]}}}, {"@": "PrettyPFA lines 11-19", "if": {"@": "PrettyPFA line 11", ">=": ["discriminant", 0.0]}, "then": [{"@": "PrettyPFA line 13", "let": {"x1": {"@": "PrettyPFA line 13", "/": [{"@": "PrettyPFA line 13", "+": [{"@": "PrettyPFA line 13", "u-": ["b"]}, {"@": "PrettyPFA line 13", "m.sqrt": ["discriminant"]}]}, {"@": "PrettyPFA line 13", "*": [2, "a"]}]}}}, {"@": "PrettyPFA line 14", "let": {"x2": {"@": "PrettyPFA line 14", "/": [{"@": "PrettyPFA line 14", "-": [{"@": "PrettyPFA line 14", "u-": ["b"]}, {"@": "PrettyPFA line 14", "m.sqrt": ["discriminant"]}]}, {"@": "PrettyPFA line 14", "*": [2, "a"]}]}}}, {"@": "PrettyPFA line 15", "type": "Output", "new": {"solution2": "x2", "solution1": "x1"}}], "else": [null]}]}

The "@": "PrettyPFA line ..." key-value pairs pass the source file line numbers to later stages of processing, so that error messages can point back to lines in the original source file.

This PFA document is ready to be used as a scoring engine. You can test it in the usual way:

>>> import titus.genpy
>>> engine, = titus.genpy.PFAEngine.fromJson(pfa)
>>> print engine.action({"a": 1, "b": 8, "c": 4})
{'solution1': -0.5358983848622456, 'solution2': -7.464101615137754}
>>> print engine.action({"a": 1, "b": 2, "c": 3})
None
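
When testing an engine like this, it can help to have the same logic in plain Python to compare against. The following is a minimal sketch that does not use Titus at all; the function name solve_quadratic is made up for illustration. It returns a dictionary with both real roots, or None when the discriminant is negative, mirroring the union(null, Output) output type. For a = 1, b = 8, c = 4 the exact roots are -4 ± 2√3.

```python
import math

def solve_quadratic(a, b, c):
    """Return both real roots of a*x**2 + b*x + c = 0, or None if there are none."""
    discriminant = b**2 - 4*a*c
    if discriminant < 0.0:
        return None
    root = math.sqrt(discriminant)
    # the whole numerator (-b +/- root) is divided by 2*a
    return {"solution1": (-b + root) / (2.0*a),
            "solution2": (-b - root) / (2.0*a)}

print(solve_quadratic(1, 8, 4))   # two real roots
print(solve_quadratic(1, 2, 3))   # no real roots, prints None
```

Like the PrettyPFA version, this sketch does not guard against a = 0.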

Functions in Titus for generating PrettyPFA

The titus.prettypfa module has functions for generating PFA at any stage of its life cycle.

  • json(ppfa, lineNumbers=True, check=True): construct JSON text from PrettyPFA text ppfa. If lineNumbers is False, don't include line numbers. If check is False, don't verify that the resulting PFA is syntactically and semantically valid. This can be useful for generating invalid PFA that is later made valid by inserting a subtree.
  • jsonNode(ppfa, lineNumbers=True, check=True): construct a Python dictionary of Python lists, strings, and numbers from PrettyPFA text ppfa. This form can be immediately modified by a Python algorithm.
  • ast(ppfa, check=True): construct an abstract syntax tree from PrettyPFA text ppfa. This tree is a suite of nested specialized class instances for each PFA form. It can be used for modifications of the PFA that require more knowledge of the tree's structure.
  • engine(ppfa, options=None, sharedState=None, multiplicity=1, style="pure", debug=False): construct a list of executable scoring engines from PrettyPFA text ppfa. The other arguments are the same as those of titus.genpy.PFAEngine.fromJson.
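
Because the jsonNode form is just nested dictionaries and lists, ordinary recursive Python functions can transform it. As a minimal sketch, the hypothetical helper strip_line_numbers below removes every "@" annotation from such a tree. The fragment it runs on is written by hand in the style of the output shown earlier, not produced by Titus here, and the helper assumes that "@" is never used as a legitimate data key.

```python
import json

def strip_line_numbers(node):
    """Return a copy of a PFA JSON tree without the "@" annotation keys."""
    if isinstance(node, dict):
        return {k: strip_line_numbers(v) for k, v in node.items() if k != "@"}
    if isinstance(node, list):
        return [strip_line_numbers(x) for x in node]
    return node   # strings, numbers, booleans, None

# hand-written fragment in the style of PrettyPFA output (an assumption for this example)
fragment = {"@": "PrettyPFA line 10",
            "let": {"discriminant": {
                "@": "PrettyPFA line 10",
                "-": [{"@": "PrettyPFA line 10", "**": ["b", 2]},
                      {"@": "PrettyPFA line 10",
                       "*": [{"@": "PrettyPFA line 10", "*": [4, "a"]}, "c"]}]}}}

print(json.dumps(strip_line_numbers(fragment)))
```

If the annotations are not wanted in the first place, passing lineNumbers=False to json or jsonNode avoids generating them.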

Also, you may have noticed that the input and output type specifications in PrettyPFA are not Avro schemas, as they are in PFA. The intention is to make them easier to write. However, some datasets already come with Avro schemas, so Titus has a function to convert an Avro schema into a PrettyPFA snippet.

  • avscToPretty(avsc, indent=0): construct a PrettyPFA text snippet from avsc, which is a Python dictionary of Python lists, strings, and numbers, representing an Avro schema. The indent is the starting indentation level.

If your Avro schema is stored in a file, you can use:

>>> import json
>>> prettypfa.avscToPretty(json.load(open(fileName)))
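
To illustrate how an Avro schema corresponds to the type syntax used on this page, here is a rough sketch of such a conversion. It is not the Titus implementation: avsc_to_pretty_sketch is a hypothetical name, it handles only primitives, named references, unions, arrays, enums, and records, it ignores indentation, namespaces, and enum names, and its output is not guaranteed to match what avscToPretty returns.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def avsc_to_pretty_sketch(avsc):
    """Render a small subset of Avro schemas in PrettyPFA-style type syntax."""
    if isinstance(avsc, str):        # primitive name or reference to a named type
        return avsc
    if isinstance(avsc, list):       # an Avro union is a JSON array
        return "union(" + ", ".join(avsc_to_pretty_sketch(x) for x in avsc) + ")"
    kind = avsc["type"]
    if isinstance(kind, str) and kind in PRIMITIVES:
        return kind
    if kind == "array":
        return "array(" + avsc_to_pretty_sketch(avsc["items"]) + ")"
    if kind == "enum":
        return "enum([" + ", ".join(avsc["symbols"]) + "])"
    if kind == "record":
        fields = [f["name"] + ": " + avsc_to_pretty_sketch(f["type"])
                  for f in avsc["fields"]]
        return "record(" + ", ".join([avsc["name"]] + fields) + ")"
    raise ValueError("schema form not covered by this sketch: %r" % (kind,))

# made-up schema, loosely modeled on the Star record used later on this page
schema = {"type": "record", "name": "Star", "fields": [
    {"name": "name", "type": "string"},
    {"name": "temp", "type": ["double", "null"]},
    {"name": "planets", "type": {"type": "array", "items": "Planet"}}]}

print(avsc_to_pretty_sketch(schema))
```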

More realistic example

The simple example above applies a mathematical formula to user-supplied parameters, but PFA was designed for data processing. The example in this section is more realistic.

To begin, download the Exoplanets dataset, which we use for examples. It can be read with Avro or fastavro.

>>> from avro.datafile import DataFileReader
>>> from avro.io import DatumReader
>>> exoplanetsIter = DataFileReader(open("exoplanets.avro", "rb"), DatumReader())
>>> exoplanets = list(exoplanetsIter)
>>> print len(exoplanets)
1103

or

>>> import fastavro
>>> exoplanetsIter = fastavro.reader(open("exoplanets.avro", "rb"))
>>> exoplanets = list(exoplanetsIter)
>>> print len(exoplanets)
1103

Next, build a scoring engine directly from PrettyPFA. This engine finds the coldest planet orbiting each star in the exoplanets dataset, with contingencies for missing data.

>>> engine, = prettypfa.engine('''
types:
  Planet = record(
    name:          string,                // Name of the planet
    detection:                            // Discovery technique
      enum([astrometry, imaging, microlensing, pulsar,
            radial_velocity, transit, ttv, OTHER]),
    discovered:    string,                // Year of discovery
    updated:       string,                // Date of last update
    mass:          union(double, null),   // Mass over Jupiter's mass
    radius:        union(double, null),   // Radius over Jupiter's
    period:        union(double, null),   // Planet year (Earth days)
    max_distance:  union(double, null),   // Distance from star (AU)
    eccentricity:  union(double, null),   // (0 = circle, 1 = escapes)
    temperature:   union(double, null),   // Temperature (Kelvin)
    temp_measured: union(boolean, null),  // True if temperature was measured
    molecules:     array(string)          // Molecules observed
  );

  Star = record(
    name:    string,                      // Name of the star
    ra:      union(double, null),         // Right ascension (degrees)
    dec:     union(double, null),         // Declination (degrees)
    mag:     union(double, null),         // Magnitude (unitless)
    dist:    union(double, null),         // Distance away (parsecs)
    mass:    union(double, null),         // Mass over Sun's mass
    radius:  union(double, null),         // Radius over Sun's radius
    age:     union(double, null),         // Age (billions of years)
    temp:    union(double, null),         // Temperature (Kelvin)
    type:    union(string, null),         // Spectral type
    planets: array(Planet)                // Orbiting planets
  );

  PlanetWithTemp = record(planet: Planet, temp: double)

input: Star
output: Planet

method: emit
action:
  var star = input;  // name the input for convenience

  // build up a list of planets with temperature estimates
  var pt = json(array(PlanetWithTemp), []);
  foreach (planet: star.planets, seq: true) {
    var temp =
      ifnotnull(t: planet.temperature)
        // if a planet's temperature is already defined, use it
        t
      else {
        // otherwise, estimate it from the star
        ifnotnull(t: star.temp,
                  r: star.radius,
                  d: planet.max_distance) {
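          var r_in_km = r * 695800.0;      // solar radii to km
          var d_in_km = d * 149600000.0;   // AU to km
          // use the converted values so that the ratio is dimensionless
          t / m.sqrt(d_in_km / r_in_km)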
        }
        else
          // third case: not enough data to make any estimate
          null
      };
    // if the above resulted in an estimate, add it to the list
    ifnotnull(t: temp) {
      pt = a.append(pt, new(PlanetWithTemp,
                            planet: planet,
                            temp: t))
    }
  };

  // if the list is not empty...
  if (a.len(pt) > 0) {
    // find the coldest planet
    var coldest =
      a.minLT(pt, fcn(x: PlanetWithTemp,
                      y: PlanetWithTemp -> boolean) {
        x.temp < y.temp
      });

    // and emit it as the result of this scoring engine
    emit(coldest.planet)
  }
''')
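
For comparison, the same three-case logic can be sketched in plain Python operating on dictionaries shaped like the Star and Planet records. This is an illustration only, not something Titus generates: the function names are made up, the sample star uses invented numbers, and the estimate assumes that temperature falls off as one over the square root of the distance measured in stellar radii.

```python
import math

def estimate_temp(star, planet):
    """Return a temperature in Kelvin, or None if no estimate is possible."""
    if planet["temperature"] is not None:
        return planet["temperature"]           # case 1: already defined
    t, r, d = star["temp"], star["radius"], planet["max_distance"]
    if t is None or r is None or d is None:
        return None                            # case 3: not enough data
    r_in_km = r * 695800.0                     # solar radii to km
    d_in_km = d * 149600000.0                  # AU to km
    return t / math.sqrt(d_in_km / r_in_km)    # case 2: assumed scaling law

def coldest_planet(star):
    """Return the coldest planet with a usable temperature, or None."""
    with_temp = [(estimate_temp(star, p), p) for p in star["planets"]]
    with_temp = [(t, p) for t, p in with_temp if t is not None]
    if not with_temp:
        return None                            # the emit-style engine emits nothing here
    return min(with_temp, key=lambda pair: pair[0])[1]

# invented sample data, for illustration only
star = {"temp": 5000.0, "radius": 1.0, "planets": [
    {"name": "a", "temperature": 300.0, "max_distance": 1.0},
    {"name": "b", "temperature": None, "max_distance": 5.0},
    {"name": "c", "temperature": None, "max_distance": None}]}

print(coldest_planet(star)["name"])
```

Here min with a key function plays the role of a.minLT with its less-than function, and returning None stands in for an engine that emits no result.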

Now run the scoring engine on the exoplanets dataset.

>>> def emit(x):
...     print x
...
>>> engine.emit = emit
>>>
>>> for star in exoplanets:
...     engine.action(star)
...
{u'discovered': '2014', u'updated': '2014-03-06', u'name': 'Kepler-207 d', u'temp_measured': True, u'period': 5.868075, u'detection': u'transit', u'eccentricity': None, u'radius': 0.295, u'molecules': [], u'max_distance': 0.068, u'mass': None, u'temperature': None}

{u'discovered': '2004', u'updated': '2012-05-25', u'name': 'HD 89307 b', u'temp_measured': True, u'period': 2199.0, u'detection': u'radial_velocity', u'eccentricity': 0.25, u'radius': None, u'molecules': [], u'max_distance': 3.34, u'mass': 2.0, u'temperature': None} ...
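
Since emit is just an attribute holding a callable, results can be collected rather than printed. The sketch below shows the pattern with a hypothetical stand-in class, StandInEngine, so that it runs without Titus or the dataset; with a real engine only the assignment to emit and the loop would be needed.

```python
class StandInEngine(object):
    """Hypothetical stand-in for an emit-style engine; not part of Titus."""
    def __init__(self):
        self.emit = None          # the caller assigns a callback

    def action(self, datum):
        # emit only for even inputs, to mimic an engine that sometimes emits nothing
        if datum % 2 == 0:
            self.emit(datum * 10)

engine = StandInEngine()
results = []
engine.emit = results.append      # collect emitted values in a list

for datum in [1, 2, 3, 4]:
    engine.action(datum)

print(results)   # [20, 40]
```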
