PrettyPFA Simple Example
Here is a scoring engine that applies the quadratic formula to input:
>>> import titus.prettypfa as prettypfa
>>> pfa = prettypfa.json('''
input: record(a: double, b: double, c: double)
output: union(null,
              record(Output,
                     solution1: double,
                     solution2: double))
action:
  var a = input.a, b = input.b, c = input.c;
  var discriminant = b**2 - 4*a*c;
  if (discriminant >= 0.0) {
    // if there are any real solutions, return them
    var x1 = (-b + m.sqrt(discriminant))/(2*a);
    var x2 = (-b - m.sqrt(discriminant))/(2*a);
    new(Output, solution1: x1, solution2: x2)
  }
  else
    // otherwise, return null (N/A)
    null
''')
Note that binary operators such as + and - are expressed with familiar infix notation (between the things they add or subtract) and the if statement looks like something from a C program.
The corresponding PFA is harder to read (especially if it isn't pretty-printed):
>>> print pfa
{"@": "PrettyPFA document", "name": "Engine_1", "input": {"fields": [{"type": "double", "name": "a"}, {"type": "double", "name": "b"}, {"type": "double", "name": "c"}], "type": "record", "name": "Record_63"}, "output": ["null", {"fields": [{"type": "double", "name": "solution1"}, {"type": "double", "name": "solution2"}], "type": "record", "name": "Output"}], "method": "map", "action": [{"@": "PrettyPFA line 8", "let": {"a": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "a"}]}, "c": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "c"}]}, "b": {"@": "PrettyPFA line 8", "attr": "input", "path": [{"@": "PrettyPFA line 8", "string": "b"}]}}}, {"@": "PrettyPFA line 10", "let": {"discriminant": {"@": "PrettyPFA line 10", "-": [{"@": "PrettyPFA line 10", "**": ["b", 2]}, {"@": "PrettyPFA line 10", "*": [{"@": "PrettyPFA line 10", "*": [4, "a"]}, "c"]}]}}}, {"@": "PrettyPFA lines 11-19", "if": {"@": "PrettyPFA line 11", ">=": ["discriminant", 0.0]}, "then": [{"@": "PrettyPFA line 13", "let": {"x1": {"@": "PrettyPFA line 13", "/": [{"@": "PrettyPFA line 13", "+": [{"@": "PrettyPFA line 13", "u-": ["b"]}, {"@": "PrettyPFA line 13", "m.sqrt": ["discriminant"]}]}, {"@": "PrettyPFA line 13", "*": [2, "a"]}]}}}, {"@": "PrettyPFA line 14", "let": {"x2": {"@": "PrettyPFA line 14", "/": [{"@": "PrettyPFA line 14", "-": [{"@": "PrettyPFA line 14", "u-": ["b"]}, {"@": "PrettyPFA line 14", "m.sqrt": ["discriminant"]}]}, {"@": "PrettyPFA line 14", "*": [2, "a"]}]}}}, {"@": "PrettyPFA line 15", "type": "Output", "new": {"solution2": "x2", "solution1": "x1"}}], "else": [null]}]}
The "@": "PrettyPFA line ..." key-value pairs pass the source file line numbers to later stages of processing, so that error messages can point back to lines in the original source file.
This PFA document is ready to be used as a scoring engine. You can test it in the usual way:
>>> import titus.genpy
>>> engine, = titus.genpy.PFAEngine.fromJson(pfa)
>>> print engine.action({"a": 1, "b": 8, "c": 4})
{'solution1': -0.5358983848622456, 'solution2': -7.464101615137754}
>>> print engine.action({"a": 1, "b": 2, "c": 3})
None
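As an independent check on results like these, the textbook quadratic formula can be written in plain Python, with None standing in for PFA's null. This is only a sketch: solve_quadratic is a hypothetical helper name and is not part of Titus.

```python
import math

def solve_quadratic(a, b, c):
    """Return both real roots of a*x**2 + b*x + c = 0, or None if there are none.

    Hypothetical helper for cross-checking; not part of Titus.
    """
    discriminant = b**2 - 4*a*c
    if discriminant >= 0.0:
        root = math.sqrt(discriminant)
        return {"solution1": (-b + root) / (2.0*a),
                "solution2": (-b - root) / (2.0*a)}
    return None   # plays the role of PFA's null

# A valid root must satisfy the original equation (up to rounding error).
roots = solve_quadratic(1, 8, 4)
for x in (roots["solution1"], roots["solution2"]):
    assert abs(1*x**2 + 8*x + 4) < 1e-9

# No real solutions: the discriminant is negative.
assert solve_quadratic(1, 2, 3) is None
```

Substituting each root back into the equation is a cheap way to catch a misplaced parenthesis in the formula.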
The titus.prettypfa module has functions for generating PFA at any stage of its life cycle.
- json(ppfa, lineNumbers=True, check=True): construct JSON text from PrettyPFA text ppfa. If lineNumbers is False, don't include line numbers. If check is False, don't verify that the resulting PFA is syntactically and semantically valid. This can be useful for generating invalid PFA that is later made valid by inserting a subtree.
- jsonNode(ppfa, lineNumbers=True, check=True): construct a Python dictionary of Python lists, strings, and numbers from PrettyPFA text ppfa. This form can be immediately modified by a Python algorithm.
- ast(ppfa, check=True): construct an abstract syntax tree from PrettyPFA text ppfa. This tree is a suite of nested specialized class instances for each PFA form. It can be used for modifications of the PFA that require more knowledge of the tree's structure.
- engine(ppfa, options=None, sharedState=None, multiplicity=1, style="pure", debug=False): construct a list of executable scoring engines from PrettyPFA text ppfa. The other options are the same as titus.genpy.PFAEngine.fromJson.
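The idea of inserting a subtree into an incomplete document can be sketched with ordinary dictionaries and lists. The template below is a hand-written stand-in for what jsonNode might return (it is not actual Titus output), and substitute and the "<<OFFSET>>" marker are hypothetical names chosen for this illustration.

```python
import copy
import json

def substitute(node, placeholder, subtree):
    """Return a copy of `node` with every occurrence of `placeholder` replaced."""
    if node == placeholder:
        return copy.deepcopy(subtree)
    if isinstance(node, dict):
        return {key: substitute(value, placeholder, subtree)
                for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, placeholder, subtree) for item in node]
    return node   # strings, numbers, None: leave untouched

# Hand-written stand-in for a jsonNode(...) result that is not yet complete:
# the marker string marks where a real expression belongs.
template = {
    "input": "double",
    "output": "double",
    "action": [{"+": ["input", "<<OFFSET>>"]}],
}

# Fill the hole with a small PFA expression and show the finished document.
complete = substitute(template, "<<OFFSET>>", {"*": [2, "input"]})
print(json.dumps(complete, sort_keys=True))
```

Returning a new tree rather than mutating in place keeps the template reusable, so several engines can be produced from it with different subtrees.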
Also, you may have noticed that the input and output type specifications in PrettyPFA are not Avro schemas, as they are in PFA. The intention is to make them easier to write. However, some datasets already have conventional Avro schemas, so Titus has a function to convert an Avro schema into a PrettyPFA snippet.
- avscToPretty(avsc, indent=0): construct a PrettyPFA text snippet from avsc, which is a Python dictionary of Python lists, strings, and numbers, representing an Avro schema. The indent is the starting indentation level.
If your Avro schema is stored in a file, you can use:
>>> import json
>>> prettypfa.avscToPretty(json.load(open(fileName)))
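To give a feel for what such a conversion involves, here is a toy converter for a small subset of Avro. It is not Titus's avscToPretty and makes no claim to match its output formatting; to_pretty is a hypothetical name, and the sketch covers only the type forms that appear on this page (primitives, unions, arrays, and named records).

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double",
              "bytes", "string"}

def to_pretty(avsc):
    """Render a small subset of Avro schemas in PrettyPFA type syntax.

    Illustrative only: NOT Titus's avscToPretty. Handles primitive names,
    unions, arrays, and named records; anything else raises.
    """
    if isinstance(avsc, str):                 # primitive or previously named type
        return avsc
    if isinstance(avsc, list):                # an Avro union is a JSON array
        return "union(" + ", ".join(to_pretty(x) for x in avsc) + ")"
    if avsc["type"] in PRIMITIVES:            # {"type": "double"} long form
        return avsc["type"]
    if avsc["type"] == "array":
        return "array(" + to_pretty(avsc["items"]) + ")"
    if avsc["type"] == "record":
        fields = ["{0}: {1}".format(f["name"], to_pretty(f["type"]))
                  for f in avsc["fields"]]
        return "record(" + ", ".join([avsc["name"]] + fields) + ")"
    raise NotImplementedError(avsc["type"])

schema = {"type": "record", "name": "Output",
          "fields": [{"name": "solution1", "type": "double"},
                     {"name": "solution2", "type": "double"}]}
print(to_pretty(["null", schema]))
# prints: union(null, record(Output, solution1: double, solution2: double))
```

The printed string has the same shape as the output specification of the quadratic-formula engine, which is the kind of snippet the real function is meant to save you from writing by hand.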
The simple example above applies a mathematical formula to user-supplied parameters, but PFA was designed for data processing. The example in this section is more realistic.
To begin, download the Exoplanets dataset, which we use for examples. It can be read with Avro or fastavro.
>>> from avro.datafile import DataFileReader
>>> from avro.io import DatumReader
>>> exoplanetsIter = DataFileReader(open("exoplanets.avro", "rb"), DatumReader())
>>> exoplanets = list(exoplanetsIter)
>>> print len(exoplanets)
1103
or
>>> import fastavro
>>> exoplanetsIter = fastavro.reader(open("exoplanets.avro", "rb"))
>>> exoplanets = list(exoplanetsIter)
>>> print len(exoplanets)
1103
Next, build a scoring engine directly from PrettyPFA. This engine finds the coldest planet orbiting each star in the exoplanets dataset, with contingencies for missing data.
>>> engine, = prettypfa.engine('''
types:
  Planet = record(
    name: string, // Name of the planet
    detection: // Discovery technique
      enum([astrometry, imaging, microlensing, pulsar,
            radial_velocity, transit, ttv, OTHER]),
    discovered: string, // Year of discovery
    updated: string, // Date of last update
    mass: union(double, null), // Mass over Jupiter's mass
    radius: union(double, null), // Radius over Jupiter's radius
    period: union(double, null), // Planet year (Earth days)
    max_distance: union(double, null), // Distance from star (AU)
    eccentricity: union(double, null), // (0 = circle, 1 = escapes)
    temperature: union(double, null), // Temperature (Kelvin)
    temp_measured: union(boolean, null), // True if the measured
    molecules: array(string) // Molecules observed
  );
  Star = record(
    name: string, // Name of the star
    ra: union(double, null), // Right ascension (degrees)
    dec: union(double, null), // Declination (degrees)
    mag: union(double, null), // Magnitude (unitless)
    dist: union(double, null), // Distance away (parsecs)
    mass: union(double, null), // Mass over Sun's mass
    radius: union(double, null), // Radius over Sun's radius
    age: union(double, null), // Age (billions of years)
    temp: union(double, null), // Temperature (Kelvin)
    type: union(string, null), // Spectral type
    planets: array(Planet) // Orbiting planets
  );
  PlanetWithTemp = record(planet: Planet, temp: double)
input: Star
output: Planet
method: emit
action:
  var star = input; // name the input for convenience
  // build up a list of planets with temperature estimates
  var pt = json(array(PlanetWithTemp), []);
  foreach (planet: star.planets, seq: true) {
    var temp =
      ifnotnull(t: planet.temperature)
        // if a planet's temperature is already defined, use it
        t
      else {
        // otherwise, estimate it from the star
        ifnotnull(t: star.temp,
                  r: star.radius,
                  d: planet.max_distance) {
          // convert both lengths to kilometers so that their ratio is unitless
          var r_in_km = r * 695800.0;
          var d_in_km = d * 149600000.0;
          t / m.sqrt(d_in_km / r_in_km)
        }
        else
          // third case: not enough data to make any estimate
          null
      };
    // if the above resulted in an estimate, add it to the list
    ifnotnull(t: temp) {
      pt = a.append(pt, new(PlanetWithTemp,
                            planet: planet,
                            temp: t))
    }
  };
  // if the list is not empty...
  if (a.len(pt) > 0) {
    // find the coldest planet
    var coldest =
      a.minLT(pt, fcn(x: PlanetWithTemp,
                      y: PlanetWithTemp -> boolean) {
        x.temp < y.temp
      });
    // and emit it as the result of this scoring engine
    emit(coldest.planet)
  }
''')
Now run the scoring engine on the exoplanets dataset.
>>> def emit(x):
... print x
...
>>> engine.emit = emit
>>>
>>> for star in exoplanets:
... engine.action(star)
...
{u'discovered': '2014', u'updated': '2014-03-06', u'name': 'Kepler-207 d', u'temp_measured': True, u'period': 5.868075, u'detection': u'transit', u'eccentricity': None, u'radius': 0.295, u'molecules': [], u'max_distance': 0.068, u'mass': None, u'temperature': None}
{u'discovered': '2004', u'updated': '2012-05-25', u'name': 'HD 89307 b', u'temp_measured': True, u'period': 2199.0, u'detection': u'radial_velocity', u'eccentricity': 0.25, u'radius': None, u'molecules': [], u'max_distance': 3.34, u'mass': 2.0, u'temperature': None}
...
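The control flow of this engine (use a known temperature, otherwise estimate it, otherwise skip the planet, then take the minimum) can be mirrored in plain Python to make the three cases explicit. This is a sketch under stated assumptions: coldest_planet is a hypothetical helper, the star record is made-up sample data rather than a row from the dataset, and the estimator passed in is a deliberately simple placeholder, not the engine's formula.

```python
def coldest_planet(star, estimate):
    """Return the coldest planet of `star`, or None if no temperature is known.

    `estimate(t, r, d)` is a caller-supplied function that guesses a planet's
    temperature from the star's temperature and radius and the planet's
    distance. Hypothetical helper; not part of Titus.
    """
    with_temp = []
    for planet in star["planets"]:
        temp = planet["temperature"]          # case 1: already defined
        if temp is None:
            inputs = (star["temp"], star["radius"], planet["max_distance"])
            if all(x is not None for x in inputs):
                temp = estimate(*inputs)      # case 2: estimate from the star
        if temp is not None:                  # case 3 (still None): skip it
            with_temp.append((temp, planet))
    if not with_temp:
        return None                           # the PFA engine emits nothing here
    return min(with_temp, key=lambda pair: pair[0])[1]

# Made-up sample record, for illustration only.
star = {"temp": 5000.0, "radius": 1.0, "planets": [
    {"name": "b", "temperature": 300.0, "max_distance": 1.0},
    {"name": "c", "temperature": None, "max_distance": 25.0},
    {"name": "d", "temperature": None, "max_distance": None},
]}

def placeholder(t, r, d):
    # Placeholder estimator for this example only; NOT the engine's formula.
    return t / d

print(coldest_planet(star, placeholder)["name"])
# prints: c
```

Planet "d" drops out because its distance is missing, which corresponds to the null branch of the nested ifnotnull in the PrettyPFA source.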
Licensed under the Hadrian Personal Use and Evaluation License (PUEL).