JSON Regular Expressions
Since PFA documents are JSON, they are easier to inspect, build, or edit with good tools for manipulating JSON. In Python, JSON is represented as `None`, `True`, `False`, integers, floating-point numbers, strings, and Python lists and dictionaries whose elements are any of these, nested to any depth. Titus's "JSON regular expressions" provide a declarative language for manipulating these structures.
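Python's standard `json` module produces exactly these types when it parses JSON text. The sketch below uses only the standard library, not Titus. It parses a made-up, PFA-like document and collects the Python type of every value in the resulting tree.

```python
import json

# A made-up, PFA-like document used only for illustration.
TEXT = """
{"input": "double",
 "output": "double",
 "cells": {"scale": {"type": "double", "init": 0.5}},
 "action": [{"*": ["input", {"cell": "scale"}]}, {"+": [2, 2]}],
 "options": {"debug": true, "label": null}}
"""


def value_types(node):
    """Return the set of Python type names of all values in a parsed JSON tree."""
    found = {type(node).__name__}
    if isinstance(node, dict):
        for value in node.values():  # object keys are always str
            found |= value_types(value)
    elif isinstance(node, list):
        for item in node:
            found |= value_types(item)
    return found


doc = json.loads(TEXT)
print(sorted(value_types(doc)))
# ['NoneType', 'bool', 'dict', 'float', 'int', 'list', 'str']
```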
"JSON regular expressions" work like regular expressions for text, except that they apply to tree structures rather than to substrings of text. The `titus.producer.tools` library provides classes for defining patterns, for searching JSON trees with those patterns, and for using the matches to extract or modify parts of the tree.
Keep in mind that, despite the name, these "regular expressions" do not match the string representation of the JSON; they match the Python structure (effectively a DOM).
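As a small illustration of matching on structure rather than text, the hypothetical `literal_match` function below (a stand-alone sketch, not Titus code) compares a literal pattern with a parsed JSON node. Because it works on parsed values, whitespace and layout in the source text make no difference. It also keeps JSON booleans apart from numbers, since Python treats `True == 1`; that choice is an assumption based on the table below, where `bool` patterns match only `true` and `false`.

```python
import json


def literal_match(pattern, node):
    """Hypothetical sketch: exact structural comparison of a literal pattern.

    Assumption: a JSON true/false must not match the numbers 1/0, even though
    Python considers True == 1.
    """
    if isinstance(pattern, bool) or isinstance(node, bool):
        return type(pattern) is type(node) and pattern == node
    if isinstance(pattern, dict):
        return (isinstance(node, dict) and pattern.keys() == node.keys()
                and all(literal_match(v, node[k]) for k, v in pattern.items()))
    if isinstance(pattern, list):
        return (isinstance(node, list) and len(pattern) == len(node)
                and all(literal_match(p, n) for p, n in zip(pattern, node)))
    return pattern == node  # None, numbers, and strings


compact = json.loads('{"+": [2, 2]}')
spaced = json.loads('{ "+" :\n  [ 2,   2 ] }')

print(literal_match({"+": [2, 2]}, compact))           # True
print(literal_match({"+": [2, 2]}, spaced))            # True: text layout is irrelevant
print(literal_match({"+": [2, 2]}, {"+": [2, True]}))  # False: true is not 2
```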
When you do `from titus.producer.tools import *`, you get a Python-based DSL for pattern matching. The pfainspector goes one step further and provides a shorter, regex-inspired syntax for the same pattern-matching functions. The table below shows all of the patterns and their pfainspector equivalents.
| Python class | Example | pfainspector syntax | Matches |
|---|---|---|---|
| `NoneType` | `None` | `null` | `null` and only `null` |
| `bool` | `True`, `False` | `true`, `false` | only `true` and `false` |
| `int` | `123` | `123` | exact number in JSON |
| `float` | `3.14` | `3.14` | exact number in JSON |
| `str` | `"hello"` | `"hello"` | exact string in JSON |
| `list` | `[1, True, "hello"]` | `[1, true, "hello"]` | exact array in JSON; may contain non-trivial patterns |
| `dict` | `{"one": 1, "two": None}` | `{one: 1, "two": null}` | exact object in JSON; may contain non-trivial patterns in the values (not keys) |
| `Any` | `Any()` | `_` (underscore) | anything: any structure or leaf node |
| | `Any(int, str)` | (no equivalent) | instances of the specified Python classes |
| | `Any(int, long, float)` | `#` (hash sign) | any number |
| `LT` | `LT(3.14)` | `# < 3.14` | number less than the specified value |
| `LE` | `LE(3.14)` | `# <= 3.14` | number less than or equal to the specified value |
| `GT` | `GT(3.14)` | `# > 3.14` | number greater than the specified value |
| `GE` | `GE(3.14)` | `# >= 3.14` | number greater than or equal to the specified value |
| `Approx` | `Approx(3.14, 0.01)` | `3.14 +- 0.01` | number within a two-sided range |
| `RegEx` | `RegEx("some.*string")` | `/some.*string/` | string matching an ordinary regular expression in Python regex syntax |
| | `RegEx("some.*string", flags="i")` | `/some.*string/i` | the same, with Python regular expression flags |
| | `RegEx("from", "to")` | `/from/to/` | regular expression with replacement text (only used by functions that change the JSON) |
| `Start` | `Start(1, 2, 3)` | `[1, 2, 3, ...]` | JSON array that begins with the specified values; may contain non-trivial patterns |
| `End` | `End(8, 9, 10)` | `[..., 8, 9, 10]` | JSON array that ends with the specified values; may contain non-trivial patterns |
| `Min` | `Min(key1=value1, key2=value2)` | `{key1: value1, "key2": value2, ...}` | JSON object that contains at least the given key-value pairs; the values (not keys) may be non-trivial patterns |
| `Group` | `Group(name=pattern)` | `(pattern)` | associates a pattern with a name in Python or a number in pfainspector (matched patterns are numbered starting with 1) |
| `Or` | `Or(p1, p2, p3)` | `(p1 \| p2 \| p3)` | matches if any sub-pattern matches (in pfainspector, must be written as a group) |
| `And` | `And(p1, p2, p3)` | `(p1 & p2 & p3)` | matches if all sub-patterns match (in pfainspector, must be written as a group) |
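To experiment with these semantics without installing Titus, the following stand-alone sketch mimics a subset of the table in plain Python. The class names are borrowed from the table for readability, but this is not Titus's implementation: `match`, `Pattern`, `Compare`, and `is_number` are hypothetical helpers, this `RegEx` takes `flags` only as a keyword and does not model the replacement form, and the inclusive `Approx` range and match-anywhere `RegEx` behavior are assumptions.

```python
import operator
import re


class Pattern:
    """Base class for the non-literal patterns in this sketch."""
    def matches(self, node):
        raise NotImplementedError


def match(pattern, node):
    """Return True if pattern matches the parsed JSON node."""
    if isinstance(pattern, Pattern):
        return pattern.matches(node)
    if isinstance(pattern, bool) or isinstance(node, bool):
        # Python treats True == 1, so compare JSON booleans by type too.
        return type(pattern) is type(node) and pattern == node
    if isinstance(pattern, dict):
        return (isinstance(node, dict) and pattern.keys() == node.keys()
                and all(match(v, node[k]) for k, v in pattern.items()))
    if isinstance(pattern, list):
        return (isinstance(node, list) and len(pattern) == len(node)
                and all(match(p, n) for p, n in zip(pattern, node)))
    return pattern == node  # None, numbers, strings


def is_number(node):
    return isinstance(node, (int, float)) and not isinstance(node, bool)


class Compare(Pattern):
    """A number compared against a fixed bound (backs LT, LE, GT, GE)."""
    def __init__(self, op, bound):
        self.op, self.bound = op, bound
    def matches(self, node):
        return is_number(node) and self.op(node, self.bound)


def LT(bound): return Compare(operator.lt, bound)
def LE(bound): return Compare(operator.le, bound)
def GT(bound): return Compare(operator.gt, bound)
def GE(bound): return Compare(operator.ge, bound)


class Approx(Pattern):
    def __init__(self, center, tolerance):
        self.center, self.tolerance = center, tolerance
    def matches(self, node):
        # Assumption: both ends of the range are inclusive.
        return is_number(node) and abs(node - self.center) <= self.tolerance


FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


class RegEx(Pattern):
    def __init__(self, regex, *, flags=""):
        bits = 0
        for letter in flags:
            bits |= FLAGS[letter]
        self.regex = re.compile(regex, bits)
    def matches(self, node):
        # Assumption: the regex may match anywhere in the string (re.search).
        return isinstance(node, str) and self.regex.search(node) is not None


class Start(Pattern):
    def __init__(self, *items):
        self.items = list(items)
    def matches(self, node):
        n = len(self.items)
        return isinstance(node, list) and len(node) >= n and match(self.items, node[:n])


class End(Pattern):
    def __init__(self, *items):
        self.items = list(items)
    def matches(self, node):
        n = len(self.items)
        return (isinstance(node, list) and len(node) >= n
                and match(self.items, node[len(node) - n:]))


class Or(Pattern):
    def __init__(self, *alternatives):
        self.alternatives = alternatives
    def matches(self, node):
        return any(match(p, node) for p in self.alternatives)


class And(Pattern):
    def __init__(self, *requirements):
        self.requirements = requirements
    def matches(self, node):
        return all(match(p, node) for p in self.requirements)


print(match(LT(3.14), 3))                                           # True
print(match(Approx(3.14, 0.01), 3.2))                               # False
print(match(RegEx("some.*string", flags="i"), "SOME long STRING"))  # True
print(match(End(8, 9, 10), [7, 8, 9, 10]))                          # True
print(match(And(GE(0), LT(10)), 10))                                # False
print(match({"+": [Or("input", LT(0)), 2]}, {"+": ["input", 2]}))   # True
```

Patterns nest freely inside literal lists and dicts, as in the last line, because `match` recurses through literal structure and hands control to a pattern object wherever it finds one.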
Among these pattern elements, the most useful tend to be the explicit matches (None, True, False, numbers, strings, Python lists and dicts), Min, and Any. Many PFA structures are JSON objects with a few known keys and an unknown remainder, so Min is disproportionately useful. Any is often used as an easy (if under-specified) placeholder.
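`Min` and `Any` can be sketched the same way. In this hypothetical stand-alone version (not Titus code), `Min(cell="scale")` accepts a PFA-style cell reference whether or not it carries extra keys such as `path`, while a literal dict demands exactly the listed keys. Keys that are not Python identifiers, such as `"+"`, are passed with `**`; that works for this sketch's `**required` signature and is not a claim about Titus's `Min`.

```python
class Any:
    """Matches anything, or only instances of the given classes if any are listed."""

    def __init__(self, *classes):
        self.classes = classes

    def matches(self, node):
        # Note: bool is a subclass of int in Python, so Any(int) also accepts True.
        return not self.classes or isinstance(node, self.classes)


class Min:
    """Matches a dict containing at least the given keys, with matching values."""

    def __init__(self, **required):
        self.required = required

    def matches(self, node):
        return isinstance(node, dict) and all(
            key in node and match(value, node[key])
            for key, value in self.required.items())


def match(pattern, node):
    """Literal patterns must match exactly; Any and Min relax that."""
    if isinstance(pattern, (Any, Min)):
        return pattern.matches(node)
    if isinstance(pattern, bool) or isinstance(node, bool):
        return type(pattern) is type(node) and pattern == node
    if isinstance(pattern, dict):
        return (isinstance(node, dict) and pattern.keys() == node.keys()
                and all(match(v, node[k]) for k, v in pattern.items()))
    if isinstance(pattern, list):
        return (isinstance(node, list) and len(pattern) == len(node)
                and all(match(p, n) for p, n in zip(pattern, node)))
    return pattern == node


cell_ref = Min(cell="scale")  # "scale" is a hypothetical cell name
print(match(cell_ref, {"cell": "scale"}))                             # True
print(match(cell_ref, {"cell": "scale", "path": [{"string": "x"}]}))  # True: extra keys allowed
print(match({"cell": "scale"}, {"cell": "scale", "path": []}))        # False: literal dict is exact
print(match(Min(cell=Any(str)), {"cell": "counts", "path": ["a"]}))   # True
print(match(Min(**{"+": [Any(), 2]}), {"+": ["input", 2]}))           # True
```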
The `json` gadget of the pfainspector uses JSON patterns in its `count`, `index`, `find`, and `change` subcommands (see the pfainspector help text for details).
The titus.producer.tools package defines the following functions that use patterns.
- `search(pattern, haystack)` returns a generator that yields `(index, Match)` pairs. That is, if you do `for i, m in search({"+": [2, 2]}, pfaDocument): print i, m` you will print out indexes like `('action', 3, 'log', '+')` and `Match` objects. If you do `list(search({"+": [2, 2]}, pfaDocument))` you'll get a list of all of them at once.

  `Match` objects have an `original` field with a copy of the matched structure, a `modified` field with `RegEx` substitutions (if any), and a `groups` field that provides a dict from `Group` name to matched groups within the original match.
- `searchFirst(pattern, haystack)` returns the first such pair, or `None` if there are none.
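The generator-based search described above can be approximated in a few lines of plain Python. This sketch is hypothetical: `tree_search` and `tree_search_first` are made-up names, only literal patterns are supported, and the matched subtree is yielded directly instead of a `Match` object. Its path convention (dict keys and list indexes down to the matched node, in pre-order) is an assumption and may not reproduce Titus's index tuples exactly.

```python
def literal_match(pattern, node):
    """Exact structural equality, keeping JSON booleans apart from numbers."""
    if isinstance(pattern, bool) or isinstance(node, bool):
        return type(pattern) is type(node) and pattern == node
    if isinstance(pattern, dict):
        return (isinstance(node, dict) and pattern.keys() == node.keys()
                and all(literal_match(v, node[k]) for k, v in pattern.items()))
    if isinstance(pattern, list):
        return (isinstance(node, list) and len(pattern) == len(node)
                and all(literal_match(p, n) for p, n in zip(pattern, node)))
    return pattern == node


def tree_search(pattern, haystack, path=()):
    """Yield (path, subtree) for every matching subtree, visiting parents first.

    A path is a tuple of dict keys and list indexes leading to the subtree.
    """
    if literal_match(pattern, haystack):
        yield path, haystack
    if isinstance(haystack, dict):
        for key, value in haystack.items():
            yield from tree_search(pattern, value, path + (key,))
    elif isinstance(haystack, list):
        for index, item in enumerate(haystack):
            yield from tree_search(pattern, item, path + (index,))


def tree_search_first(pattern, haystack):
    """Return the first (path, subtree) pair, or None if nothing matches."""
    return next(tree_search(pattern, haystack), None)


# A made-up, PFA-like document.
doc = {
    "input": "double",
    "output": "double",
    "action": [
        {"let": {"x": "input"}},
        {"+": [2, 2]},
        {"log": [{"+": [2, 2]}]},
        "x",
    ],
}

for path, node in tree_search({"+": [2, 2]}, doc):
    print(path, node)
# ('action', 1) {'+': [2, 2]}
# ('action', 2, 'log', 0) {'+': [2, 2]}
print(len(list(tree_search({"+": [2, 2]}, doc))))  # 2
print(tree_search_first({"+": [3, 3]}, doc))       # None
```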
Licensed under the Hadrian Personal Use and Evaluation License (PUEL).