Parser objects are returned by any of the built-in parser :doc:`primitives`. They can be used and manipulated as below.
.. currentmodule:: parsy
.. method:: __init__(wrapped_fn)

   This is a low-level function to create new parsers. It is used internally
   but is rarely needed by users of the parsy library. It should be passed a
   parsing function, which takes two arguments - a string/list to be parsed
   and the current index into the list - and returns a :class:`Result`
   object, as described in :doc:`/ref/parser_instances`.
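As a rough illustration of the parsing-function contract, here is a minimal sketch. It assumes the input is a plain string that supports ``len()`` and indexing, and that ``Result.success(next_index, value)`` and ``Result.failure(index, expected)`` are available as constructors. The ``even_digit`` parser is a hypothetical name used only for this example.

.. code-block:: python

   from parsy import Parser, Result

   @Parser
   def even_digit(stream, index):
       # Succeed only if there is an item at `index` and it is an even digit.
       if index < len(stream) and stream[index] in "02468":
           # Report the index just past the consumed item, plus the value.
           return Result.success(index + 1, stream[index])
       # Otherwise fail at this index, naming what was expected.
       return Result.failure(index, "an even digit")

   print(even_digit.parse("4"))
   print(even_digit.many().parse("2468"))

In most cases the built-in primitives and combinators are a better starting point than a hand-written parsing function.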
The following methods are for actually using the parsers that you have created:
.. method:: parse(stream)

   Attempts to parse the given :class:`Stream` of data. If the parse is
   successful and consumes the entire stream, the result is returned;
   otherwise, a ``ParseError`` is raised.

   Most commonly, a stream simply wraps a string, but you could use a list of
   tokens instead. Almost all the examples assume strings for simplicity. Some
   of the primitives are clearly string-specific, as are a few of the
   combinators (such as :meth:`Parser.concat`), but most of the rest of the
   library will work with tokens just as well. See :doc:`/howto/lexing` for
   more information.
.. method:: parse_partial(stream)

   Similar to ``parse``, except that it does not require the entire stream to
   be consumed. Returns a tuple of ``(result, remainder)``, where
   ``remainder`` is the part of the stream that was left over.
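The difference between ``parse`` and ``parse_partial`` can be sketched as follows. The example passes plain strings directly, as the other examples in this document do. The ``number`` parser and the variable names are chosen only for illustration.

.. code-block:: python

   from parsy import ParseError, regex

   number = regex(r"[0-9]+").map(int)

   # parse() succeeds only when the whole input is consumed.
   whole = number.parse("123")

   # Trailing input makes parse() raise ParseError.
   try:
       number.parse("123abc")
       leftover_rejected = False
   except ParseError:
       leftover_rejected = True

   # parse_partial() returns the value plus whatever was not consumed.
   value, remainder = number.parse_partial("123abc")
   print(whole, leftover_rejected, value, repr(remainder))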
The following methods are essentially combinators that produce new
parsers from the existing one. They are provided as methods on Parser for
convenience. More combinators are documented below.
.. method:: desc(string)
Adds a description to the parser, which is used in the error message
if parsing fails.
>>> year = regex(r'[0-9]{4}').desc('4 digit year')
>>> year.parse('123')
ParseError: expected 4 digit year at 0:0
.. method:: then(other_parser)
Returns a parser which, if the initial parser succeeds, will continue parsing
with ``other_parser``. This will produce the value produced by
``other_parser``.
.. code:: python
>>> string('x').then(string('y')).parse('xy')
'y'
See also :ref:`parser-rshift`.
.. method:: skip(other_parser)
Similar to :meth:`Parser.then`, except the resulting parser will use
the value produced by the first parser.
.. code:: python
>>> string('x').skip(string('y')).parse('xy')
'x'
See also :ref:`parser-lshift`.
.. method:: many()
Returns a parser that expects the initial parser 0 or more times, and
produces a list of the results. Note that this parser does not fail if
nothing matches, but instead consumes nothing and produces an empty list.
.. code:: python
>>> parser = regex(r'[a-z]').many()
>>> parser.parse('')
[]
>>> parser.parse('abc')
['a', 'b', 'c']
.. method:: times(min [, max=min])

   Returns a parser that expects the initial parser at least ``min`` times,
   and at most ``max`` times, and produces a list of the results. If only one
   argument is given, the parser is expected exactly that number of times.

.. method:: at_most(n)

   Returns a parser that expects the initial parser at most ``n`` times, and
   produces a list of the results.

.. method:: at_least(n)

   Returns a parser that expects the initial parser at least ``n`` times, and
   produces a list of the results.
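A short sketch of the three repetition methods, assuming plain string input as in the other examples. The variable names are illustrative only.

.. code-block:: python

   from parsy import ParseError, string

   a = string("a")

   exactly_two = a.times(2).parse("aa")       # exactly 2 repetitions
   one_to_three = a.times(1, 3).parse("aaa")  # between 1 and 3 repetitions
   up_to_two = a.at_most(2).parse("")         # zero repetitions is allowed
   two_or_more = a.at_least(2).parse("aaa")   # no upper bound

   # times(2) stops after two matches, so parse() rejects the leftover 'a'.
   try:
       a.times(2).parse("aaa")
       too_many_rejected = False
   except ParseError:
       too_many_rejected = True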
.. method:: until(other_parser, [min=0, max=inf, consume_other=False])
Returns a parser that expects the initial parser followed by ``other_parser``.
The initial parser is expected at least ``min`` times and at most ``max`` times.
By default, it does not consume ``other_parser`` and it produces a list of the
results excluding ``other_parser``. If ``consume_other`` is ``True`` then
``other_parser`` is consumed and its result is included in the list of results.
.. code:: python
>>> seq(string('A').until(string('B')), string('BC')).parse('AAABC')
[['A', 'A', 'A'], 'BC']
>>> string('A').until(string('B')).then(string('BC')).parse('AAABC')
'BC'
>>> string('A').until(string('BC'), consume_other=True).parse('AAABC')
['A', 'A', 'A', 'BC']
.. versionadded:: 2.0
.. method:: optional(default=None)
Returns a parser that expects the initial parser zero times or once, and maps
the result to a given default value in the case of no match. If no default
value is given, ``None`` is used.
.. code:: python
>>> string('A').optional().parse('A')
'A'
>>> string('A').optional().parse('')
None
>>> string('A').optional('Oops').parse('')
'Oops'
.. method:: map(map_function)
Returns a parser that transforms the produced value of the initial parser
with ``map_function``.
.. code:: python
>>> regex(r'[0-9]+').map(int).parse('1234')
1234
This is the simplest way to convert parsed strings into the data types
that you need. See also :meth:`combine` and :meth:`combine_dict` below.
.. method:: combine(combine_fn)
Returns a parser that transforms the produced values of the initial parser
with ``combine_fn``, passing the arguments using ``*args`` syntax.
Where the current parser produces an iterable of values, this can be a
more convenient way to combine them than :meth:`~Parser.map`.
Example 1 - the argument order of our callable already matches:
.. code:: python
>>> from datetime import date
>>> yyyymmdd = seq(regex(r'[0-9]{4}').map(int),
... regex(r'[0-9]{2}').map(int),
... regex(r'[0-9]{2}').map(int)).combine(date)
>>> yyyymmdd.parse('20140506')
datetime.date(2014, 5, 6)
Example 2 - the argument order of our callable doesn't match, and
we need to adjust a parameter, so we can fix it using a lambda.
.. code:: python
>>> ddmmyy = regex(r'[0-9]{2}').map(int).times(3).combine(
... lambda d, m, y: date(2000 + y, m, d))
>>> ddmmyy.parse('060514')
datetime.date(2014, 5, 6)
The equivalent ``lambda`` to use with ``map`` would be ``lambda res:
date(2000 + res[2], res[1], res[0])``, which is less readable. The version
with ``combine`` also ensures that exactly 3 items are generated by the
previous parser, otherwise you get a ``TypeError``.
.. method:: combine_dict(fn)
Returns a parser that transforms the value produced by the initial parser
using the supplied function/callable, passing the arguments using the
``**kwargs`` syntax.
The value produced by the initial parser must be a mapping/dictionary from
names to values, or a list of two-tuples, or something else that can be
passed to the ``dict`` constructor.
If ``None`` is present as a key in the dictionary it will be removed
before passing to ``fn``, as will all keys starting with ``_``.
**Motivation:**
For building complex objects, this can be more convenient, flexible and
readable than :meth:`map` or :meth:`combine`, because by avoiding
positional arguments we can avoid a dependence on the order of components
in the string being parsed and in the argument order of callables being
used. It is especially designed to be used in conjunction with :func:`seq`
and :meth:`tag`.
We can make use of the ``**kwargs`` version of :func:`seq` to produce a
very readable definition:
.. code:: python
>>> ddmmyyyy = seq(
... day=regex(r'[0-9]{2}').map(int),
... month=regex(r'[0-9]{2}').map(int),
... year=regex(r'[0-9]{4}').map(int),
... ).combine_dict(date)
>>> ddmmyyyy.parse('04052003')
datetime.date(2003, 5, 4)
(If that is hard to understand, use a Python REPL, and examine the result
of the ``parse`` call if you remove the ``combine_dict`` call).
Here we used ``datetime.date`` which accepts keyword arguments. For your
own parsing needs you will often use custom data types. You can create
these however you like, but we suggest `dataclasses
<https://docs.python.org/3/library/dataclasses.html>`_ (stdlib), `attrs
<https://github.com/python-attrs/attrs>`_ or `pydantic
<https://github.com/samuelcolvin/pydantic/>`_. You can also use
`namedtuple
<https://docs.python.org/3/library/collections.html#collections.namedtuple>`_
for simple cases.
The following example shows the use of ``_`` as a prefix to remove
elements you are not interested in, and the use of ``namedtuple`` to
create a simple data-structure.
.. code-block:: python
>>> from collections import namedtuple
>>> Pair = namedtuple('Pair', ['name', 'value'])
>>> name = regex("[A-Za-z]+")
>>> int_value = regex("[0-9]+").map(int)
>>> bool_value = string("true").result(True) | string("false").result(False)
>>> pair = seq(
... name=name,
... __eq=string('='),
... value=int_value | bool_value,
... __sc=string(';'),
... ).combine_dict(Pair)
>>> pair.parse("foo=123;")
Pair(name='foo', value=123)
>>> pair.parse("BAR=true;")
Pair(name='BAR', value=True)
You could also use ``<<`` or ``>>`` for the unwanted parts (but in some
cases this is less convenient):
.. code-block:: python
>>> pair = seq(
... name=name << string('='),
... value=(int_value | bool_value) << string(';')
... ).combine_dict(Pair)
.. versionchanged:: 1.2
Allow lists as well as dicts to be consumed, and filter out ``None``.
.. versionchanged:: 1.3
Stripping of args starting with ``_``
.. method:: tag(name)
Returns a parser that wraps the produced value of the initial parser in a
2-tuple ``(name, value)``. This provides a very simple way to
label parsed components. For example:
.. code:: python
>>> day = regex(r'[0-9]+').map(int)
>>> month = string_from("January", "February", "March", "April", "May",
... "June", "July", "August", "September", "October",
... "November", "December")
>>> day.parse("10")
10
>>> day.tag("day").parse("10")
('day', 10)
>>> seq(day.tag("day") << whitespace,
... month.tag("month")
... ).parse("10 September")
[('day', 10), ('month', 'September')]
It also works well when combined with ``.map(dict)`` to get a dictionary
of values:
.. code:: python
>>> seq(day.tag("day") << whitespace,
... month.tag("month")
... ).map(dict).parse("10 September")
{'day': 10, 'month': 'September'}
... and with :meth:`combine_dict` to build other objects.
Usually it is better to use :func:`seq` with keyword arguments if you want
to produce a dictionary.
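As a sketch of the pairing of :meth:`tag` with :meth:`combine_dict` mentioned above, the tagged two-tuples can be passed on as keyword arguments. The ``DayMonth`` type is a hypothetical name introduced for this example, and the month list is shortened for brevity.

.. code-block:: python

   from collections import namedtuple

   from parsy import regex, seq, string_from, whitespace

   # Hypothetical target type, used only for this illustration.
   DayMonth = namedtuple("DayMonth", ["day", "month"])

   day = regex(r"[0-9]+").map(int)
   month = string_from("January", "September")

   # seq() produces a list of (name, value) pairs, which combine_dict()
   # converts into keyword arguments for DayMonth.
   day_month = seq(
       day.tag("day") << whitespace,
       month.tag("month"),
   ).combine_dict(DayMonth)

   print(day_month.parse("10 September"))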
.. method:: concat()
Returns a parser that concatenates together (as a string) the previously
produced values. Usually used after :meth:`~Parser.many` and similar
methods that produce multiple values.
.. code:: python
>>> letter.at_least(1).parse("hello")
['h', 'e', 'l', 'l', 'o']
>>> letter.at_least(1).concat().parse("hello")
'hello'
.. method:: result(val)
Returns a parser that, if the initial parser succeeds, always produces
``val``.
.. code:: python
>>> string('foo').result(42).parse('foo')
42
.. method:: should_fail(description)
Returns a parser that fails when the initial parser succeeds, and succeeds
when the initial parser fails (consuming no input). A description must
be passed which is used in parse failure messages.
This is essentially a negative lookahead:
.. code:: python
>>> p = letter << string(" ").should_fail("not space")
>>> p.parse('A')
'A'
>>> p.parse('A ')
ParseError: expected 'not space' at 0:1
It is also useful for implementing things like parsing repeatedly until a
marker:
.. code:: python
>>> (string(";").should_fail("not ;") >> letter).many().concat().parse_partial('ABC;')
('ABC', ';')
.. method:: bind(fn)

   Returns a parser which, if the initial parser is successful, passes the
   result to ``fn``, and continues with the parser returned from ``fn``. This
   is the monadic binding operation. However, since we don't have Haskell's
   ``do`` notation in Python, using this is very awkward. Instead, you should
   look at :doc:`/ref/generating/`, which provides a much nicer syntax for
   those cases where you would have needed ``do`` notation in Parsec.
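For a simple case where a later parser depends on an earlier result, ``bind`` can still be used directly. The following sketch parses a length-prefixed string. The names ``length`` and ``length_prefixed`` are illustrative, and plain string input is assumed.

.. code-block:: python

   from parsy import any_char, regex

   # A single digit giving the number of characters that follow.
   length = regex(r"[0-9]").map(int)

   # bind() receives the parsed length and returns the parser to run next.
   length_prefixed = length.bind(lambda n: any_char.times(n).concat())

   print(length_prefixed.parse("3abc"))
   print(length_prefixed.parse_partial("2abc"))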
.. method:: sep_by(sep, min=0, max=inf)
Like :meth:`Parser.times`, this returns a new parser that repeats
the initial parser and collects the results in a list, but in this case separated
by the parser ``sep`` (whose return value is discarded). By default it
repeats with no limit, but minimum and maximum values can be supplied.
.. code:: python
>>> csv = letter.at_least(1).concat().sep_by(string(","))
>>> csv.parse("abc,def")
['abc', 'def']
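The ``min`` and ``max`` bounds can be sketched as follows, passing them as keyword arguments. The ``pair_or_triple`` name is illustrative only.

.. code-block:: python

   from parsy import ParseError, regex, string

   digit = regex(r"[0-9]").map(int)

   # Require two or three comma-separated digits.
   pair_or_triple = digit.sep_by(string(","), min=2, max=3)

   two = pair_or_triple.parse("1,2")
   three = pair_or_triple.parse("1,2,3")

   # A single item does not satisfy min=2.
   try:
       pair_or_triple.parse("1")
       single_rejected = False
   except ParseError:
       single_rejected = True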
.. method:: mark()
Returns a parser that wraps the initial parser's result in a value
containing column and line information of the match, as well as the
original value. The new value is a 3-tuple:
.. code:: python
((start_row, start_column),
original_value,
(end_row, end_column))
This is useful for being able to report problems with parsing more
accurately, especially if you are using parsy as a :doc:`lexer
</howto/lexing/>` and want subsequent parsing of the token stream to be
able to report original positions in error messages etc.
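A small sketch of the positions produced by ``mark``, assuming plain string input with no source attached. Rows and columns are counted from zero, so a match at the start of the second line begins at ``(1, 0)``.

.. code-block:: python

   from parsy import string

   # Skip the first line, then mark the match on the second line.
   marked_b = string("a\n") >> string("b").mark()

   start, value, end = marked_b.parse("a\nb")
   print(start, value, end)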
.. method:: span()
Returns a parser that augments the initial parser's result with a :class:`SourceSpan`
containing information about where that parser started and stopped within the
source data. The new value is a tuple:
.. code:: python
(source_span, original_value)
This enables reporting of custom errors involving source locations, such as when
using parsy as a :doc:`lexer</howto/lexing/>` or when building a syntax tree that will be
further analyzed.
This section describes operators that you can use on :class:`Parser` objects to build new parsers.
``parser | other_parser``

Returns a parser that tries ``parser`` and, if it fails, backtracks
and tries ``other_parser``. These can be chained together.

The resulting parser will produce the value produced by the first successful parser.

>>> parser = string('x') | string('y') | string('z')
>>> parser.parse('x')
'x'
>>> parser.parse('y')
'y'
>>> parser.parse('z')
'z'

Note that ``other_parser`` will only be tried if ``parser`` cannot consume any
input and fails. ``other_parser`` is not used in the case that later parser
components fail. This means that the order of the operands matters - for
example:

>>> ((string('A') | string('AB')) + string('C')).parse('ABC')
ParseError: expected 'C' at 0:1
>>> ((string('AB') | string('A')) + string('C')).parse('ABC')
'ABC'
>>> ((string('AB') | string('A')) + string('C')).parse('AC')
'AC'

``parser << other_parser``

The same as ``parser.skip(other_parser)`` - see :meth:`Parser.skip`.

(Hint - the arrows point at the important parser!)

>>> (string('x') << string('y')).parse('xy')
'x'

``parser >> other_parser``

The same as ``parser.then(other_parser)`` - see :meth:`Parser.then`.

(Hint - the arrows point at the important parser!)

>>> (string('x') >> string('y')).parse('xy')
'y'

``parser1 + parser2``

Requires both parsers to match in order, and adds the two results together
using the ``+`` operator. This will only work if the results support the plus
operator (e.g. strings and lists):

>>> (string("x") + regex("[0-9]")).parse("x1")
'x1'
>>> (string("x").many() + regex("[0-9]").map(int).many()).parse("xx123")
['x', 'x', 1, 2, 3]

The plus operator is a convenient shortcut for:

>>> seq(parser1, parser2).combine(lambda a, b: a + b)

``parser1 * number``

This is a shortcut for doing :meth:`Parser.times`:

>>> (string("x") * 3).parse("xxx")
['x', 'x', 'x']

You can also set both upper and lower bounds by multiplying by a range:

>>> (string("x") * range(0, 3)).parse("xxx")
ParseError: expected EOF at 0:2

(Note the normal semantics of ``range`` are respected - the second number is an
exclusive upper bound, not inclusive).
.. function:: alt(*parsers)
Creates a parser from the passed in argument list of alternative parsers,
which are tried in order, moving to the next one if the current one fails, as
per the :ref:`parser-or` - in other words, it matches any one of the
alternative parsers.
Example using ``*args`` syntax to pass a list of parsers that have been
generated by mapping :func:`string` over a list of characters:
.. code-block:: python
>>> hexdigit = alt(*map(string, "0123456789abcdef"))
(In this case you would be better off using :func:`char_from`)
Note that the order of arguments matters, as described in :ref:`parser-or`.
.. function:: seq(*parsers, **kw_parsers)
Creates a parser that runs a sequence of parsers in order and combines
their results in a list.
.. code-block:: python
>>> x_bottles_of_y_on_the_z = \
... seq(regex(r"[0-9]+").map(int) << string(" bottles of "),
... regex(r"\S+") << string(" on the "),
... regex(r"\S+")
... )
>>> x_bottles_of_y_on_the_z.parse("99 bottles of beer on the wall")
[99, 'beer', 'wall']
You can also use :func:`seq` with keyword arguments instead of positional
arguments. In this case, the produced value is a dictionary of the individual
values, rather than a sequence. This can make the produced value easier to
consume.
.. code-block:: python
>>> name = seq(first_name=regex(r"\S+") << whitespace,
... last_name=regex(r"\S+"))
>>> name.parse("Jane Smith")
{'first_name': 'Jane',
'last_name': 'Smith'}
.. versionchanged:: 1.1
Added ``**kwargs`` option.
.. note::
As an alternative, see :meth:`Parser.tag` for a way of labelling parsed
components and producing dictionaries.
Parsy does not try to include every possible combinator - there is no reason why you cannot create your own for your needs using the built-in combinators and primitives. If you find something that is very generic and would be very useful to have as a built-in, please :doc:`submit </contributing>` it as a PR!
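As an illustration of building your own combinator, the following sketch defines a hypothetical ``between`` helper (not part of parsy) from the ``>>`` and ``<<`` operators.

.. code-block:: python

   from parsy import regex, string

   def between(opening, closing, parser):
       """Hypothetical helper: match opening, parser, closing; keep parser's value."""
       return opening >> parser << closing

   number = regex(r"[0-9]+").map(int)
   parenthesised = between(string("("), string(")"), number)

   print(parenthesised.parse("(42)"))

Because combinators are ordinary functions that take and return parsers, helpers like this compose with the rest of the library in the usual way.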
Wraps a string, byte sequence, or list, possibly equipping it with a source. If the data is loaded from a file or URL, the source should be that file path or URL. The source name is used in generated parse error messages.
.. method:: __init__(data, [source=None])

   Wraps the data into a stream, possibly equipping it with a source.
Identifies a span of material from the data being parsed by its start row and column and its end row and column. If the data stream was equipped with a source, that value is also available in this object.