Commit e6cd01e

Update parser documentation and expose some APIs
1 parent 68bcec5 commit e6cd01e

11 files changed: +175, -113 lines


core/src/Streamly/Data/Array.hs

Lines changed: 5 additions & 0 deletions

@@ -76,6 +76,11 @@ module Streamly.Data.Array
 
     -- * Stream of Arrays
     , chunksOf
+    , parserK
+    , parse
+    , parseBreak
+    , parsePos
+    , parseBreakPos
 
     -- * Casting
     , cast
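The array parser drivers exposed above feed a parser one chunk (array) at a time and let it resume across chunk boundaries. The idea can be sketched with a self-contained toy over plain lists standing in for arrays — all names here are illustrative, not streamly's API:

```haskell
-- Toy resumable parse step: either finished with leftover input, or
-- waiting for the next chunk. (Illustrative only; streamly's real
-- driver types differ.)
data Step a b = Done b [a] | More ([a] -> Step a b)

-- Collect n elements, spanning as many chunks as needed.
takeN :: Int -> [a] -> [a] -> Step a [a]
takeN n acc chunk
    | length have >= n = Done (take n have) (drop n have)
    | otherwise = More (takeN n have)
  where
    have = acc ++ chunk

-- Drive the parser with a list of chunks, as an array driver would.
feed :: Step a b -> [[a]] -> Step a b
feed s [] = s
feed s@(Done _ _) _ = s
feed (More k) (c : cs) = feed (k c) cs

main :: IO ()
main =
    case feed (More (takeN 3 [])) ["ab", "cd"] of
        Done b rest -> print (b, rest) -- ("abc","d")
        More _ -> putStrLn "need more input"
```

The parse of three elements spans the two-element chunks, returning the result together with the unconsumed suffix of the second chunk.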

core/src/Streamly/Data/Array/Generic.hs

Lines changed: 5 additions & 0 deletions

@@ -35,6 +35,11 @@ module Streamly.Data.Array.Generic
 
     -- * Stream of Arrays
     , chunksOf
+    , parserK
+    , parse
+    , parseBreak
+    , parsePos
+    , parseBreakPos
 
     -- * Random Access
     , length

core/src/Streamly/Data/Parser.hs

Lines changed: 67 additions & 28 deletions

@@ -7,20 +7,23 @@
 -- Stability   : pre-release
 -- Portability : GHC
 --
--- Parsers are more powerful 'Streamly.Data.Fold.Fold's:
+-- Parsers are more powerful but less general than 'Streamly.Data.Fold.Fold's:
 --
 -- * folds cannot fail but parsers can fail and backtrack.
 -- * folds can be composed as a Tee but parsers cannot.
 -- * folds can be used for scanning but parsers cannot.
 -- * folds can be converted to parsers.
 --
 -- Streamly parsers support all operations offered by popular Haskell parser
--- libraries. They operate on a generic input type, support streaming, and are
--- faster.
+-- libraries. Unlike other parser libraries, streamly parsers (1) can operate
+-- on any Haskell type as input - not just bytes, (2) natively support
+-- streaming, and (3) are faster.
 --
--- Like folds, parsers use stream fusion, compiling to efficient low-level code
--- comparable to the speed of C. Parsers are suitable for high-performance
--- parsing of streams.
+-- == High Performance by Static Parser Fusion
+--
+-- Like folds, parsers are designed to utilize stream fusion, compiling to
+-- efficient low-level code comparable to the speed of C. Parsers are suitable
+-- for high-performance parsing of streams.
 --
 -- Operations in this module are designed to be composed statically rather than
 -- dynamically. They are inlined to enable static fusion. More importantly,
@@ -30,31 +33,36 @@
 -- from the "Streamly.Data.ParserK" module. 'Parser' and
 -- 'Streamly.Data.ParserK.ParserK' types are interconvertible.
 --
--- == Using Parsers
+-- == How to parse a stream?
+--
+-- Parser combinators can be used to create a pipeline of folds or parsers such
+-- that the next fold or parser consumes the result of the previous parser.
+-- Such a composed pipeline of parsers can then be driven by one of the many
+-- parser drivers available in the Stream and Array modules.
 --
--- This module provides elementary parsers and parser combinators that can be
--- used to parse a stream of data. Additionally, all the folds from the
--- "Streamly.Data.Fold" module can be converted to parsers using 'fromFold'.
--- All the parsing functionality provided by popular parsing libraries, and
--- more is available. Also see "Streamly.Unicode.Parser" module for Char stream
--- parsers.
+-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parse' or
+-- Streamly.Data.Stream.'Streamly.Data.Stream.parseBreak' to run a parser on an
+-- input stream and return the parsed result.
 --
--- A data stream can be transformed to a stream of parsed data elements. Parser
--- combinators can be used to create a pipeline of folds or parsers such that
--- the next fold or parser consumes the result of the previous parser. See
--- 'Streamly.Data.Stream.parse' and 'Streamly.Data.Stream.parseMany' to run
--- these parsers on a stream.
+-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parseMany' or
+-- Streamly.Data.Stream.'Streamly.Data.Stream.parseIterate' to transform an
+-- input data stream to an output stream of parsed data elements using a
+-- parser.
 --
 -- == Parser vs ParserK
 --
 -- There are two functionally equivalent parsing modules,
 -- "Streamly.Data.Parser" (this module) and "Streamly.Data.ParserK". The latter
 -- is a CPS based wrapper over the former, and can be used for parsing in
--- general. "Streamly.Data.Parser" enables stream fusion and should be
+-- general. "Streamly.Data.Parser" enables stream fusion and, where possible, should be
 -- preferred over "Streamly.Data.ParserK" for high performance stream parsing
 -- use cases. However, there are a few cases where this module is not
 -- suitable and ParserK should be used instead.
 --
+-- As a rule of thumb, when recursion or heavy nesting is needed use ParserK.
+--
+-- === Parser: non-recursive static fusion
+--
 -- For static fusion, parser combinators have to use strict pattern matching on
 -- arguments of type Parser. This leads to an infinite loop when a parser is
 -- defined recursively, due to strict evaluation of the recursive call. For
@@ -74,6 +82,8 @@
 -- >>> p = p1 <|> p2
 -- >>> :}
 --
+-- === ParserK: recursive application
+--
 -- Use ParserK when recursive use is required:
 --
 -- >>> import Streamly.Data.ParserK (ParserK, parserK)
@@ -105,25 +115,54 @@
 -- combined @n@ times, roughly 8 or less sequenced parsers are fine. READ THE
 -- DOCS OF APPLICATIVE, MONAD AND ALTERNATIVE INSTANCES.
 --
--- == Streaming Parsers
+-- == Parsers Galore!
+--
+-- Streamly provides all the parsing functionality provided by popular parsing
+-- libraries, and much more, with higher performance.
+-- This module provides most of the elementary parsers and parser combinators.
+-- Additionally,
 --
--- With 'Streamly.Data.ParserK.ParserK' you can use the generic Alternative
--- type class based parsers from the
+-- * all the folds from the "Streamly.Data.Fold" module can be converted to
-- parsers using 'fromFold'.
+-- * the "Streamly.Unicode.Parser" module provides Char stream parsers.
+-- * all the combinators from the
 -- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
--- library or similar. However, we recommend that you use the equivalent
--- functionality from this module for better performance and for streaming
--- behavior.
+-- package can be used with streamly ParserK.
+-- * see "Streamly.Internal.Data.Parser" for many more unreleased but useful APIs.
+--
+-- == Generic Parser Combinators
+--
+-- With 'Streamly.Data.ParserK.ParserK' you can use the 'Applicative' and
+-- 'Control.Applicative.Alternative' type class based generic parser
+-- combinators from the
+-- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
+-- library or similar. However, if available, we recommend that you use the
+-- equivalent functionality from this module where performance and streaming
+-- behavior matter.
 --
 -- Firstly, the combinators in this module are faster due to stream fusion.
 -- Secondly, these are streaming in nature as the results can be passed
 -- directly to other stream consumers (folds or parsers). The Alternative type
 -- class based parsers would end up buffering all the results in lists before
 -- they can be consumed.
 --
--- When recursion or heavy nesting is needed use ParserK.
---
 -- == Error Reporting
 --
+-- There are two types of parser drivers available: the @parse@ and @parseBreak@
+-- drivers do not track the stream position, whereas the @parsePos@ and
+-- @parseBreakPos@ drivers track the stream position with slightly more
+-- performance overhead.
+--
+-- When an error occurs, the stream position is reported: in the case of byte
+-- streams or unboxed array streams this is the byte position, and in the case
+-- of generic element parsers or generic array parsers this is the element
+-- position in the stream.
+--
+-- If you need line number or column information you can read the stream again
+-- (if it is immutable) and translate the reported byte position to a line
+-- number and column. More elaborate support for computing arbitrary and custom
+-- error context information is planned for the future.
+--
 -- These parsers do not report the error context (e.g. line number or column).
 -- This may be supported in future.
 --
@@ -148,7 +187,7 @@ module Streamly.Data.Parser
 
     -- * Parser Type
       Parser
-    , ParseError
+    , ParseError(..)
 
     -- -- * Downgrade to Fold
     -- , toFold
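The error-reporting notes above distinguish drivers that track the stream position (@parsePos@) from those that do not (@parse@). A minimal sketch of how position tracking threads through a parser, using a toy list parser — none of these names are streamly's:

```haskell
import Data.Char (isDigit)

-- Toy parser threading an element position, so a failure can be
-- reported with the position, as the parsePos-style drivers do.
newtype P a b = P { runP :: Int -> [a] -> Either (String, Int) (b, Int, [a]) }

-- Consume one element matching a predicate, advancing the position.
satisfy :: (a -> Bool) -> P a a
satisfy ok = P $ \pos xs -> case xs of
    (x : rest) | ok x -> Right (x, pos + 1, rest)
    _ -> Left ("satisfy failed", pos)

-- Sequence two parsers, threading the position through.
andThen :: P a b -> (b -> P a c) -> P a c
andThen (P f) g = P $ \pos xs -> case f pos xs of
    Left err -> Left err
    Right (b, pos', rest) -> runP (g b) pos' rest

twoDigits :: P Char (Char, Char)
twoDigits =
    satisfy isDigit `andThen` \a ->
    satisfy isDigit `andThen` \b ->
    P $ \pos xs -> Right ((a, b), pos, xs)

main :: IO ()
main = do
    print (either fst (const "ok") (runP twoDigits 0 "12x")) -- "ok"
    -- the second element fails the digit test at position 1:
    print (either snd (const (-1)) (runP twoDigits 0 "1x2")) -- 1
```

The error carries the position where the mismatch happened; translating such a position into line/column information is left to the caller, as the module documentation above describes.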

core/src/Streamly/Data/ParserK.hs

Lines changed: 58 additions & 51 deletions

@@ -8,78 +8,88 @@
 -- Portability : GHC
 --
 -- See the general notes about parsing in the "Streamly.Data.Parser" module.
--- This module implements a using Continuation Passing Style (CPS) wrapper over
--- the "Streamly.Data.Parser" module. It is as fast or faster than attoparsec.
---
--- Streamly parsers support all operations offered by popular Haskell parser
--- libraries. They operate on a generic input type, support streaming, and are
--- faster.
+-- This (ParserK) module implements a Continuation Passing Style (CPS) wrapper
+-- over the fused "Streamly.Data.Parser" module. It is a faster CPS parser than
+-- attoparsec.
 --
 -- The 'ParserK' type represents a stream-consumer as a composition of function
 -- calls, therefore, a function call overhead is incurred at each composition.
--- It is reasonably fast in general but may be a few times slower than a fused
--- parser represented by the 'Streamly.Data.Parser.Parser' type. However, it
+-- It is reasonably fast in general but may be a few times slower than the
+-- fused 'Streamly.Data.Parser.Parser' type. However, unlike fused parsers, it
 -- allows for scalable dynamic composition, especially, 'ParserK' can be used
 -- in recursive calls. Operations like 'splitWith' on 'ParserK' type have
 -- linear (O(n)) performance with respect to the number of compositions.
 --
--- 'ParserK' is preferred over 'Streamly.Data.Parser.Parser' when extensive
--- applicative, alternative and monadic composition is required, or when
--- recursive or dynamic composition of parsers is required. 'ParserK' also
--- allows efficient parsing of a stream of arrays, it can also break the input
--- stream into a parse result and remaining stream so that the stream can be
--- parsed independently in segments.
---
--- == Using ParserK
---
--- All the parsers from the "Streamly.Data.Parser" module can be converted to
--- ParserK using the 'Streamly.Data.Array.parserK',
--- 'Streamly.Internal.Data.ParserK.parserK', and
--- 'Streamly.Internal.Data.Array.Generic.parserK' combinators.
---
--- 'Streamly.Data.Array.parse' runs a parser on a stream of unboxed
--- arrays, this is the preferred and most efficient way to parse chunked input.
--- The more general 'Streamly.Data.Array.parseBreak' function returns
--- the remaining stream as well along with the parse result. There are
--- 'Streamly.Internal.Data.Array.Generic.parse',
--- 'Streamly.Internal.Data.Array.Generic.parseBreak' as well to run
--- parsers on boxed arrays. 'Streamly.Internal.Data.StreamK.parse',
--- 'Streamly.Internal.Data.StreamK.parseBreak' run parsers on a stream of
--- individual elements instead of stream of arrays.
+-- 'ParserK' is preferred over the fused 'Streamly.Data.Parser.Parser' when
+-- extensive applicative, alternative and monadic composition is required, or
+-- when recursive or dynamic composition of parsers is required. 'ParserK' also
+-- allows efficient parsing of a stream of byte arrays; it can also break the
+-- input stream into a parse result and the remaining stream so that the stream
+-- can be parsed independently in segments.
 --
--- == Monadic Composition
+-- == How to parse a stream?
 --
--- Monad composition can be used for lookbehind parsers, we can dynamically
--- compose new parsers based on the results of the previously parsed values.
+-- All the fused parsers from the "Streamly.Data.Parser" module can be
+-- converted to the CPS ParserK, for use with different types of parser
+-- drivers, using the @parserK@ combinators -
+-- Streamly.Data.Array.'Streamly.Data.Array.parserK',
+-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parserK', and
+-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parserK'.
+--
+-- To parse a stream of unboxed arrays, use
+-- Streamly.Data.Array.'Streamly.Data.Array.parse' for running the parser; this
+-- is the preferred and most efficient way to parse chunked input. The
+-- Streamly.Data.Array.'Streamly.Data.Array.parseBreak' function returns the
+-- remaining stream as well along with the parse result.
+--
+-- To parse a stream of boxed arrays, use
+-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parse' or
+-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parseBreak' to run
+-- the parser.
+--
+-- To parse a stream of individual elements, use
+-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parse' or
+-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parseBreak' to run the parser.
 --
--- If we have to parse "a9" or "9a" but not "99" or "aa" we can use the
--- following non-monadic, backtracking parser:
+-- == Applicative Composition
 --
--- >>> digits p1 p2 = ((:) <$> p1 <*> ((:) <$> p2 <*> pure []))
+-- Applicative parsers are simpler but we cannot use lookbehind as we can in
+-- the monadic parsers.
+--
+-- If we have to parse "9a" or "a9" but not "99" or "aa" we can use the
+-- following non-monadic (Applicative), backtracking parser:
+--
+-- >>> -- parse p1 : p2 : []
+-- >>> token p1 p2 = ((:) <$> p1 <*> ((:) <$> p2 <*> pure []))
 -- >>> :{
 -- backtracking :: Monad m => ParserK Char m String
--- backtracking = ParserK.parserK $
---     digits (Parser.satisfy isDigit) (Parser.satisfy isAlpha)
+-- backtracking = StreamK.parserK $
+--     token (Parser.satisfy isDigit) (Parser.satisfy isAlpha) -- e.g. "9a"
 --     <|>
---     digits (Parser.satisfy isAlpha) (Parser.satisfy isDigit)
+--     token (Parser.satisfy isAlpha) (Parser.satisfy isDigit) -- e.g. "a9"
 -- :}
 --
--- We know that if the first parse resulted in a digit at the first place then
--- the second parse is going to fail. However, we waste that information and
--- parse the first character again in the second parse only to know that it is
--- not an alphabetic char. By using lookbehind in a 'Monad' composition we can
--- avoid redundant work:
+-- == Monadic Composition
+--
+-- Monad composition can be used to implement lookbehind parsers; we can
+-- dynamically compose new parsers based on the results of the previously
+-- parsed values.
+--
+-- In the previous example, we know that if the first parse resulted in a digit
+-- at the first place then the second parse is going to fail. However, we
+-- waste that information and parse the first character again in the second
+-- parse only to know that it is not an alphabetic char. By using lookbehind
+-- in a 'Monad' composition we can avoid redundant work:
 --
 -- >>> data DigitOrAlpha = Digit Char | Alpha Char
 --
 -- >>> :{
 -- lookbehind :: Monad m => ParserK Char m String
 -- lookbehind = do
---     x1 <- ParserK.parserK $
+--     x1 <- StreamK.parserK $
--          Digit <$> Parser.satisfy isDigit
 --      <|> Alpha <$> Parser.satisfy isAlpha
 --     -- Note: the parse depends on what we parsed already
---     x2 <- ParserK.parserK $
+--     x2 <- StreamK.parserK $
 --          case x1 of
 --              Digit _ -> Parser.satisfy isAlpha
 --              Alpha _ -> Parser.satisfy isDigit
@@ -105,11 +115,8 @@ module Streamly.Data.ParserK
       ParserK
 
     -- * Parsers
-    -- ** Conversions
-    , parserK
-    -- , toParser
 
-    -- ** Without Input
+    -- -- ** Without Input
     , fromPure
     , fromEffect
    , die
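The lookbehind example above can be mirrored in a self-contained toy monadic parser over `String`. This is illustrative only — streamly's `ParserK` is CPS-based and works on streams, unlike this naive sketch — but it shows the same point: the second parser is chosen from the first result, so the first character is never re-parsed:

```haskell
import Data.Char (isAlpha, isDigit)

-- A naive backtracking parser: Nothing on failure, otherwise the
-- result plus the remaining input.
newtype P b = P { runP :: String -> Maybe (b, String) }

instance Functor P where
    fmap f (P g) = P $ \s -> fmap (\(b, r) -> (f b, r)) (g s)

instance Applicative P where
    pure b = P $ \s -> Just (b, s)
    P f <*> P g = P $ \s -> do
        (h, s1) <- f s
        (b, s2) <- g s1
        Just (h b, s2)

instance Monad P where
    P f >>= g = P $ \s -> do
        (b, s1) <- f s
        runP (g b) s1

satisfy :: (Char -> Bool) -> P Char
satisfy p = P $ \s -> case s of
    (c : r) | p c -> Just (c, r)
    _ -> Nothing

-- Alternative-style choice with backtracking.
orElse :: P b -> P b -> P b
orElse (P f) (P g) = P $ \s -> maybe (g s) Just (f s)

data DigitOrAlpha = Digit Char | Alpha Char

-- Lookbehind: decide the second parser from the first result.
lookbehind :: P String
lookbehind = do
    x1 <- (Digit <$> satisfy isDigit) `orElse` (Alpha <$> satisfy isAlpha)
    x2 <- case x1 of
        Digit _ -> satisfy isAlpha
        Alpha _ -> satisfy isDigit
    let unwrap (Digit c) = c
        unwrap (Alpha c) = c
    pure [unwrap x1, x2]

main :: IO ()
main = mapM_ (print . runP lookbehind) ["9a", "a9", "99", "aa"]
```

Running it accepts "9a" and "a9" and rejects "99" and "aa", matching the haddock example's intent.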

core/src/Streamly/Data/Stream.hs

Lines changed: 3 additions & 1 deletion

@@ -241,7 +241,9 @@ module Streamly.Data.Stream
 
     -- ** Parsing
     , parse
-    -- , parseBreak
+    , parseBreak
+    , parsePos
+    , parseBreakPos
 
     -- ** Lazy Right Folds
     -- | Consuming a stream to build a right associated expression, suitable
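The newly exposed `parseBreak` returns the leftover stream along with the result, which is what lets a driver like `parseMany` parse a stream in segments. A toy list-based sketch of that relationship (assumed names, not the streamly API):

```haskell
-- A toy parser: consume some prefix of the input or fail.
type ParserL a b = [a] -> Either String (b, [a])

-- Take exactly two elements.
take2 :: ParserL a (a, a)
take2 (x : y : rest) = Right ((x, y), rest)
take2 _ = Left "need two elements"

-- Toy parseBreak: return the result (or error) plus the rest of the input.
parseBreakL :: ParserL a b -> [a] -> (Either String b, [a])
parseBreakL p xs = case p xs of
    Right (b, rest) -> (Right b, rest)
    Left err -> (Left err, xs)

-- Toy parseMany: apply the parser repeatedly over the leftovers.
parseManyL :: ParserL a b -> [a] -> ([b], [a])
parseManyL p xs = case parseBreakL p xs of
    (Right b, rest) -> let (bs, fin) = parseManyL p rest in (b : bs, fin)
    (Left _, rest) -> ([], rest)

main :: IO ()
main = print (parseManyL take2 [1 .. 5 :: Int]) -- ([(1,2),(3,4)],[5])
```

Each iteration resumes from the remainder returned by the previous `parseBreakL` call, exactly the pattern the break-style drivers enable.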

core/src/Streamly/Data/StreamK.hs

Lines changed: 3 additions & 0 deletions

@@ -125,8 +125,11 @@ module Streamly.Data.StreamK
     -- , foldBreak
 
     -- ** Parsing
+    , parserK
     , parse
     , parseBreak
+    , parsePos
+    , parseBreakPos
 
     -- * Transformation
     , mapM
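The `parserK` combinator exported above converts a fused, direct-style parser into the CPS form that composes by plain function calls. The essence of that conversion can be sketched over lists with hypothetical names (the real streamly types differ):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Direct style: a function from input to result plus leftover.
type Direct a b = [a] -> Maybe (b, [a])

-- CPS style: take the input and a continuation to call on success.
newtype CPS a b = CPS
    { runCPS :: forall r. [a] -> (b -> [a] -> Maybe r) -> Maybe r }

-- The essence of a parserK-style conversion: wrap a direct parser so
-- that it calls the continuation instead of returning.
toCPS :: Direct a b -> CPS a b
toCPS p = CPS $ \xs k -> p xs >>= \(b, rest) -> k b rest

-- Monadic bind in CPS is just composition of continuations, which is
-- why recursion and deep nesting scale linearly here.
bindK :: CPS a b -> (b -> CPS a c) -> CPS a c
bindK (CPS f) g = CPS $ \xs k -> f xs (\b rest -> runCPS (g b) rest k)

pureK :: b -> CPS a b
pureK b = CPS $ \xs k -> k b xs

-- Run a CPS parser by supplying the final continuation.
parseK :: CPS a b -> [a] -> Maybe (b, [a])
parseK (CPS f) xs = f xs (\b rest -> Just (b, rest))

-- One element of input, in direct style.
one :: Direct a a
one (x : rest) = Just (x, rest)
one [] = Nothing

-- Compose two converted parsers.
two :: CPS a (a, a)
two = toCPS one `bindK` \x -> toCPS one `bindK` \y -> pureK (x, y)

main :: IO ()
main = print (parseK two "abc") -- Just (('a','b'),"c")
```

This mirrors why ParserK trades a small per-step function-call overhead for scalable dynamic composition, as the ParserK module documentation explains.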

core/src/Streamly/Internal/Data/Parser.hs

Lines changed: 4 additions & 0 deletions

@@ -2595,6 +2595,8 @@ data DeintercalateAllState fs sp ss =
 -- >>> Stream.parse p $ Stream.fromList "1+2+3"
 -- Right [Left "1",Right '+',Left "2",Right '+',Left "3"]
 --
+-- See also 'Streamly.Internal.Data.ParserK.chainl1'.
+--
 {-# INLINE deintercalateAll #-}
 deintercalateAll :: Monad m =>
     Parser a m x
@@ -2718,6 +2720,8 @@ data DeintercalateState b fs sp ss =
 -- >>> Stream.parse p $ Stream.fromList "1+2+3"
 -- Right [Left "1",Right '+',Left "2",Right '+',Left "3"]
 --
+-- See also 'Streamly.Internal.Data.ParserK.chainl1'.
+--
 {-# INLINE deintercalate #-}
 deintercalate :: Monad m =>
     Parser a m x
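The haddock example in the hunks above shows `deintercalate` splitting `"1+2+3"` into alternating `Left`/`Right` results. The same shape can be sketched over plain lists (a toy, not the fused streamly implementation):

```haskell
import Data.Char (isDigit)

-- Toy deintercalate: alternate a content span and a single-element
-- separator, tagging content Left and separators Right.
deintercalateL :: (a -> Bool) -> (a -> Bool) -> [a] -> [Either [a] a]
deintercalateL content sep xs =
    let (tok, rest) = span content xs
    in Left tok : case rest of
        (s : more) | sep s -> Right s : deintercalateL content sep more
        _ -> []

main :: IO ()
main = print (deintercalateL isDigit (== '+') "1+2+3")
-- [Left "1",Right '+',Left "2",Right '+',Left "3"]
```

On well-formed input this produces the same alternating structure as the haddock example; the real combinator additionally handles failure, backtracking, and streaming of results.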
