Merged
3 changes: 3 additions & 0 deletions core/src/DocTestDataParserK.hs
@@ -8,6 +8,9 @@

>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.ParserK as ParserK
>>> import qualified Streamly.Data.Stream as Stream
>>> import qualified Streamly.Data.StreamK as StreamK
>>> import qualified Streamly.Unicode.Parser as Parser

For APIs that have not been released yet.

5 changes: 5 additions & 0 deletions core/src/Streamly/Data/Array.hs
@@ -76,6 +76,11 @@ module Streamly.Data.Array

-- * Stream of Arrays
, chunksOf
, parserK
, parse
, parseBreak
, parsePos
, parseBreakPos

-- * Casting
, cast
5 changes: 5 additions & 0 deletions core/src/Streamly/Data/Array/Generic.hs
@@ -35,6 +35,11 @@ module Streamly.Data.Array.Generic

-- * Stream of Arrays
, chunksOf
, parserK
, parse
, parseBreak
, parsePos
, parseBreakPos

-- * Random Access
, length
99 changes: 69 additions & 30 deletions core/src/Streamly/Data/Parser.hs
@@ -7,20 +7,23 @@
-- Stability : pre-release
-- Portability : GHC
--
-- Parsers are more powerful 'Streamly.Data.Fold.Fold's:
-- Parsers are more powerful but less general than 'Streamly.Data.Fold.Fold's:
Review comment (Member): Less general?
--
-- * folds cannot fail but parsers can fail and backtrack.
-- * folds can be composed as a Tee but parsers cannot.
-- * folds can be used for scanning but parsers cannot.
-- * folds can be converted to parsers.
--
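The fold-to-parser bullet above can be sketched concretely. This is a minimal example assuming the released streamly-core APIs `Parser.fromFold` and `Stream.parse`:

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Parser as Parser
import qualified Streamly.Data.Stream as Stream

main :: IO ()
main = do
    -- A fold cannot fail, so the lifted parser always succeeds.
    r <- Stream.parse (Parser.fromFold Fold.sum) (Stream.fromList [1 .. 10 :: Int])
    print r -- Right 55
```

Since folds are the less powerful abstraction, any existing fold-based pipeline can reuse its folds inside parsers unchanged.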
-- Streamly parsers support all operations offered by popular Haskell parser
-- libraries. They operate on a generic input type, support streaming, and are
-- faster.
-- libraries. Unlike other parser libraries, (1) streamly parsers can operate
-- on any Haskell type as input - not just bytes, (2) natively support
-- streaming, and (3) are faster.
--
-- Like folds, parsers use stream fusion, compiling to efficient low-level code
-- comparable to the speed of C. Parsers are suitable for high-performance
-- parsing of streams.
-- == High Performance by Static Parser Fusion
--
-- Like folds, parsers are designed to utilize stream fusion, compiling to
-- efficient low-level code comparable to the speed of C. Parsers are suitable
-- for high-performance parsing of streams.
--
-- Operations in this module are designed to be composed statically rather than
-- dynamically. They are inlined to enable static fusion. More importantly,
@@ -30,31 +33,36 @@
-- from the "Streamly.Data.ParserK" module. 'Parser' and
-- 'Streamly.Data.ParserK.ParserK' types are interconvertible.
--
-- == Using Parsers
-- == How to parse a stream?
--
-- Parser combinators can be used to create a pipeline of folds or parsers such
-- that the next fold or parser consumes the result of the previous parser.
-- Such a composed pipeline of parsers can then be driven by one of many parser
-- drivers available in the Stream and Array modules.
--
-- This module provides elementary parsers and parser combinators that can be
-- used to parse a stream of data. Additionally, all the folds from the
-- "Streamly.Data.Fold" module can be converted to parsers using 'fromFold'.
-- All the parsing functionality provided by popular parsing libraries, and
-- more is available. Also see "Streamly.Unicode.Parser" module for Char stream
-- parsers.
-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parse' or
-- Streamly.Data.Stream.'Streamly.Data.Stream.parseBreak' to run a parser on an
-- input stream and return the parsed result.
--
-- A data stream can be transformed to a stream of parsed data elements. Parser
-- combinators can be used to create a pipeline of folds or parsers such that
-- the next fold or parser consumes the result of the previous parser. See
-- 'Streamly.Data.Stream.parse' and 'Streamly.Data.Stream.parseMany' to run
-- these parsers on a stream.
-- Use Streamly.Data.Stream.'Streamly.Data.Stream.parseMany' or
-- Streamly.Data.Stream.'Streamly.Data.Stream.parseIterate' to transform an
-- input data stream to an output stream of parsed data elements using a
-- parser.
--
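As a sketch of the stream-transforming drivers described above (assuming the released `Stream.parseMany` driver and `Parser.takeEQ`), splitting a stream into fixed-size parsed groups:

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Parser as Parser
import qualified Streamly.Data.Stream as Stream

main :: IO ()
main = do
    -- Apply the same parser repeatedly: each application consumes
    -- exactly 3 elements and yields them as a list.
    let groups = Parser.takeEQ 3 Fold.toList
    xs <- Stream.toList $ Stream.parseMany groups (Stream.fromList [1 .. 6 :: Int])
    print xs
```

On this input the output stream should carry the groups @[1,2,3]@ and @[4,5,6]@, each wrapped in @Right@; a trailing partial group would instead produce a @Left@ parse error.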
-- == Parser vs ParserK
--
-- There are two functionally equivalent parsing modules,
-- "Streamly.Data.Parser" (this module) and "Streamly.Data.ParserK". The latter
-- is a CPS based wrapper over the former, and can be used for parsing in
-- general. "Streamly.Data.Parser" enables stream fusion and should be
-- general. "Streamly.Data.Parser" enables stream fusion, and where possible it should be
-- preferred over "Streamly.Data.ParserK" for high performance stream parsing
-- use cases. However, there are a few cases where this module is not
-- suitable and ParserK should be used instead.
--
-- As a rule of thumb, use ParserK when recursion or heavy nesting is needed.
--
-- === Parser: non-recursive static fusion
--
-- For static fusion, parser combinators have to use strict pattern matching on
-- arguments of type Parser. This leads to an infinite loop when a parser is
-- defined recursively, due to strict evaluation of the recursive call. For
@@ -74,11 +82,13 @@
-- >>> p = p1 <|> p2
-- >>> :}
--
-- === ParserK: recursive application
--
-- Use ParserK when recursive use is required:
--
-- >>> import Streamly.Data.ParserK (ParserK, parserK)
-- >>> import Streamly.Data.ParserK (ParserK)
-- >>> import Streamly.Data.StreamK (parserK)
-- >>> import qualified Streamly.Data.StreamK as StreamK
-- >>> import qualified Streamly.Internal.Data.StreamK as StreamK (parse)
--
-- >>> :{
-- >>> p, p1, p2 :: Monad m => ParserK Char m String
@@ -105,25 +115,54 @@
-- combined @n@ times, roughly 8 or fewer sequenced parsers are fine. READ THE
-- DOCS OF APPLICATIVE, MONAD AND ALTERNATIVE INSTANCES.
--
-- == Streaming Parsers
-- == Parsers Galore!
--
-- Streamly provides all the parsing functionality provided by popular parsing
-- libraries, and much more with higher performance.
-- This module provides most of the elementary parsers and parser combinators.
-- Additionally,
--
-- With 'Streamly.Data.ParserK.ParserK' you can use the generic Alternative
-- type class based parsers from the
-- * all the folds from the "Streamly.Data.Fold" module can be converted to
-- parsers using 'fromFold'.
-- * "Streamly.Unicode.Parser" module provides Char stream parsers.
-- * all the combinators from the
-- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
-- library or similar. However, we recommend that you use the equivalent
-- functionality from this module for better performance and for streaming
-- behavior.
-- package can be used with streamly ParserK.
-- * See "Streamly.Internal.Data.Parser" for many more unreleased but useful APIs.
--
-- == Generic Parser Combinators
--
-- With 'Streamly.Data.ParserK.ParserK' you can use the 'Applicative' and
-- 'Control.Applicative.Alternative' type class based generic parser
-- combinators from the
-- <https://hackage.haskell.org/package/parser-combinators parser-combinators>
-- library or similar. However, if available, we recommend that you use the
-- equivalent functionality from this module where performance and streaming
-- behavior matter.
--
-- Firstly, the combinators in this module are faster due to stream fusion.
-- Secondly, these are streaming in nature as the results can be passed
-- directly to other stream consumers (folds or parsers). The Alternative type
-- class based parsers would end up buffering all the results in lists before
-- they can be consumed.
--
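The streaming-vs-buffering point can be illustrated with the fold-based repetition combinator from this module (a sketch assuming the released `Parser.many`, which takes a Fold for the results, unlike the list-buffering Alternative-based @many@):

```haskell
import Data.Char (isDigit)
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Parser as Parser
import qualified Streamly.Data.Stream as Stream

main :: IO ()
main = do
    -- Each matched Char is pushed into the Fold incrementally; with a
    -- fold like Fold.length the matches would never be buffered at all.
    let digits = Parser.many (Parser.satisfy isDigit) Fold.toList
    r <- Stream.parse digits (Stream.fromList "123abc")
    print r -- Right "123"
```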
-- When recursion or heavy nesting is needed use ParserK.
--
-- == Error Reporting
--
-- There are two types of parser drivers available: the @parse@ and
-- @parseBreak@ drivers do not track the stream position, whereas the
-- @parsePos@ and @parseBreakPos@ drivers track stream position information
-- with slightly more performance overhead.
--
-- When an error occurs, the stream position is reported: in the case of byte
-- streams or unboxed array streams this is the byte position; in the case of
-- generic element parsers or generic array parsers it is the element position
-- in the stream.
--
-- If you need line number or column information you can read the stream again
-- (if it is immutable) and translate the reported byte position to line number
-- and column. More elaborate support for computing arbitrary and custom error
-- context information is planned to be added in the future.
--
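For instance, with the position-tracking driver (a sketch; @parsePos@ is introduced in this PR and its signature is assumed here to be analogous to @parse@):

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Parser as Parser
import qualified Streamly.Data.Stream as Stream

main :: IO ()
main = do
    -- takeEQ demands 5 elements but only 3 are available, so the
    -- parser fails; parsePos includes the stream position in the error.
    r <- Stream.parsePos (Parser.takeEQ 5 Fold.toList) (Stream.fromList "abc")
    case r of
        Left err -> print err  -- a ParseError carrying the element position
        Right s  -> putStrLn s
```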
-- These parsers do not report the error context (e.g. line number or column).
-- This may be supported in future.
--
Expand All @@ -148,7 +187,7 @@ module Streamly.Data.Parser

-- * Parser Type
Parser
, ParseError
, ParseError(..)

-- -- * Downgrade to Fold
-- , toFold
109 changes: 58 additions & 51 deletions core/src/Streamly/Data/ParserK.hs
@@ -8,78 +8,88 @@
-- Portability : GHC
--
-- See the general notes about parsing in the "Streamly.Data.Parser" module.
-- This module implements a using Continuation Passing Style (CPS) wrapper over
-- the "Streamly.Data.Parser" module. It is as fast or faster than attoparsec.
--
-- Streamly parsers support all operations offered by popular Haskell parser
-- libraries. They operate on a generic input type, support streaming, and are
-- faster.
-- This (ParserK) module implements a Continuation Passing Style (CPS) wrapper
-- over the fused "Streamly.Data.Parser" module. It is a faster CPS parser than
-- attoparsec.
--
-- The 'ParserK' type represents a stream-consumer as a composition of function
-- calls; therefore, a function call overhead is incurred at each composition.
-- It is reasonably fast in general but may be a few times slower than a fused
-- parser represented by the 'Streamly.Data.Parser.Parser' type. However, it
-- It is reasonably fast in general but may be a few times slower than the
-- fused 'Streamly.Data.Parser.Parser' type. However, unlike fused parsers, it
-- allows for scalable dynamic composition, especially, 'ParserK' can be used
-- in recursive calls. Operations like 'splitWith' on 'ParserK' type have
-- linear (O(n)) performance with respect to the number of compositions.
--
-- 'ParserK' is preferred over 'Streamly.Data.Parser.Parser' when extensive
-- applicative, alternative and monadic composition is required, or when
-- recursive or dynamic composition of parsers is required. 'ParserK' also
-- allows efficient parsing of a stream of arrays, it can also break the input
-- stream into a parse result and remaining stream so that the stream can be
-- parsed independently in segments.
--
-- == Using ParserK
--
-- All the parsers from the "Streamly.Data.Parser" module can be converted to
-- ParserK using the 'Streamly.Data.Array.parserK',
-- 'Streamly.Internal.Data.ParserK.parserK', and
-- 'Streamly.Internal.Data.Array.Generic.parserK' combinators.
--
-- 'Streamly.Data.Array.parse' runs a parser on a stream of unboxed
-- arrays, this is the preferred and most efficient way to parse chunked input.
-- The more general 'Streamly.Data.Array.parseBreak' function returns
-- the remaining stream as well along with the parse result. There are
-- 'Streamly.Internal.Data.Array.Generic.parse',
-- 'Streamly.Internal.Data.Array.Generic.parseBreak' as well to run
-- parsers on boxed arrays. 'Streamly.Internal.Data.StreamK.parse',
-- 'Streamly.Internal.Data.StreamK.parseBreak' run parsers on a stream of
-- individual elements instead of stream of arrays.
-- 'ParserK' is preferred over the fused 'Streamly.Data.Parser.Parser' when
-- extensive applicative, alternative and monadic composition is required, or
-- when recursive or dynamic composition of parsers is required. 'ParserK' also
-- allows efficient parsing of a stream of byte arrays; it can also break the
-- input stream into a parse result and the remaining stream so that the stream
-- can be parsed independently in segments.
--
-- == Monadic Composition
-- == How to parse a stream?
--
-- Monad composition can be used for lookbehind parsers, we can dynamically
-- compose new parsers based on the results of the previously parsed values.
-- All the fused parsers from the "Streamly.Data.Parser" module can be
-- converted to the CPS ParserK, for use with different types of parser
-- drivers, using
-- the @parserK@ combinators - Streamly.Data.Array.'Streamly.Data.Array.parserK',
-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parserK', and
-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parserK'.
--
-- To parse a stream of unboxed arrays, use
-- Streamly.Data.Array.'Streamly.Data.Array.parse' for running the parser; this
-- is the preferred and most efficient way to parse chunked input. The
-- Streamly.Data.Array.'Streamly.Data.Array.parseBreak' function returns the
-- remaining stream as well along with the parse result.
--
-- To parse a stream of boxed arrays, use
-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parse' or
-- Streamly.Data.Array.Generic.'Streamly.Data.Array.Generic.parseBreak' to run
-- the parser.
--
-- To parse a stream of individual elements, use
-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parse' and
-- Streamly.Data.StreamK.'Streamly.Data.StreamK.parseBreak' to run the parser.
--
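Putting the element-stream driver together with the parserK conversion (a sketch assuming the `StreamK.parserK` and `StreamK.parse` APIs exported by this PR):

```haskell
import Data.Char (isDigit)
import qualified Streamly.Data.Parser as Parser
import qualified Streamly.Data.StreamK as StreamK

main :: IO ()
main = do
    -- Lift a fused parser to CPS form, then run it on an element stream.
    let pk = StreamK.parserK (Parser.satisfy isDigit)
    r <- StreamK.parse pk (StreamK.fromList "7abc")
    print r -- Right '7'
```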
-- If we have to parse "a9" or "9a" but not "99" or "aa" we can use the
-- following non-monadic, backtracking parser:
-- == Applicative Composition
--
-- >>> digits p1 p2 = ((:) <$> p1 <*> ((:) <$> p2 <*> pure []))
-- Applicative parsers are simpler but we cannot use lookbehind as we can in
-- the monadic parsers.
--
-- If we have to parse "9a" or "a9" but not "99" or "aa" we can use the
-- following non-monadic (Applicative), backtracking parser:
--
-- >>> -- parse p1 : p2 : []
-- >>> token p1 p2 = ((:) <$> p1 <*> ((:) <$> p2 <*> pure []))
-- >>> :{
-- backtracking :: Monad m => ParserK Char m String
-- backtracking = ParserK.parserK $
-- digits (Parser.satisfy isDigit) (Parser.satisfy isAlpha)
-- backtracking = StreamK.parserK $
-- token (Parser.satisfy isDigit) (Parser.satisfy isAlpha) -- e.g. "9a"
-- <|>
-- digits (Parser.satisfy isAlpha) (Parser.satisfy isDigit)
-- token (Parser.satisfy isAlpha) (Parser.satisfy isDigit) -- e.g. "a9"
-- :}
--
-- We know that if the first parse resulted in a digit at the first place then
-- the second parse is going to fail. However, we waste that information and
-- parse the first character again in the second parse only to know that it is
-- not an alphabetic char. By using lookbehind in a 'Monad' composition we can
-- avoid redundant work:
-- == Monadic Composition
--
-- Monad composition can be used to implement lookbehind parsers: we can
-- dynamically compose new parsers based on the results of previously parsed
-- values.
--
-- In the previous example, we know that if the first parse resulted in a digit
-- in the first position then the second parse is going to fail. However, we
-- waste that information and parse the first character again in the second
-- parse only to know that it is not an alphabetic char. By using lookbehind
-- in a 'Monad' composition we can avoid redundant work:
--
-- >>> data DigitOrAlpha = Digit Char | Alpha Char
--
-- >>> :{
-- lookbehind :: Monad m => ParserK Char m String
-- lookbehind = do
-- x1 <- ParserK.parserK $
-- x1 <- StreamK.parserK $
-- Digit <$> Parser.satisfy isDigit
-- <|> Alpha <$> Parser.satisfy isAlpha
-- -- Note: the parse depends on what we parsed already
-- x2 <- ParserK.parserK $
-- x2 <- StreamK.parserK $
-- case x1 of
-- Digit _ -> Parser.satisfy isAlpha
-- Alpha _ -> Parser.satisfy isDigit
@@ -105,11 +115,8 @@ module Streamly.Data.ParserK
ParserK

-- * Parsers
-- ** Conversions
, parserK
-- , toParser

-- ** Without Input
-- -- ** Without Input
, fromPure
, fromEffect
, die
4 changes: 3 additions & 1 deletion core/src/Streamly/Data/Stream.hs
@@ -241,7 +241,9 @@ module Streamly.Data.Stream

-- ** Parsing
, parse
-- , parseBreak
, parseBreak
, parsePos
, parseBreakPos

-- ** Lazy Right Folds
-- | Consuming a stream to build a right associated expression, suitable
3 changes: 3 additions & 0 deletions core/src/Streamly/Data/StreamK.hs
@@ -125,8 +125,11 @@ module Streamly.Data.StreamK
-- , foldBreak

-- ** Parsing
, parserK
, parse
, parseBreak
, parsePos
, parseBreakPos

-- * Transformation
, mapM
4 changes: 4 additions & 0 deletions core/src/Streamly/Internal/Data/Parser.hs
@@ -2595,6 +2595,8 @@ data DeintercalateAllState fs sp ss =
-- >>> Stream.parse p $ Stream.fromList "1+2+3"
-- Right [Left "1",Right '+',Left "2",Right '+',Left "3"]
--
-- See also 'Streamly.Internal.Data.ParserK.chainl1'.
--
{-# INLINE deintercalateAll #-}
deintercalateAll :: Monad m =>
Parser a m x
@@ -2718,6 +2720,8 @@ data DeintercalateState b fs sp ss =
-- >>> Stream.parse p $ Stream.fromList "1+2+3"
-- Right [Left "1",Right '+',Left "2",Right '+',Left "3"]
--
-- See also 'Streamly.Internal.Data.ParserK.chainl1'.
--
{-# INLINE deintercalate #-}
deintercalate :: Monad m =>
Parser a m x