This repository was archived by the owner on Jan 9, 2026. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 116
Fix deltaparsing #1197
Draft
larskuhtz
wants to merge
10
commits into
master
Choose a base branch
from
lars/deltaparsing
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Fix deltaparsing #1197
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
bf2d455
fix delta parsing for attoparsec
larskuhtz 7266396
some linting
larskuhtz 3e93eba
one more test
larskuhtz 58ca210
remove redundant HasBytes instance
larskuhtz c72090b
Merge branch 'master' into lars/deltaparsing
larskuhtz 0d898ec
use legacy parsing
larskuhtz 9a55270
Merge branch 'master' into lars/deltaparsing
larskuhtz 6bd2cea
use legacy parser
larskuhtz a609679
Merge branch 'master' into lars/deltaparsing
larskuhtz a626f89
deltaparsing tweaks (#1203)
imalsogreg File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,3 +1,6 @@ | ||||||
| {-# LANGUAGE BangPatterns #-} | ||||||
| {-# LANGUAGE CPP #-} | ||||||
| {-# LANGUAGE DerivingStrategies #-} | ||||||
| {-# LANGUAGE GeneralizedNewtypeDeriving #-} | ||||||
| {-# LANGUAGE Rank2Types #-} | ||||||
| -- | | ||||||
|
|
@@ -10,21 +13,37 @@ | |||||
| -- | ||||||
|
|
||||||
| module Pact.Types.Parser | ||||||
| ( | ||||||
| ( -- * Abstract Pact Parser | ||||||
| PactParser(..) | ||||||
| , symbols | ||||||
| , style | ||||||
|
|
||||||
| -- * Pact Attoparsec Parser | ||||||
| , PactAttoparsec(..) | ||||||
| , pactAttoParseOnly | ||||||
| ) | ||||||
| where | ||||||
|
|
||||||
|
|
||||||
| import Control.Applicative | ||||||
| import Control.Monad | ||||||
| import Text.Trifecta | ||||||
| import Control.Monad.State | ||||||
|
|
||||||
| import qualified Data.Attoparsec.Internal.Types as APT | ||||||
| import qualified Data.Attoparsec.Text as AP | ||||||
| import qualified Data.HashSet as HS | ||||||
| import qualified Data.Text as T | ||||||
|
|
||||||
| import Text.Parser.Token.Highlight | ||||||
| import Text.Parser.Token.Style | ||||||
| import Text.Trifecta | ||||||
| import Text.Trifecta.Delta as TF | ||||||
|
|
||||||
| -- -------------------------------------------------------------------------- -- | ||||||
| -- | Abstract Pact Parser | ||||||
| -- | ||||||
| -- On-chain this is use for Attoparsec as parser backend. In the repl trifecta | ||||||
| -- is used. | ||||||
| -- | ||||||
| newtype PactParser p a = PactParser { unPactParser :: p a } | ||||||
| deriving (Functor, Applicative, Alternative, Monad, MonadPlus, Parsing, CharParsing, DeltaParsing) | ||||||
|
|
||||||
|
|
@@ -45,3 +64,87 @@ style = IdentifierStyle "atom" | |||||
| (HS.fromList ["true","false"]) | ||||||
| Symbol | ||||||
| ReservedIdentifier | ||||||
|
|
||||||
| -- -------------------------------------------------------------------------- -- | ||||||
| -- Pact Attoparsec backend parser | ||||||
|
|
||||||
| -- | A wrapper around Attoparsec that adds DeltaParsing | ||||||
| -- | ||||||
| newtype PactAttoparsec a = PactAttoparsec | ||||||
| { runPactAttoparsec :: StateT Int AP.Parser a } | ||||||
| deriving (Functor, Applicative, Alternative, Monad, MonadPlus, MonadState Int) | ||||||
| deriving (Parsing, TokenParsing) | ||||||
|
|
||||||
| pactAttoParseOnly :: PactAttoparsec a -> T.Text -> Either String a | ||||||
| pactAttoParseOnly = AP.parseOnly . flip evalStateT 0 . runPactAttoparsec | ||||||
|
|
||||||
| -- | Atto DeltaParsing instance provides 'position' only (with no support for | ||||||
| -- hidden chars like Trifecta). | ||||||
| -- | ||||||
| instance DeltaParsing PactAttoparsec where | ||||||
| line = return mempty | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about something like this:
Suggested change
The error message is more informative, than giving an empty string.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. assuming it's actually not used ... |
||||||
|
|
||||||
| -- Returns @Column NUMBER_OF_CHARS NUMBER_OF_BYTES@ | ||||||
| -- | ||||||
| -- Notes about legacy behavior: | ||||||
| -- | ||||||
| -- For text <2 it used to return for both values the number of 16-bit code | ||||||
| -- units words. With the UTF-16 encoding used by text <2 this is the total | ||||||
| -- number of characters plus number of characters that are encoded using two | ||||||
| -- 16-bit code units. For instance the Ugaritic letter U+1038D would result | ||||||
| -- in a positional length of two (two 16-bit code units). The string | ||||||
| -- "a\0x263A\0x1038D" ("a☺𐎍") would have positional length 4 (3 characters | ||||||
| -- plus one 2 16-bit character). | ||||||
| -- | ||||||
| -- In practice the old behavior was close enough to the number of characters | ||||||
| -- that it went mostly unnoticed and didn't cause harm on chain. The code | ||||||
| -- just assumed that it represented the number text characters. Those | ||||||
| -- numbers appear on chain (up to some block height) within info objects and | ||||||
| -- later still in failure messages. It is also relevant for extracting the | ||||||
| -- module text from the pact transaction before storing it in the pact db. | ||||||
| -- The presence of unicode characters can result in modules containing | ||||||
| -- dangling data because there are less characters in the module than what | ||||||
| -- is assumed based on the position information. | ||||||
| -- | ||||||
| -- For text >=2 the attoparsic position tracks just bytes and the internal | ||||||
| -- representation of UTF-8. For instance the Ugaritic letter U+1038D results | ||||||
| -- in a byte length of 4. The string "a\0x263A\0x1038D" ("a☺𐎍") has 8 bytes | ||||||
| -- (1 code unit plus 3 code unit plus 4 code units). | ||||||
| -- | ||||||
| position = do | ||||||
| #if MIN_VERSION_text(2,0,0) | ||||||
| APT.Pos !bytePos <- parserPos | ||||||
| !charPos <- gets fromIntegral | ||||||
| #elif LEGACY_PARSER == 1 | ||||||
| APT.Pos !bytePos <- parserPos | ||||||
| let !charPos = fromIntegral bytePos | ||||||
| #else | ||||||
| APT.Pos !bytePos <- (* 2) <$> parserPos | ||||||
| !charPos <- gets fromIntegral | ||||||
| #endif | ||||||
| return $ TF.Columns charPos (fromIntegral bytePos) | ||||||
| {-# INLINE position #-} | ||||||
|
|
||||||
| slicedWith f a = (`f` mempty) <$> a | ||||||
| rend = return mempty | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar to |
||||||
| restOfLine = return mempty | ||||||
|
|
||||||
| -- | retrieve pos from Attoparsec. | ||||||
| -- | ||||||
| -- The first parameter to the parser is the remaining available input, which | ||||||
| -- isn't of any help here. | ||||||
| -- | ||||||
| parserPos :: PactAttoparsec APT.Pos | ||||||
| parserPos = PactAttoparsec $ StateT $ \x -> | ||||||
| APT.Parser $ \t !pos more _lose win -> win t pos more (pos, x) | ||||||
|
|
||||||
| instance CharParsing PactAttoparsec where | ||||||
| satisfy p = PactAttoparsec (satisfy p) <* modify' (+ 1) | ||||||
| {-# INLINE satisfy #-} | ||||||
|
|
||||||
| string s = PactAttoparsec (string s) <* modify' (+ length s) | ||||||
| {-# INLINE string #-} | ||||||
|
|
||||||
| text s = PactAttoparsec (text s) <* modify' (+ T.length s) | ||||||
| {-# INLINE text #-} | ||||||
|
|
||||||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple thoughts
You don't need to set
=1for binary flags, you can just define-DLEGACY_PARSER.Then CPP if-blocks become like
or
The other thought is that it could be behind a flag:
Which would make toggling it on/off not require editing the cabal file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is all temporary for now. I tried to define the macro in
cabal.project.localbut cabal doesn't like that, so I had to put it into the cabal file.Regarding using 1/0, sometimes I like to be more explicit when dealing with macros (I find it easy to mess up with
#defineand#undef). Yeah, if that macro would still be in the final version of the code, we would use a standard pattern with (automatic) cabal flag. But, hopefully, that won't be necessary.