src/_table_generation/table_generation.md
3 additions & 1 deletion
@@ -66,4 +66,6 @@ LITERAL_CAPTURE_INCREASE_LENGTH: True for t,r,u,e,f,a,l,s,n
 GRAMMAR_CAPTURE_ERROR_FLAG
 STRING_CAPTURE_ERROR_FLAG
 NUMERIC_CAPTURE_ERROR_FLAG
-LITERAL_CAPTURE_ERROR_FLAG
+LITERAL_CAPTURE_ERROR_FLAG
+
+PROCESS_RAW_TRANSCRIPT_TABLE: This table is used to post-process the raw transcript and add grammar tokens that were not captured during the initial scan in build_transcript. Input: the encoded_ascii of the last token in each entry (scan_mode + ASCII character). Output: token (the token type for this entry), new_grammar (whether a missing grammar token must be added), and scan_token (the type of grammar token to add, if needed, such as END_OBJECT_TOKEN `}` or VALUE_SEPARATOR_TOKEN `,`).
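The lookup described above can be modelled in a few lines of Python. This is only a rough sketch of the idea, not the table generator itself: the scan-mode numbers, the `scan_mode * 256 + ascii` encoding, the `NO_TOKEN` placeholder, and the helper names `encode`, `build_table`, and `process_raw_transcript` are all assumptions made for illustration, and only the numeric and literal scan modes are modelled.

```python
# Illustrative model only. Token names follow the text above; their values,
# the scan-mode numbers, and the encoding are hypothetical.
NUMERIC_TOKEN = "NUMERIC_TOKEN"
LITERAL_TOKEN = "LITERAL_TOKEN"
END_OBJECT_TOKEN = "END_OBJECT_TOKEN"
END_ARRAY_TOKEN = "END_ARRAY_TOKEN"
VALUE_SEPARATOR_TOKEN = "VALUE_SEPARATOR_TOKEN"
NO_TOKEN = "NO_TOKEN"  # hypothetical placeholder when nothing must be added

NUMERIC_MODE = 2  # hypothetical scan-mode numbers
LITERAL_MODE = 3
SCAN_MODE_TOKEN = {NUMERIC_MODE: NUMERIC_TOKEN, LITERAL_MODE: LITERAL_TOKEN}

# Grammar characters that a numeric or literal entry may have swallowed.
GRAMMAR_CHARS = {
    "}": END_OBJECT_TOKEN,
    "]": END_ARRAY_TOKEN,
    ",": VALUE_SEPARATOR_TOKEN,
}


def encode(scan_mode, ch):
    """Assumed encoding: combine the scan mode and the ASCII code into one key."""
    return scan_mode * 256 + ord(ch)


def build_table():
    """Map every (scan_mode, ASCII char) key to (token, new_grammar, scan_token)."""
    table = {}
    for mode, token in SCAN_MODE_TOKEN.items():
        for code in range(128):
            ch = chr(code)
            scan_token = GRAMMAR_CHARS.get(ch, NO_TOKEN)
            table[encode(mode, ch)] = (token, ch in GRAMMAR_CHARS, scan_token)
    return table


def process_raw_transcript(raw_transcript, table):
    """Emit each entry's token, plus the missing grammar token when flagged."""
    out = []
    for scan_mode, text in raw_transcript:
        token, new_grammar, scan_token = table[encode(scan_mode, text[-1])]
        out.append(token)
        if new_grammar:
            out.append(scan_token)
    return out


TABLE = build_table()
raw = [
    (LITERAL_MODE, "false}"),
    (LITERAL_MODE, "true]"),
    (LITERAL_MODE, "null,"),
    (NUMERIC_MODE, "42"),
]
tokens = process_raw_transcript(raw, TABLE)
```

In this model the entry `false}` yields a LITERAL_TOKEN followed by an END_OBJECT_TOKEN, while `42` yields a single NUMERIC_TOKEN because its last character is not a grammar character.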
-// while this assert is in an unconstrained function, the out of bounds accesss `raw_transcript[transcript_ptr]` in build_transcript also generates failing constraints
+// while this assert is in an unconstrained function, the out of bounds access `raw_transcript[transcript_ptr]` in build_transcript also generates failing constraints
// If there is missing grammar, token will be LITERAL_TOKEN or NUMERIC_TOKEN, and new_grammar will be true, and scan_token will be a grammar token, such as END_OBJECT_TOKEN or VALUE_SEPARATOR_TOKEN
@@ -722,7 +737,8 @@ impl<let NumBytes: u32, let NumPackedFields: u32, let MaxNumTokens: u32, let Max
  * @brief Check for missing tokens that we could have missed in `build_transcript`
  * @details If we had a json string where a NUMERIC_TOKEN or LITERAL_TOKEN is directly succeeded by a VALUE_SEPARATOR_TOKEN, END_OBJECT_TOKEN, END_ARRAY_TOKEN,
  * we will have missed the latter token.
- * We pick these up via the lookup table PROCESS_RAW_TRANSCRIPT_TABLE
+ * We pick these up via the lookup table PROCESS_RAW_TRANSCRIPT_TABLE.
+ * The entries in self.raw_transcript currently look like `false}`, `true]`, `null,`, where the grammar tokens are counted as part of the token.