@@ -441,7 +442,7 @@ impl<let NumBytes: u32, let NumPackedFields: u32, let MaxNumTokens: u32, let Max
  // if we end in a scan mode where we're searching for a number, string or a literal (true/false/null), we have an incomplete token and this is invalid JSON
  // NOTE: if we upgrade this parser to be able to process single-value JSON (e.g. "999" or ""hello" : "world"") this logic needs to be upgraded
  assert(
-     scan_mode == GRAMMAR_SCAN as Field,
+     scan_mode == GRAMMAR_CAPTURE as Field,
      "build_transcript: incomplete token (number, string or literal)",
  );
@@ -455,28 +456,28 @@ impl<let NumBytes: u32, let NumPackedFields: u32, let MaxNumTokens: u32, let Max
  * @details JSON_CAPTURE_TABLE takes the following as input:
  * 1. the ascii byte at the current location in the json
  * 2. the current scan mode (are we searching for grammar, strings, numbers or literals?)
- * 3. could this byte potentially be an escape sequence? (i.e. the previous byte was a backslash character "\" and scan_mode == STRING_SCAN)
+ * 3. could this byte potentially be an escape sequence? (i.e. the previous byte was a backslash character "\" and scan_mode == STRING_CAPTURE)
  * The table outputs the following flags:
  * 1. what token have we scanned? (listed in enums::Token)
  * 2. should we push this token to the transcript (no push if token == NO_TOKEN)
  * 3. should we increase the length of the current entry we're evaluating?
- * (i.e. if token == STRING_TOKEN and scan_mode == STRING_SCAN, then increase the length because we're in the process of scanning a string)
- * 4. is this scanned ascii character a potential escape sequence? i.e. scan_mode == STRING_SCAN and ascii = "\"
+ * (i.e. if token == STRING_TOKEN and scan_mode == STRING_CAPTURE, then increase the length because we're in the process of scanning a string)
+ * 4. is this scanned ascii character a potential escape sequence? i.e. scan_mode == STRING_CAPTURE and ascii = "\"
  * 5. have we entered an error state? (i.e. invalid grammar e.g. ":" is followed by "}")
  *
  * NOTE: we represent error states in a nonstandard way to reduce gate count. Instead of handling an error flag,
  * an error state will increase the value of `scan_token` by 0x100000000. This will cause the next access into `JSON_CAPTURE_TABLE` to trigger an out of bounds error
  *
  * NOTE: the scanned transcript will be missing some edge cases that are caught via `swap_keys` and `capture_missing_tokens`:
- * 1. If the scan mode is NUMERIC_SCAN or LITERAL_SCAN and the next character is a "," or "}" or "]",
+ * 1. If the scan mode is NUMERIC_CAPTURE or LITERAL_CAPTURE and the next character is a "," or "}" or "]",
  * we will push a NUMERIC_TOKEN or LITERAL_TOKEN into the transcript but we will MISS the VALUE_SEPARATOR_TOKEN, END_OBJECT_TOKEN or END_ARRAY_TOKEN
  * (accommodating this edge case requires conditionally pushing two transcript entries per iteration, so we do this in a separate step where we iterate over the transcript and not the json bytes)
  * 2. We can't yet tell if an entry is a KEY_TOKEN or a STRING_TOKEN. All keys are represented as STRING_TOKEN. This gets fixed after `swap_keys` is evaluated
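
The error-state note in the doc comment can be illustrated with a small model. What follows is only a sketch, written in plain Rust rather than Noir so that it runs outside a circuit. The two scan modes, the accepted byte set, and the names `slot`, `build_table` and `scan` are all hypothetical and far simpler than the real `JSON_CAPTURE_TABLE`; numbers, literals, escapes and token output are left out. It shows the one idea the comment describes: an invalid transition stores a next-mode value that has been pushed up by 0x100000000, so no error flag is carried and the following table access lands out of bounds.

```rust
// Hypothetical two-mode model; none of these names come from the real parser.
const GRAMMAR: u64 = 0; // scanning structural characters
const STRING: u64 = 1; // scanning the inside of a string
const NUM_MODES: u64 = 2;
const ERROR_OFFSET: u64 = 0x1_0000_0000; // same constant as in the doc comment

/// Position of (byte, mode) in the flat lookup table.
fn slot(byte: u8, mode: u64) -> u64 {
    (byte as u64) * NUM_MODES + mode
}

/// Each entry holds the next scan mode. Invalid transitions hold a
/// "poisoned" mode, i.e. a valid mode plus ERROR_OFFSET.
fn build_table() -> Vec<u64> {
    // Start with every transition poisoned.
    let mut table = vec![GRAMMAR + ERROR_OFFSET; (256 * NUM_MODES) as usize];
    // Inside a string any byte is accepted and we stay in string mode...
    for byte in 0..=255u8 {
        table[slot(byte, STRING) as usize] = STRING;
    }
    // ...except the closing quote, which returns to grammar mode.
    table[slot(b'"', STRING) as usize] = GRAMMAR;
    // In grammar mode only structural bytes and whitespace are valid.
    for &byte in b"{}[]:, \n\t" {
        table[slot(byte, GRAMMAR) as usize] = GRAMMAR;
    }
    // An opening quote switches to string mode.
    table[slot(b'"', GRAMMAR) as usize] = STRING;
    table
}

/// Walk the input with one table lookup per byte and no error flag.
fn scan(input: &[u8]) -> Result<(), &'static str> {
    let table = build_table();
    let mut mode = GRAMMAR;
    for &byte in input {
        // If `mode` was poisoned by the previous byte, this index is far
        // beyond the table and the lookup fails.
        let index = slot(byte, mode);
        mode = usize::try_from(index)
            .ok()
            .and_then(|i| table.get(i))
            .copied()
            .ok_or("out-of-bounds lookup: an earlier byte was invalid")?;
    }
    // Counterpart of the final assert in the first hunk: the scan must end
    // in grammar mode. A poisoned mode also fails this check.
    if mode != GRAMMAR {
        return Err("input ended in a non-grammar mode");
    }
    Ok(())
}
```

In this model `scan(b"{x}")` fails on the lookup for `}` because the preceding `x` poisoned the mode, while `scan(b"{\"abc")` passes every lookup and is rejected only by the final mode check. In a circuit the out-of-bounds access would presumably surface as a failed constraint rather than a returned error; that mapping is an assumption of this sketch, not something the diff states.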