Commit 9ae349c

Merge pull request Tencent#871 from StilesCrisis/token-by-token-doc
Token-by-token parser documentation
2 parents e6b192a + 0f3bf99 commit 9ae349c

1 file changed

+37
-14
lines changed

doc/sax.md

Lines changed: 37 additions & 14 deletions
@@ -8,7 +8,7 @@ In RapidJSON, `Reader` (typedef of `GenericReader<...>`) is the SAX-style parser
88

99
# Reader {#Reader}
1010

11-
`Reader` parses a JSON from a stream. While it reads characters from the stream, it analyze the characters according to the syntax of JSON, and publish events to a handler.
11+
`Reader` parses a JSON from a stream. While it reads characters from the stream, it analyzes the characters according to the syntax of JSON, and publishes events to a handler.
1212

1313
For example, here is a JSON.
1414

@@ -24,7 +24,7 @@ For example, here is a JSON.
2424
}
2525
~~~~~~~~~~
2626

27-
While a `Reader` parses this JSON, it publishes the following events to the handler sequentially:
27+
When a `Reader` parses this JSON, it publishes the following events to the handler sequentially:
2828

2929
~~~~~~~~~~
3030
StartObject()
@@ -50,7 +50,7 @@ EndArray(4)
5050
EndObject(7)
5151
~~~~~~~~~~
5252

53-
These events can be easily matched with the JSON, except some event parameters need further explanation. Let's see the `simplereader` example which produces exactly the same output as above:
53+
These events can be easily matched with the JSON, but some event parameters need further explanation. Let's see the `simplereader` example which produces exactly the same output as above:
5454

5555
~~~~~~~~~~cpp
5656
#include "rapidjson/reader.h"
@@ -91,11 +91,11 @@ void main() {
9191
}
9292
~~~~~~~~~~
9393
94-
Note that, RapidJSON uses template to statically bind the `Reader` type and the handler type, instead of using class with virtual functions. This paradigm can improve the performance by inlining functions.
94+
Note that RapidJSON uses templates to statically bind the `Reader` type and the handler type, instead of using classes with virtual functions. This paradigm can improve performance by inlining functions.
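The static binding described here can be illustrated with a minimal sketch in plain C++. This is not RapidJSON's code: `LogHandler` and `PublishEmptyObject` are hypothetical names invented for this illustration, and the toy publisher only emits the two events of an empty object.

```cpp
#include <cassert>
#include <string>

// Hypothetical handler (not part of RapidJSON): records event names in order.
// It has no virtual functions and no common base class.
struct LogHandler {
    std::string log;
    bool StartObject() { log += "StartObject "; return true; }
    bool EndObject(unsigned memberCount) {
        log += "EndObject(" + std::to_string(memberCount) + ")";
        return true;
    }
};

// Toy event publisher (an assumption for illustration, not the real Reader).
// The handler type is a template parameter, so each call below is resolved
// at compile time and is a candidate for inlining.
template <typename Handler>
bool PublishEmptyObject(Handler& handler) {
    return handler.StartObject() && handler.EndObject(0);
}
```

Calling `PublishEmptyObject(handler)` with a `LogHandler` leaves `"StartObject EndObject(0)"` in `handler.log`. Any other type that provides the same two member functions would work equally well, without inheriting from anything.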
9595
9696
## Handler {#Handler}
9797
98-
As the previous example showed, user needs to implement a handler, which consumes the events (function calls) from `Reader`. The handler must contain the following member functions.
98+
As shown in the previous example, the user needs to implement a handler which consumes the events (via function calls) from the `Reader`. The handler must contain the following member functions.
9999
100100
~~~~~~~~~~cpp
101101
class Handler {
@@ -122,15 +122,15 @@ class Handler {
122122

123123
When the `Reader` encounters a JSON number, it chooses a suitable C++ type mapping and then calls *one* function out of `Int(int)`, `Uint(unsigned)`, `Int64(int64_t)`, `Uint64(uint64_t)` and `Double(double)`. If `kParseNumbersAsStrings` is enabled, `Reader` will always call `RawNumber()` instead.
124124

125-
`String(const char* str, SizeType length, bool copy)` is called when the `Reader` encounters a string. The first parameter is pointer to the string. The second parameter is the length of the string (excluding the null terminator). Note that RapidJSON supports null character `\0` inside a string. If such situation happens, `strlen(str) < length`. The last `copy` indicates whether the handler needs to make a copy of the string. For normal parsing, `copy = true`. Only when *insitu* parsing is used, `copy = false`. And beware that, the character type depends on the target encoding, which will be explained later.
125+
`String(const char* str, SizeType length, bool copy)` is called when the `Reader` encounters a string. The first parameter is a pointer to the string. The second parameter is the length of the string (excluding the null terminator). Note that RapidJSON supports the null character `\0` inside a string. In that case, `strlen(str) < length`. The last parameter, `copy`, indicates whether the handler needs to make a copy of the string. For normal parsing, `copy = true`. Only when *insitu* parsing is used, `copy = false`. Also be aware that the character type depends on the target encoding, which will be explained later.
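Because a string may contain `\0`, a handler should rely on `length` rather than on the null terminator. The following is a minimal sketch of that point in plain C++. `StringCollector` is a hypothetical handler fragment showing only `String()`, and it uses `std::size_t` in place of RapidJSON's `SizeType` so that it compiles without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical handler fragment (not part of RapidJSON): keeps the most
// recently received string. std::size_t stands in for rapidjson::SizeType.
struct StringCollector {
    std::string last;

    bool String(const char* str, std::size_t length, bool /*copy*/) {
        // Copy exactly `length` characters, so embedded '\0' bytes survive.
        // Constructing from `str` alone would stop at the first '\0'.
        last.assign(str, length);
        return true;
    }
};
```

For the three-character input `a\0b`, `strlen` reports 1, while `last.size()` is 3 after the call. Copying into a `std::string` also covers the `copy = true` case, since the handler no longer depends on the parser's buffer.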
126126

127-
When the `Reader` encounters the beginning of an object, it calls `StartObject()`. An object in JSON is a set of name-value pairs. If the object contains members it first calls `Key()` for the name of member, and then calls functions depending on the type of the value. These calls of name-value pairs repeats until calling `EndObject(SizeType memberCount)`. Note that the `memberCount` parameter is just an aid for the handler, user may not need this parameter.
127+
When the `Reader` encounters the beginning of an object, it calls `StartObject()`. An object in JSON is a set of name-value pairs. If the object contains members, it first calls `Key()` for the name of a member, and then calls functions depending on the type of the value. These calls for name-value pairs repeat until it calls `EndObject(SizeType memberCount)`. Note that the `memberCount` parameter is just an aid for the handler; users who do not need this parameter may ignore it.
128128

129-
Array is similar to object but simpler. At the beginning of an array, the `Reader` calls `BeginArary()`. If there is elements, it calls functions according to the types of element. Similarly, in the last call `EndArray(SizeType elementCount)`, the parameter `elementCount` is just an aid for the handler.
129+
Arrays are similar to objects, but simpler. At the beginning of an array, the `Reader` calls `StartArray()`. If there are elements, it calls functions according to the type of each element. Similarly, in the last call `EndArray(SizeType elementCount)`, the parameter `elementCount` is just an aid for the handler.
130130

131-
Every handler functions returns a `bool`. Normally it should returns `true`. If the handler encounters an error, it can return `false` to notify event publisher to stop further processing.
131+
Every handler function returns a `bool`. Normally it should return `true`. If the handler encounters an error, it can return `false` to notify the event publisher to stop further processing.
132132

133-
For example, when we parse a JSON with `Reader` and the handler detected that the JSON does not conform to the required schema, then the handler can return `false` and let the `Reader` stop further parsing. And the `Reader` will be in error state with error code `kParseErrorTermination`.
133+
For example, when we parse a JSON with `Reader` and the handler detects that the JSON does not conform to the required schema, the handler can return `false` and let the `Reader` stop further parsing. This will place the `Reader` in an error state, with error code `kParseErrorTermination`.
134134

135135
## GenericReader {#GenericReader}
136136

@@ -149,19 +149,19 @@ typedef GenericReader<UTF8<>, UTF8<> > Reader;
149149
} // namespace rapidjson
150150
~~~~~~~~~~
151151
152-
The `Reader` uses UTF-8 as both source and target encoding. The source encoding means the encoding in the JSON stream. The target encoding means the encoding of the `str` parameter in `String()` calls. For example, to parse a UTF-8 stream and outputs UTF-16 string events, you can define a reader by:
152+
The `Reader` uses UTF-8 as both source and target encoding. The source encoding means the encoding in the JSON stream. The target encoding means the encoding of the `str` parameter in `String()` calls. For example, to parse a UTF-8 stream and output UTF-16 string events, you can define a reader by:
153153
154154
~~~~~~~~~~cpp
155155
GenericReader<UTF8<>, UTF16<> > reader;
156156
~~~~~~~~~~
157157

158-
Note that, the default character type of `UTF16` is `wchar_t`. So this `reader`needs to call `String(const wchar_t*, SizeType, bool)` of the handler.
158+
Note that the default character type of `UTF16` is `wchar_t`. So this `reader` needs to call `String(const wchar_t*, SizeType, bool)` of the handler.
159159

160160
The third template parameter `Allocator` is the allocator type for the internal data structure (actually a stack).
161161

162162
## Parsing {#SaxParsing}
163163

164-
The one and only one function of `Reader` is to parse JSON.
164+
The main function of `Reader` is `Parse()`, which is used to parse JSON.
165165

166166
~~~~~~~~~~cpp
167167
template <unsigned parseFlags, typename InputStream, typename Handler>
@@ -172,7 +172,30 @@ template <typename InputStream, typename Handler>
172172
bool Parse(InputStream& is, Handler& handler);
173173
~~~~~~~~~~
174174
175-
If an error occurs during parsing, it will return `false`. User can also calls `bool HasParseEror()`, `ParseErrorCode GetParseErrorCode()` and `size_t GetErrorOffset()` to obtain the error states. Actually `Document` uses these `Reader` functions to obtain parse errors. Please refer to [DOM](doc/dom.md) for details about parse error.
175+
If an error occurs during parsing, it will return `false`. The user can also call `bool HasParseError()`, `ParseErrorCode GetParseErrorCode()` and `size_t GetErrorOffset()` to obtain the error state. In fact, `Document` uses these `Reader` functions to obtain parse errors. Please refer to [DOM](doc/dom.md) for details about parse errors.
176+
177+
## Token-by-Token Parsing {#TokenByTokenParsing}
178+
179+
Some users may wish to parse a JSON input stream a single token at a time, instead of immediately parsing an entire document without stopping. To parse JSON this way, instead of calling `Parse`, you can use the `IterativeParse` set of functions:
180+
181+
~~~~~~~~~~cpp
182+
void IterativeParseInit();
183+
184+
template <unsigned parseFlags, typename InputStream, typename Handler>
185+
bool IterativeParseNext(InputStream& is, Handler& handler);
186+
187+
bool IterativeParseComplete();
188+
~~~~~~~~~~
189+
190+
Here is an example of iteratively parsing JSON, token by token:
191+
192+
~~~~~~~~~~cpp
193+
reader.IterativeParseInit();
194+
while (!reader.IterativeParseComplete()) {
195+
reader.IterativeParseNext<kParseDefaultFlags>(is, handler);
196+
// Your handler has been called once.
197+
}
198+
~~~~~~~~~~
176199
177200
# Writer {#Writer}
178201
