Commit ed547aa

committed
v0.4.0
1 parent f1b0dc7 commit ed547aa

6 files changed

+233
-233
lines changed

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ project(ryml
     LANGUAGES CXX)
 include(./compat.cmake)
 
-c4_project(VERSION 0.3.0 STANDALONE
+c4_project(VERSION 0.4.0 STANDALONE
     AUTHOR "Joao Paulo Magalhaes <dev@jpmag.me>")
changelog/0.4.0.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
This release improves compliance with the [YAML test suite](https://github.com/yaml/yaml-test-suite/) (thanks @ingydotnet and @perlpunk for extensive and helpful cooperation), and adds node location tracking using the parser.


### Breaking changes

As part of the [new feature to track source locations](https://github.com/biojppm/rapidyaml/pull/168), we took the opportunity to address a number of pre-existing API issues. These changes consisted of:

- Deprecate `c4::yml::parse()` and `c4::yml::Parser::parse()` overloads; all these functions will be removed in short order. Until removal, any call from client code will trigger a compiler warning.
- Add `parse()` alternatives, either `parse_in_place()` or `parse_in_arena()`:
  - `parse_in_place()` receives only `substr` buffers, ie mutable YAML source buffers. Trying to pass a `csubstr` buffer to `parse_in_place()` will cause a compile error:
    ```c++
    substr readwrite = /*...*/;
    Tree tree = parse_in_place(readwrite); // OK

    csubstr readonly = /*...*/;
    Tree tree = parse_in_place(readonly); // compile error
    ```
  - `parse_in_arena()` receives only `csubstr` buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because `parse_in_arena()` is meant for immutable buffers, overloads receiving a `substr` YAML buffer are now declared but marked deprecated, and intentionally left undefined, such that calling `parse_in_arena()` with a `substr` will cause a linker error as well as a compiler warning.
    ```c++
    substr readwrite = /*...*/;
    Tree tree = parse_in_arena(readwrite); // compile warning+linker error
    ```
    This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: `substr` is implicitly convertible to `csubstr`. If you really intend to parse an originally mutable buffer in the tree's arena, first convert it explicitly to immutable by assigning the `substr` to a `csubstr` prior to calling `parse_in_arena()`:
    ```c++
    substr readwrite = /*...*/;
    csubstr readonly = readwrite; // ok
    Tree tree = parse_in_arena(readonly); // ok
    ```
    This problem does not occur with `parse_in_place()` because `csubstr` is not implicitly convertible to `substr`.
- In the Python API, `ryml.parse()` was removed and not just deprecated; `parse_in_arena()` and `parse_in_place()` now replace it.
- `Callbacks`: changed behavior in `Parser` and `Tree`:
  - When a tree is copy-constructed or move-constructed to another, the receiving tree will start with the callbacks of the original.
  - When a tree is copy-assigned or move-assigned to another, the receiving tree will now change its callbacks to the original.
  - When a parser creates a new tree, the tree will now use a copy of the parser's callbacks object.
  - When an existing tree is given directly to the parser, both the tree and the parser now retain their own callback objects; any allocation or error during parsing will go through the respective callback object.
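
The conversion asymmetry behind the `parse_in_arena()` guard can be illustrated without ryml. The following is a minimal sketch, not ryml code: `ConstView`, `MutableView` and `parse_in_arena_sketch()` are hypothetical stand-ins for `csubstr`, `substr` and the arena parse. It also swaps in `= delete` where ryml uses a deprecated, declared-but-undefined overload, so here the misuse is rejected by the compiler rather than the linker.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for csubstr / substr; these are NOT the ryml types.
struct ConstView {
    const char* str;
    std::size_t len;
};

struct MutableView {
    char* str;
    std::size_t len;
    // mutable -> immutable is implicit, mirroring substr -> csubstr
    operator ConstView() const { return ConstView{str, len}; }
};

// Copies the immutable source, as an arena-based parse would do.
std::string parse_in_arena_sketch(ConstView src) {
    assert(src.str != nullptr);
    return std::string(src.str, src.len);
}

// Without this overload a MutableView would convert silently and be copied.
// Assumption: deleted here; ryml instead leaves its overload undefined.
std::string parse_in_arena_sketch(MutableView src) = delete;

static_assert(std::is_convertible<MutableView, ConstView>::value,
              "mutable views convert implicitly to immutable views");
static_assert(!std::is_convertible<ConstView, MutableView>::value,
              "immutable views never convert back to mutable views");

// The explicit opt-in described above: assign to the immutable type first.
bool explicit_conversion_works() {
    char buf[] = "a: 1";
    MutableView readwrite{buf, sizeof(buf) - 1};
    ConstView readonly = readwrite;
    return parse_in_arena_sketch(readonly) == "a: 1";
}
```

Calling `parse_in_arena_sketch(readwrite)` directly would select the deleted overload, because an exact match is preferred over the implicit conversion. That is the same overload-resolution rule the ryml guard relies on.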


### New features

- Add tracking of source code locations. This is useful for reporting semantic errors after the parsing phase (ie where the YAML is syntactically valid and parsing is successful, but the tree contents are semantically invalid). The locations can be obtained lazily from the parser when the first location is queried:
  ```c++
  // To obtain locations, use of the parser is needed:
  ryml::Parser parser;
  ryml::Tree tree = parser.parse_in_arena("source.yml", R"({
  aa: contents,
  foo: [one, [two, three]]
  })");
  // After parsing, on the first call to obtain a location,
  // the parser will cache a lookup structure to accelerate
  // tracking the location of a node, with complexity
  // O(numchars(srcbuffer)). Then it will do the lookup, with
  // complexity O(log(numlines(srcbuffer))).
  ryml::Location loc = parser.location(tree.rootref());
  assert(parser.location_contents(loc).begins_with("{"));
  // note the location members are zero-based:
  assert(loc.offset == 0u);
  assert(loc.line == 0u);
  assert(loc.col == 0u);
  // On the next call to location(), the accelerator is reused
  // and only the lookup is done.
  loc = parser.location(tree["aa"]);
  assert(parser.location_contents(loc).begins_with("aa"));
  assert(loc.offset == 2u);
  assert(loc.line == 1u);
  assert(loc.col == 0u);
  // KEYSEQ in flow style: points at the key
  loc = parser.location(tree["foo"]);
  assert(parser.location_contents(loc).begins_with("foo"));
  assert(loc.offset == 16u);
  assert(loc.line == 2u);
  assert(loc.col == 0u);
  loc = parser.location(tree["foo"][0]);
  assert(parser.location_contents(loc).begins_with("one"));
  assert(loc.line == 2u);
  assert(loc.col == 6u);
  // SEQ in flow style: location points at the opening '[' (there's no key)
  loc = parser.location(tree["foo"][1]);
  assert(parser.location_contents(loc).begins_with("["));
  assert(loc.line == 2u);
  assert(loc.col == 11u);
  loc = parser.location(tree["foo"][1][0]);
  assert(parser.location_contents(loc).begins_with("two"));
  assert(loc.line == 2u);
  assert(loc.col == 12u);
  loc = parser.location(tree["foo"][1][1]);
  assert(parser.location_contents(loc).begins_with("three"));
  assert(loc.line == 2u);
  assert(loc.col == 17u);
  // NOTE: reusing the parser with a new YAML source buffer
  // will invalidate the accelerator.
  ```
  See more details in the [quickstart sample](https://github.com/biojppm/rapidyaml/blob/bfb073265abf8c58bbeeeed7fb43270e9205c71c/samples/quickstart.cpp#L3759). Thanks to @cschreib for submitting a working example proving how simple it could be to achieve this.
- `Parser`:
  - add `source()` and `filename()` to get the latest buffer and filename to be parsed
  - add `callbacks()` to get the parser's callbacks
- Add `from_tag_long()` and `normalize_tag_long()`:
  ```c++
  assert(from_tag_long(TAG_MAP) == "<tag:yaml.org,2002:map>");
  assert(normalize_tag_long("!!map") == "<tag:yaml.org,2002:map>");
  ```
- Add an experimental API to resolve tags based on the tree's tag directives. This API is still immature and will likely be subject to changes, so we won't document it yet.
- Regarding emit styles (see issue [#37](https://github.com/biojppm/rapidyaml/issues/37)): add an experimental API to force flow/block style on container nodes, as well as block-literal/block-folded/double-quoted/single-quoted/plain styles on scalar nodes. This API is also immature and will likely be subject to changes, so we won't document it yet. But if you are desperate for this functionality, the new facilities will let you go further.
- Add preliminary support for bare-metal ARM architectures, with CI tests pending implementation of QEMU action. ([#193](https://github.com/biojppm/rapidyaml/issues/193), [c4core#63](https://github.com/biojppm/c4core/issues/63)).
- Add preliminary support for RISC-V architectures, with CI tests pending availability of RISC-V based github actions. ([c4core#69](https://github.com/biojppm/c4core/pulls/69)).
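
The location lookup above quotes two costs: O(numchars) to build the accelerator and O(log(numlines)) per query. One structure with exactly those costs is a sorted table of line-start offsets searched by bisection. The sketch below assumes that design; it is not taken from ryml's implementation, and `Loc`, `build_line_starts()` and `locate()` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Zero-based position, mirroring the members used in the changelog example.
struct Loc {
    std::size_t offset;
    std::size_t line;
    std::size_t col;
};

// Hypothetical accelerator: the offset at which each line starts.
// Built with a single O(numchars) pass over the source buffer.
std::vector<std::size_t> build_line_starts(const std::string& src) {
    std::vector<std::size_t> starts{0};
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\n')
            starts.push_back(i + 1);
    }
    return starts;
}

// O(log(numlines)) lookup: bisect for the last line start <= offset.
Loc locate(const std::vector<std::size_t>& starts, std::size_t offset) {
    assert(!starts.empty() && starts.front() == 0);
    auto it = std::upper_bound(starts.begin(), starts.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - starts.begin()) - 1;
    return Loc{offset, line, offset - starts[line]};
}
```

Fed the same source text as the example above, `locate()` maps the offset of `aa` to line 1, column 0 and the offset of `two` to line 2, column 12. Such a table describes one specific buffer, which is consistent with the note that reusing the parser on a new source invalidates the accelerator.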


### Fixes

- Fix edge cases of parsing of explicit keys (ie keys after `?`) ([PR#212](https://github.com/biojppm/rapidyaml/pulls/212)):
  ```yaml
  # all these were fixed:
  ? : # empty
  ? explicit key # this comment was not parsed correctly
  ? # trailing empty key was not added to the map
  ```
- Fix parsing of tabs used as whitespace tokens after `:` or `-`. This feature [is costly (see some benchmark results here)](https://github.com/biojppm/rapidyaml/pull/211#issuecomment-1030688035) and thus it is disabled by default, and requires defining a macro or cmake option `RYML_WITH_TAB_TOKENS` to enable ([PR#211](https://github.com/biojppm/rapidyaml/pulls/211)).
- Allow tab indentation in flow seqs ([PR#215](https://github.com/biojppm/rapidyaml/pulls/215)) (6CA3).
- ryml now successfully parses compact JSON code `{"like":"this"}` without any need for preprocessing. This code was not valid YAML 1.1, but was made valid in YAML 1.2. So the `preprocess_json()` functions, used to insert spaces after `:`, are no longer necessary and have been removed. If you were using these functions, remove the calls and just pass the original source directly to ryml's parser ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Fix handling of indentation when parsing block scalars ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)):
  ```yaml
  ---
  |
  hello
  there
  ---
  |
  ciao
  qua
  ---
  - |
  hello
  there
  - |
  ciao
  qua
  ---
  foo: |
  hello
  there
  bar: |
  ciao
  qua
  ```
- Fix parsing of maps when opening a scope with whitespace before the colon ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)):
  ```yaml
  foo0 : bar
  ---
  foo1 : bar # the " :" was causing an assert
  ---
  foo2 : bar
  ---
  foo3 : bar
  ---
  foo4 : bar
  ```
- Ensure container keys preserve quote flags when the key is quoted ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Ensure scalars beginning with `%` are emitted with quotes ([PR#216](https://github.com/biojppm/rapidyaml/pulls/216)).
- Fix [#203](https://github.com/biojppm/rapidyaml/issues/203): when parsing, do not convert `null` or `~` to null scalar strings. Now the scalar strings contain the verbatim contents of the original scalar; to query whether a scalar value is null, use `Tree::key_is_null()/val_is_null()` and `NodeRef::key_is_null()/val_is_null()`, which return true if it is empty or any of the unquoted strings `~`, `null`, `Null`, or `NULL` ([PR#207](https://github.com/biojppm/rapidyaml/pulls/207)).
- Fix [#205](https://github.com/biojppm/rapidyaml/issues/205): fix parsing of escaped characters in double-quoted strings: `"\\\"\n\r\t\<TAB>\/\<SPC>\0\b\f\a\v\e\_\N\L\P"` ([PR#207](https://github.com/biojppm/rapidyaml/pulls/207)).
- Fix [#204](https://github.com/biojppm/rapidyaml/issues/204): add decoding of unicode codepoints `\x` `\u` `\U` in double-quoted scalars:
  ```c++
  Tree tree = parse_in_arena(R"(["\u263A \xE2\x98\xBA \u2705 \U0001D11E"])");
  assert(tree[0].val() == "☺ ☺ ✅ 𝄞");
  ```
  This is mandated by the YAML standard and was missing from ryml ([PR#207](https://github.com/biojppm/rapidyaml/pulls/207)).
- Fix emission of nested nodes which are sequences: when these are given as the emit root, the `- ` from the parent node was added ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)):
  ```c++
  const ryml::Tree tree = ryml::parse_in_arena(R"(
  - - Rochefort 10
    - Busch
    - Leffe Rituel
    - - and so
      - many other
      - wonderful beers
  )");
  // before (error), YAML valid but not expected
  //assert(ryml::emitrs<std::string>(tree[0][3]) == R"(- - and so
  //  - many other
  //  - wonderful beers
  //)");
  // now: YAML valid and expected
  assert(ryml::emitrs<std::string>(tree[0][3]) == R"(- and so
  - many other
  - wonderful beers
  )");
  ```
- Fix parsing of isolated `!`: should be an empty val tagged with `!` (UKK06-02) ([PR#215](https://github.com/biojppm/rapidyaml/pulls/215)).
- Fix [#193](https://github.com/biojppm/rapidyaml/issues/193): amalgamated header missing `#include <stdarg.h>` which prevented compilation in bare-metal `arm-none-eabi` ([PR #195](https://github.com/biojppm/rapidyaml/pull/195), requiring also [c4core #64](https://github.com/biojppm/c4core/pull/64)).
- Accept `infinity`, `inf` and `nan` as special float values (but not mixed case: eg `InFiNiTy` or `Inf` or `NaN` are not accepted) ([PR #186](https://github.com/biojppm/rapidyaml/pull/186)).
- Accept special float values with upper or mixed case: `.Inf`, `.INF`, `.NaN`, `.NAN`. Previously, only lowercase `.inf` and `.nan` were accepted ([PR #186](https://github.com/biojppm/rapidyaml/pull/186)).
- Accept `null` with upper or mixed case: `Null` or `NULL`. Previously, only lowercase `null` was accepted ([PR #186](https://github.com/biojppm/rapidyaml/pull/186)).
- Fix [#182](https://github.com/biojppm/rapidyaml/issues/182): add missing export of DLL symbols, and document requirements for compiling shared library from the amalgamated header. [PR #183](https://github.com/biojppm/rapidyaml/pull/183), also [PR c4core#56](https://github.com/biojppm/c4core/pull/56) and [PR c4core#57](https://github.com/biojppm/c4core/pull/57).
- Fix [#185](https://github.com/biojppm/rapidyaml/issues/185): compilation failures in earlier Xcode versions ([PR #187](https://github.com/biojppm/rapidyaml/pull/187) and [PR c4core#61](https://github.com/biojppm/c4core/pull/61)):
  - `c4/substr_fwd.hpp`: (failure in Xcode 12 and earlier) forward declaration for `std::allocator` is inside the `inline namespace __1`, unlike later versions.
  - `c4/error.hpp`: (failure in debug mode in Xcode 11 and earlier) `__clang_major__` does not mean the same as in the common clang, and as a result the warning `-Wgnu-inline-cpp-without-extern` does not exist there.
- Ensure error messages do not wrap around the buffer when the YAML source line is too long ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Ensure error is emitted on unclosed flow sequence characters eg `[[[` ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)). Same thing for `[]]`.
- Refactor error message building and parser debug logging to use the new dump facilities in c4core ([PR#212](https://github.com/biojppm/rapidyaml/pulls/212)).
- Parse: fix read-after-free when duplicating a parser state node, when pushing to the stack requires a stack buffer resize ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Add support for legacy gcc 4.8 ([PR#217](https://github.com/biojppm/rapidyaml/pulls/217)).
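
Decoding the `\u` and `\U` escapes from the #204 fix comes down to turning a hexadecimal code point into UTF-8 bytes. The sketch below shows that step only, as standalone code. The helpers `encode_utf8()` and `parse_hex()` are hypothetical names, not ryml functions, and this is not presented as ryml's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Sketch only: surrogate code points are not rejected here.
std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Parse the hex digits that follow \u or \U, eg "263A" from \u263A.
std::uint32_t parse_hex(const std::string& digits) {
    return static_cast<std::uint32_t>(std::stoul(digits, nullptr, 16));
}
```

With these helpers, `encode_utf8(parse_hex("263A"))` yields the bytes `E2 98 BA`. That is the same sequence the changelog example writes out by hand as `\xE2\x98\xBA`, which is why both spellings produce the same `☺`.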
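
The null rule from the #203 fix can be restated as a small predicate. This is a sketch of the rule as worded in this changelog, not ryml's code. `scalar_is_null()` and its `was_quoted` flag are hypothetical, and treating a quoted empty string as non-null is an assumption.

```cpp
#include <cassert>
#include <string>

// A scalar counts as null when it is unquoted and either empty or one of
// the spellings ~, null, Null, NULL. Other casings are not null.
// Assumption: `was_quoted` is a hypothetical flag supplied by the caller,
// and a quoted scalar (even an empty one) is never considered null.
bool scalar_is_null(const std::string& scalar, bool was_quoted) {
    if (was_quoted)
        return false;
    return scalar.empty() || scalar == "~" || scalar == "null" ||
           scalar == "Null" || scalar == "NULL";
}
```

Because the parsed scalar keeps its verbatim text, a check like this runs when the value is queried instead of rewriting the string during parsing.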

### Improvements

- Rewrite filtering of scalars to improve parsing performance ([PR #188](https://github.com/biojppm/rapidyaml/pull/188)). Previously the scalar strings were filtered in place, which resulted in quadratic complexity in terms of scalar length. This did not matter for small scalars fitting the cache (which is the more frequent case), but grew in cost as the scalars grew larger. To achieve linearity, the code was changed so that the strings are now filtered to a temporary scratch space in the parser, and copied back to the output buffer after filtering, if any change occurred. The improvements were large for the folded scalars; the table below shows the benchmark results of throughput (MB/s) for several files containing large scalars of a single type:
  | scalar type | before | after | improvement |
  |:------------|-------:|-------:|---------:|
  | block folded | 276 | 561 | 103% |
  | block literal | 331 | 611 | 85% |
  | single quoted | 247 | 267 | 8% |
  | double quoted | 212 | 230 | 8% |
  | plain (unquoted) | 173 | 186 | 8% |

  The cost for small scalars is negligible, with benchmark improvement in the interval of -2% to 5%, so well within the margin of benchmark variability in a regular OS. In the future, this will be optimized again by copying each character in place, thus completely avoiding the staging arena.
- `Callbacks`: add `operator==()` and `operator!=()` ([PR #168](https://github.com/biojppm/rapidyaml/pull/168)).
- `Tree`: on error or assert prefer the error callback stored into the tree's current `Callbacks`, rather than the global `Callbacks` ([PR #168](https://github.com/biojppm/rapidyaml/pull/168)).
- `detail::stack<>`: improve behavior when assigning from objects `Callbacks`, test all rule-of-5 scenarios ([PR #168](https://github.com/biojppm/rapidyaml/pull/168)).
- Improve formatting of error messages.
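
The scratch-space approach to scalar filtering can be sketched on the simplest case, a single-quoted scalar where `''` stands for one quote. Erasing each escaped quote in place shifts the whole tail of the string every time, which is where a quadratic cost can come from. Writing to a scratch buffer touches each character once. The function name `filter_single_quoted()` and the use of `std::string` are assumptions for illustration; ryml's filter also has to handle folding and the other escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Filter the body of a single-quoted YAML scalar: '' is an escaped quote.
// The output is written to `scratch` in one O(n) pass, and the scalar is
// overwritten only if filtering changed something. Returns true on change.
bool filter_single_quoted(std::string& scalar, std::string& scratch) {
    scratch.clear();
    scratch.reserve(scalar.size());
    bool changed = false;
    for (std::size_t i = 0; i < scalar.size(); ++i) {
        scratch += scalar[i];
        const bool is_pair = scalar[i] == '\'' && i + 1 < scalar.size() &&
                             scalar[i + 1] == '\'';
        if (is_pair) {
            ++i; // skip the second quote of the '' pair
            changed = true;
        }
    }
    if (changed)
        scalar.assign(scratch); // copy back only when needed
    assert(scalar.size() <= scratch.capacity());
    return changed;
}
```

The scratch buffer is passed in by the caller so it can be reused across scalars, in the spirit of the parser-owned scratch space described above. Scalars that need no filtering are left untouched, and the function returns `false` for them.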


### Thanks

- @ingydotnet
- @perlpunk
- @cschreib
- @fargies
- @Xeonacid
- @aviktorov
- @xTVaser
