Commit f8ee5fb

Add roundtrip engine tests, fix minor problems
1 parent 71ffaf8 commit f8ee5fb

20 files changed: +402 -241 lines changed

changelog/current.md

Lines changed: 46 additions & 39 deletions
@@ -1,15 +1,36 @@
-### Changes
+### New features
 
-- [PR#547](https://github.com/biojppm/rapidyaml/pull/547) - Fix parsing of implicit first documents with empty sequences, caused by a problem in `Tree::set_root_as_stream()`:
-```yaml
-[] # this container was lost during parsing
----
-more data here
-```
-- [PR#561](https://github.com/biojppm/rapidyaml/pull/561) (fixes [#559](https://github.com/biojppm/rapidyaml/issues/559)) - Byte Order Mark: account for BOM when determining block indentation
-- [PR#563](https://github.com/biojppm/rapidyaml/pull/563) (fixes [#562](https://github.com/biojppm/rapidyaml/issues/562)) - Fix bug in `NodeRef::cend()`
-- [PR#565](https://github.com/biojppm/rapidyaml/pull/565) (fixes [#564](https://github.com/biojppm/rapidyaml/issues/564)) - `Tree` arena: allow relocation of zero-length strings when placed at the end (relax assertions triggered in `Tree::_relocated()`)
-- [PR#557](https://github.com/biojppm/rapidyaml/pull/557) - `Tree` is now non-empty by default, and `Tree::root_id()` will no longer modify the tree when it is empty. To create an empty tree now, it is necessary to use the capacity constructor with a capacity of zero:
+- [PR#550](https://github.com/biojppm/rapidyaml/pull/550) - Implement flow multiline style (`FLOW_ML`):
+- The parser now detects this style automatically for flow seqs or maps when the terminating bracket sits on a line different from the opening bracket.
+- Added `ParserOptions::detect_flow_ml()` to enable/disable this behavior
+- Added `EmitOptions::indent_flow_ml()` to control indentation of FLOW_ML containers
+- The emit implementation logic was refactored, and is now significantly cleaner
+- Emitted YAML will now have anchors emitted before tags, as is customary ([see example](https://play.yaml.io/main/parser?input=LSAhdGFnICZhbmNob3IgfAogIG5vdGUgaG93IHRoZSBhbmNob3IgY29tZXMKICBmaXJzdCBpbiB0aGUgZXZlbnRz)). Previously this
+- Added `ParserOptions` defaulted argument to temp-parser overloads of `parse_{yaml,json}_in_{place,arena}()`
+
+
+### API changes
+
+- **BREAKING** [PR#503](https://github.com/biojppm/rapidyaml/pull/503) (fixes [#399](https://github.com/biojppm/rapidyaml/issues/399)): change error callbacks.
+- Errors in ryml now have one of these types:
+- parse error: when parsing YAML/JSON. See: `pfn_error_parse`, `ErrorDataParse`, `ExceptionParse`, `err_parse_format()`, `sample_error_parse`.
+- visit error: when visiting a tree (reading or writing). See: `pfn_error_visit`, `ErrorDataVisit`, `ExceptionVisit`, `err_visit_format()`, `sample_error_visit`.
+- basic error: other, non specific errors. See: `pfn_error_basic`, `ErrorDataBasic`, `ExceptionBasic`, `err_basic_format()`, `sample_error_basic`.
+- parse and visit errors/exceptions can be treated/caught as basic errors/exceptions.
+- Add message formatting functions to simplify user-provided implementation of error callbacks:
+- `err_parse_format()`: format/print a full error message for a parse error
+- `err_visit_format()`: format/print a full error message for a visit error
+- `err_basic_format()`: format/print a full error message for a basic error
+- `location_format()`: format/print a location
+- `location_format_with_context()`: useful to create a rich error message showing the YAML region causing the error, maybe even for a visit error if the source is kept and locations are enabled.
+- `format_exc()`: format an exception (when exceptions are enabled)
+- See the new header `c4/yml/error.hpp` (and `c4/yml/error.def.hpp` for definitions of the functions in `c4/yml/error.hpp`)
+- See the relevant sample functions in the quickstart sample: `sample_error_basic`, `sample_error_parse` and `sample_error_visit`.
+- There are breaking user-facing changes in the `Callbacks` structure:
+- Removed member `m_error `
+- Added members `m_error_basic`, `m_error_parse`, `m_error_visit`
+- Added methods `.set_error_basic()`, `.set_error_parse()` and `.set_error_visit()`.
+- **BREAKING** [PR#557](https://github.com/biojppm/rapidyaml/pull/557) - `Tree` is now non-empty by default, and `Tree::root_id()` will no longer modify the tree when it is empty. To create an empty tree, it is now necessary to use the capacity constructor with a capacity of zero:
 ```c++
 // breaking change: default-constructed tree is now non-empty
 Tree tree;
@@ -24,39 +45,25 @@
 ```
 This changeset also enables the python library to call `root_id()` on a default-constructed tree (fixes [#556](https://github.com/biojppm/rapidyaml/issues/556)).
 - [PR#560](https://github.com/biojppm/rapidyaml/pull/560) (see also [#554](https://github.com/biojppm/rapidyaml/issues/554)): python improvements:
-- expose `Tree::to_arena()` in python. This allows safer and easier programatic creation of trees in python by ensuring scalars have the same lifetime as the tree:
+- expose `Tree::to_arena()` in python. This allows safer and easier programatic creation of trees in python by ensuring scalars are placed into the tree and so have the same lifetime as the tree:
 ```python
 t = ryml.Tree()
-s = t.to_arena(temp_string()) # Save a temporary string to the tree's arena.
+s = t.to_arena(temp_string()) # Copy/serialize a temporary string to the tree's arena.
 # Works also with integers and floats.
 t.to_val(t.root_id(), s) # Now we can safely use the scalar in the tree:
 # there is no longer any risk of it being deallocated
 ```
 - improve behavior of `Tree` methods accepting scalars: all standard buffer types are now accepted (ie, `str`, `bytes`, `bytearray` and `memoryview`).
-- [PR#550](https://github.com/biojppm/rapidyaml/pull/550) - Implement FLOW_ML style (flow multiline).
-- The parser now sets this style automatically for flow seqs or maps when the terminating bracket sits on a line different from the opening bracket.
-- Added `ParserOptions::detect_flow_ml()` to control this behavior
-- Added `EmitOptions::indent_flow_ml()` to control indentation of FLOW_ML containers
-- The emit implementation logic was refactored, and is now significantly cleaner
-- Emitted YAML will now have anchors emitted before tags, as is customary ([see example](https://play.yaml.io/main/parser?input=LSAhdGFnICZhbmNob3IgfAogIG5vdGUgaG93IHRoZSBhbmNob3IgY29tZXMKICBmaXJzdCBpbiB0aGUgZXZlbnRz)).
-- Added `ParserOptions` defaulted argument to temp-parser overloads of `parse_{yaml,json}_in_{place,arena}()`
-[PR#503](https://github.com/biojppm/rapidyaml/pull/503) (fixes [#399](https://github.com/biojppm/rapidyaml/issues/399)): change error callbacks.
-- Errors in ryml now have one of these types:
-- parse error: when parsing YAML/JSON. See: `pfn_error_parse`, `ErrorDataParse`, `ExceptionParse`, `err_parse_format()`, `sample_error_parse`.
-- visit error: when visiting a tree (reading or writing). See: `pfn_error_visit`, `ErrorDataVisit`, `ExceptionVisit`, `err_visit_format()`, `sample_error_visit`.
-- basic error: other, non specific errors. See: `pfn_error_basic`, `ErrorDataBasic`, `ExceptionBasic`, `err_basic_format()`, `sample_error_basic`.
-- parse and visit errors/exceptions can be treated/caught as basic errors/exceptions
-- Add message formatting functions to simplify implementation of error callbacks:
-- `err_parse_format()`: format/print a full error message for a parse error
-- `err_visit_format()`: format/print a full error message for a visit error
-- `err_basic_format()`: format/print a full error message for a basic error
-- `location_format()`: format/print a location
-- `location_format_with_context()`: useful to create a rich error message showing the YAML region causing the error, maybe even for a visit error if the source is kept and locations are enabled.
-- `format_exc()`: when exceptions are enabled
-- See the new header `c4/yml/error.hpp` (and `c4/yml/error.def.hpp` for definitions of the functions in `c4/yml/error.hpp`)
-- See the relevant sample functions in the quickstart sample: `sample_error_basic`, `sample_error_parse` and `sample_error_visit`.
-- There are breaking user-facing changes in the `Callbacks` structure:
-- Removed member `m_error `
-- Added members `m_error_basic`, `m_error_parse`, `m_error_visit`
-- Added methods `.set_error_basic()`, `.set_error_parse()` and `.set_error_visit()`.
+- [PR#565](https://github.com/biojppm/rapidyaml/pull/565) (fixes [#564](https://github.com/biojppm/rapidyaml/issues/564)) - `Tree` arena: allow relocation of zero-length strings when placed at the end (relax assertions triggered in `Tree::_relocated()`)
+- [PR#563](https://github.com/biojppm/rapidyaml/pull/563) (fixes [#562](https://github.com/biojppm/rapidyaml/issues/562)) - Fix bug in `NodeRef::cend()`
+
 
+### Fixes in YAML parsing
+
+- [PR#561](https://github.com/biojppm/rapidyaml/pull/561) (fixes [#559](https://github.com/biojppm/rapidyaml/issues/559)) - Byte Order Mark: account for BOM length when determining block indentation
+- [PR#547](https://github.com/biojppm/rapidyaml/pull/547) - Fix parsing of implicit first documents with empty sequences, caused by a problem in `Tree::set_root_as_stream()`:
+```yaml
+[] # this container was lost during parsing
+---
+more data here
+```
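Note on the error-callback entry in this changelog: it states that parse and visit errors/exceptions can be treated or caught as basic errors/exceptions. One common way to get that behavior is a small inheritance hierarchy. The sketch below illustrates the idea only; the `Sketch*` types and both helper templates are hypothetical stand-ins and are not ryml's `ExceptionBasic`, `ExceptionParse` or `ExceptionVisit`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT ryml's types.
struct SketchBasicError : std::runtime_error
{
    explicit SketchBasicError(std::string const& msg) : std::runtime_error(msg) {}
};
struct SketchParseError : SketchBasicError
{
    explicit SketchParseError(std::string const& msg) : SketchBasicError(msg) {}
};
struct SketchVisitError : SketchBasicError
{
    explicit SketchVisitError(std::string const& msg) : SketchBasicError(msg) {}
};

enum class Caught { none, basic, parse, visit };

// Throws the requested error kind and reports which handler caught it.
// Handlers for the derived types must come before the base handler.
template<class Thrown>
Caught classify()
{
    try { throw Thrown("oops"); }
    catch(SketchParseError const&) { return Caught::parse; }
    catch(SketchVisitError const&) { return Caught::visit; }
    catch(SketchBasicError const&) { return Caught::basic; }
    return Caught::none;
}

// A caller that only knows the base type still catches every kind.
template<class Thrown>
bool caught_as_basic()
{
    try { throw Thrown("oops"); }
    catch(SketchBasicError const&) { return true; }
    return false;
}
```

With this layout, code that needs the extra detail can handle the specific type first, while code that does not care can keep a single base handler.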

samples/quickstart.cpp

Lines changed: 28 additions & 23 deletions
@@ -4307,7 +4307,7 @@ void sample_style()
 {
 // we will be using this helper throughout this function
 auto tostr = [](ryml::ConstNodeRef n) { return ryml::emitrs_yaml<std::string>(n); };
-// let's parse this:
+// let's parse this yaml:
 ryml::csubstr yaml = R"(block map:
 block key: block val
 block seq:
@@ -4345,11 +4345,15 @@ flow seq, multiline: [
 CHECK(tree["flow seq, singleline"].is_flow());
 CHECK(tree["flow map, multiline"].is_flow());
 CHECK(tree["flow seq, multiline"].is_flow());
-// since the tree nodes are marked with style during the parse,
-// emission will preserve the original style (minus whitespace):
+//
+// since the tree nodes are marked with their original parsed
+// style, emitting the parsed tree will preserve the original
+// style (minus whitespace):
+//
 CHECK(tostr(tree) == yaml); // same as before!
 //
 // you can set/modify the style programatically!
+//
 // here are more examples.
 //
 {
@@ -4450,9 +4454,9 @@ flow seq, multiline: [flow val,flow val]
 tree.rootref().clear_style(/*recurse*/true);
 // when emitting nodes which have no style set, ryml will default
 // to block format for containers, and call
-// ryml::scalar_style_choose() to pick the style for each
-// scalar. Note that it picks single-quoted for the scalars
-// containing commas:
+// ryml::scalar_style_choose() to pick the style for each scalar
+// (at the cost of a scan over each scalar). Note that ryml picks
+// single-quoted for scalars containing commas:
 CHECK(tostr(tree) ==
 R"(block map:
 block key: block val
@@ -4472,7 +4476,8 @@ block seq:
 - flow val
 )");
 // you can set the style based on type conditions:
-// set a single key to single-quoted
+//
+// eg, set a single key to single-quoted
 tree["block map"].set_style_conditionally(/*type_mask*/ryml::KEY,
 /*remflags*/ryml::KEY_STYLE,
 /*addflags*/ryml::KEY_SQUO,
@@ -4662,6 +4667,21 @@ void sample_style_flow_ml_filter()
 ]
 }
 }
+)";
+ryml::csubstr yaml_not_indented = R"({
+map: {
+seq: [
+0,
+1,
+2,
+3,
+[
+40,
+41
+]
+]
+}
+}
 )";
 // note that the parser defaults to detect multiline flow
 // (FLOW_ML) containers:
@@ -4687,22 +4707,7 @@ void sample_style_flow_ml_filter()
 const ryml::EmitOptions noindent = ryml::EmitOptions{}.indent_flow_ml(false);
 const ryml::Tree tree = ryml::parse_in_arena(yaml);
 CHECK(tree["map"].is_flow_ml()); // etc
-CHECK(ryml::emitrs_yaml<std::string>(tree, noindent) ==
-R"({
-map: {
-seq: [
-0,
-1,
-2,
-3,
-[
-40,
-41
-]
-]
-}
-}
-)");
+CHECK(ryml::emitrs_yaml<std::string>(tree, noindent) == yaml_not_indented);
 }
 }
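Note on `sample_style_flow_ml_filter()`: the changelog describes FLOW_ML detection as applying to flow seqs or maps whose terminating bracket sits on a different line from the opening bracket. The sketch below shows that rule in isolation. `closes_on_different_line()` is a hypothetical helper written for this note, not part of ryml, and it deliberately ignores brackets inside quoted scalars or comments and does not check that bracket kinds match.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not ryml code): returns true when the flow
// container that opens at src[0] is closed on a later line.
// Simplifications: quoted scalars and comments are not handled, and
// '[' / '{' are counted together without checking that they pair up.
inline bool closes_on_different_line(std::string const& src)
{
    if(src.empty() || (src[0] != '[' && src[0] != '{'))
        return false; // not a flow container
    int depth = 0;
    bool seen_newline = false;
    for(char c : src)
    {
        if(c == '\n')
        {
            seen_newline = true;
        }
        else if(c == '[' || c == '{')
        {
            ++depth;
        }
        else if(c == ']' || c == '}')
        {
            if(--depth == 0)
                return seen_newline; // outermost container just closed
        }
    }
    return false; // unterminated container
}
```

Only the outermost container is classified here; a real parser would make the same decision for every nested container as it closes.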

src/c4/yml/detail/dbgprint.hpp

Lines changed: 0 additions & 2 deletions
@@ -57,8 +57,6 @@
 #endif
 
 
-#define RYML_LOGBUF_SIZE_MAX RYML_ERRMSG_SIZE
-
 #include <c4/dump.hpp>
 
 C4_SUPPRESS_WARNING_GCC_WITH_PUSH("-Wattributes")

src/c4/yml/emit.def.hpp

Lines changed: 3 additions & 0 deletions
@@ -41,6 +41,7 @@ substr Emitter<Writer>::emit_as(EmitType_e type, Tree const& tree, id_type id, b
 return this->Writer::_get(error_on_excess);
 }
 
+/** @cond dev */
 
 //-----------------------------------------------------------------------------
 
@@ -1476,6 +1477,8 @@ void Emitter<Writer>::_write_scalar_json_dquo(csubstr s)
 _write('"');
 }
 
+/** @endcond */
+
 } // namespace yml
 } // namespace c4

src/c4/yml/emit.hpp

Lines changed: 14 additions & 4 deletions
@@ -59,6 +59,7 @@ struct EmitOptions
 {
 public:
 
+/** @cond dev */
 typedef enum : uint32_t {
 EMIT_NONROOT_KEY = 1u << 0u,
 EMIT_NONROOT_DASH = 1u << 1u,
@@ -69,6 +70,7 @@ struct EmitOptions
 _JSON_ERR_MASK = JSON_ERR_ON_TAG|JSON_ERR_ON_ANCHOR,
 DEFAULT_FLAGS = EMIT_NONROOT_KEY|INDENT_FLOW_ML,
 } EmitOptionFlags_e;
+/** @endcond */
 
 public:
 
@@ -368,19 +370,27 @@ class Emitter : public Writer
 
 private:
 
+// g++-4.8 has problems with the operand types here...
+#if defined(__GNUC__) && (__GNUC__ < 5) && (!defined(__clang__))
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wparentheses"
+#endif
 enum : type_bits {
 _styles_block_key = KEY_LITERAL|KEY_FOLDED,
 _styles_block_val = VAL_LITERAL|VAL_FOLDED,
-_styles_block = _styles_block_key|_styles_block_val,
-_styles_flow_key = KEY_STYLE & ~_styles_block_key,
-_styles_flow_val = VAL_STYLE & ~_styles_block_val,
-_styles_flow = _styles_flow_key|_styles_flow_val,
+_styles_block = ((type_bits)_styles_block_key) | ((type_bits)_styles_block_val),
+_styles_flow_key = KEY_STYLE & (~((type_bits)_styles_block_key)),
+_styles_flow_val = VAL_STYLE & (~((type_bits)_styles_block_val)),
+_styles_flow = ((type_bits)_styles_flow_key) | ((type_bits)_styles_flow_val),
 _styles_squo = KEY_SQUO|VAL_SQUO,
 _styles_dquo = KEY_DQUO|VAL_DQUO,
 _styles_plain = KEY_PLAIN|VAL_PLAIN,
 _styles_literal = KEY_LITERAL|VAL_LITERAL,
 _styles_folded = KEY_FOLDED|VAL_FOLDED,
 };
+#if defined(__GNUC__) && (__GNUC__ < 5) && (!defined(__clang__))
+#pragma GCC diagnostic pop
+#endif
 
 /** @endcond */
 };
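Note on the enum hunk in `emit.hpp`: the added casts make the operand types explicit where an enumerator is computed from earlier enumerators of the same enum. They should not change any value. The sketch below checks this at compile time for a reduced example. The flag names follow the diff, but the bit positions and the `type_bits` alias are assumptions made for this note, not values taken from ryml.

```cpp
#include <cassert>
#include <cstdint>

using type_bits = std::uint32_t; // assumption: stand-in for ryml's type_bits

// Hypothetical flag values, chosen only for this sketch.
enum : type_bits {
    KEY_LITERAL = 1u << 0,
    KEY_FOLDED  = 1u << 1,
    KEY_SQUO    = 1u << 2,
    KEY_STYLE   = KEY_LITERAL | KEY_FOLDED | KEY_SQUO,
};

// Terse form, in the spirit of the removed lines.
enum : type_bits {
    terse_block_key = KEY_LITERAL | KEY_FOLDED,
    terse_flow_key  = KEY_STYLE & ~terse_block_key,
};

// Explicit form, in the spirit of the added lines: the earlier
// enumerator is cast to the underlying type before applying '~'.
enum : type_bits {
    cast_block_key = KEY_LITERAL | KEY_FOLDED,
    cast_flow_key  = KEY_STYLE & (~((type_bits)cast_block_key)),
};

// Both spellings yield the same constants.
static_assert(static_cast<type_bits>(terse_block_key) == static_cast<type_bits>(cast_block_key), "block masks differ");
static_assert(static_cast<type_bits>(terse_flow_key) == static_cast<type_bits>(cast_flow_key), "flow masks differ");
```

With these assumed values the flow mask is `KEY_SQUO` alone, since the two block styles are masked out of `KEY_STYLE`.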

src/c4/yml/event_handler_tree.hpp

Lines changed: 4 additions & 2 deletions
@@ -349,7 +349,7 @@ struct EventHandlerTree : public EventHandlerStack<EventHandlerTree, EventHandle
 _c4dbgpf("node[{}]: added sibling={} prev={}", m_parent->node_id, m_curr->node_id, m_tree->prev_sibling(m_curr->node_id));
 }
 
-/** set the previous val as the first key of a new map, with flow style.
+/** reset the previous val as the first key of a new map, with flow style.
 *
 * See the documentation for @ref doc_event_handlers, which has
 * important notes about this event.
@@ -363,7 +363,7 @@ struct EventHandlerTree : public EventHandlerStack<EventHandlerTree, EventHandle
 _RYML_ASSERT_VISIT_(m_stack.m_callbacks, !m_tree->is_container(m_curr->node_id), m_tree, m_curr->node_id);
 _RYML_ASSERT_VISIT_(m_stack.m_callbacks, !m_tree->has_key(m_curr->node_id), m_tree, m_curr->node_id);
 const NodeData tmp = _val2key_(*m_curr->tr_data);
-_disable_(_VALMASK|VAL_STYLE);
+_disable_(_VALMASK|VAL_STYLE|VALNIL);
 m_curr->tr_data->m_val = {};
 begin_map_val_flow();
 m_curr->tr_data->m_type = tmp.m_type;
@@ -739,6 +739,8 @@ struct EventHandlerTree : public EventHandlerStack<EventHandlerTree, EventHandle
 r.m_type.type = ((d.m_type.type & (_VALMASK|VAL_STYLE)) >> 1u);
 r.m_type.type = (r.m_type.type & ~(_VALMASK|VAL_STYLE));
 r.m_type.type = (r.m_type.type | KEY);
+if(d.m_type.type & VALNIL)
+r.m_type.type = (r.m_type.type | KEYNIL);
 return r;
 }
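Note on the `event_handler_tree.hpp` hunks: the conversion shifts the masked val flags down by one bit to obtain the matching key flags, and `VALNIL` is outside that mask. The added lines therefore carry the nil flag over explicitly (`VALNIL` becomes `KEYNIL`), and `_disable_()` now also clears `VALNIL` on the node that gave up its val. The sketch below reproduces that logic on a hypothetical bit layout. Every constant and both function names are assumptions made for this note; ryml's real flag values are not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

using bits = std::uint32_t;

// Hypothetical layout for this sketch only: each KEY flag sits one bit
// below its VAL twin, so a right shift by one maps val flags to key flags.
constexpr bits KEY      = 1u << 0;
constexpr bits VAL      = 1u << 1;
constexpr bits KEY_DQUO = 1u << 2;
constexpr bits VAL_DQUO = 1u << 3;
constexpr bits KEYNIL   = 1u << 4;
constexpr bits VALNIL   = 1u << 5;
// The shifted mask deliberately leaves out VALNIL, as in the diff.
constexpr bits VAL_MASK = VAL | VAL_DQUO;

// Build key flags from the val flags of a node.
inline bits val2key(bits t)
{
    bits r = (t & VAL_MASK) >> 1u; // move masked val flags to key positions
    r = r & ~VAL_MASK;             // drop anything left in val positions
    r = r | KEY;                   // the result is a key
    if(t & VALNIL)                 // nil is not in the mask: copy it by hand
        r = r | KEYNIL;
    return r;
}

// Clear the val-related flags once the val has been moved out.
inline bits disable_val(bits t)
{
    return t & ~(VAL_MASK | VALNIL);
}
```

Without the explicit `VALNIL` handling in both places, a nil val turned into a key would lose its nil marker, and the source node would keep a stale one.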

src/c4/yml/node_type.hpp

Lines changed: 0 additions & 1 deletion
@@ -270,7 +270,6 @@ RYML_EXPORT inline C4_NO_INLINE bool scalar_is_null(csubstr s) noexcept
 
 /** @} */
 
-
 /** @} */
 
 } // namespace yml

src/c4/yml/parse_engine.def.hpp

Lines changed: 3 additions & 2 deletions
@@ -803,6 +803,7 @@ bool ParseEngine<EventHandler>::_is_valid_start_scalar_plain_flow(csubstr s)
 case '}':
 case ']':
 case '\r':
+_RYML_WITH_TAB_TOKENS(case '\t':)
 if(s.str[0] == ':')
 {
 _c4dbgpf("not a scalar: found non-scalar token '{}{}'", s.str[0], s.str[1]);
@@ -952,7 +953,7 @@ bool ParseEngine<EventHandler>::_scan_scalar_plain_seq_flow(ScannedScalar *C4_RE
 _c4dbgp("found suspicious '#'");
 _RYML_ASSERT_BASIC_(m_evt_handler->m_stack.m_callbacks, offs > 0);
 char prev = s.str[offs - 1];
-if(prev == ' ' _RYML_WITH_TAB_TOKENS((|| prev == '\t')))
+if(prev == ' ' _RYML_WITH_TAB_TOKENS(|| prev == '\t'))
 {
 _c4dbgpf("found terminating character at {}: '{}'", offs, c);
 goto ended_scalar;
@@ -2205,7 +2206,7 @@ void ParseEngine<EventHandler>::_scan_block(ScannedBlock *C4_RESTRICT sb, size_t
 if(fns != npos) // non-empty line
 {
 _RYML_WITH_TAB_TOKENS(
-if(C4_UNLIKELY(lc.stripped.begins_with('\t')))
+if(C4_UNLIKELY(lc.full.begins_with('\t')))
 _c4err("parse error");
 )
 _c4dbgpf("blck: line not empty. indref={} indprov={} indentation={}", indref, provisional_indentation, lc.indentation);
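Note on the `_RYML_WITH_TAB_TOKENS` hunks: the second hunk removes a doubled pair of parentheses around the macro argument. Assuming the macro forwards its arguments verbatim when tab tokens are enabled and expands to nothing otherwise (its definition is not part of this diff), the doubled form would expand to `prev == ' ' (|| prev == '\t')`, which is not valid C++. The sketch below uses a hypothetical `SKETCH_WITH_TAB_TOKENS` macro with that assumed behavior to show the corrected usage in both an expression and a `switch`.

```cpp
#include <cassert>

// Hypothetical stand-in for a feature macro; this is an assumption about
// how such a macro could be defined, not ryml's actual definition.
#define SKETCH_TAB_TOKENS
#ifdef SKETCH_TAB_TOKENS
#   define SKETCH_WITH_TAB_TOKENS(...) __VA_ARGS__
#else
#   define SKETCH_WITH_TAB_TOKENS(...)
#endif

// Expression use: with the feature on this expands to
//     prev == ' ' || prev == '\t'
// Writing SKETCH_WITH_TAB_TOKENS((|| prev == '\t')) instead would expand to
//     prev == ' ' (|| prev == '\t')
// which fails to compile whenever the feature is enabled.
inline bool is_comment_separator(char prev)
{
    return prev == ' ' SKETCH_WITH_TAB_TOKENS(|| prev == '\t');
}

// Case-label use: the tab label exists only when the feature is on.
inline bool is_token_terminator(char next)
{
    switch(next)
    {
    case '}':
    case ']':
    case '\r':
    SKETCH_WITH_TAB_TOKENS(case '\t':)
        return true;
    default:
        return false;
    }
}
```

A build with the feature disabled would accept either spelling, because the macro then discards its argument entirely.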

src/c4/yml/parser_state.hpp

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ csubstr _parser_flags_to_str(substr buf, ParserFlag_t flags);
 struct LineContents
 {
 substr rem; ///< current line remainder, without newline characters
-substr full; ///< full line, including newline characters \n and \r
+substr full; ///< full line, including newline characters `\n` and `\r`
 size_t num_cols; ///< number of columns in the line, excluding newline
 ///< characters (ie the initial size of rem)
 size_t indentation; ///< number of spaces on the beginning of the line.

src/c4/yml/tree.hpp

Lines changed: 1 addition & 1 deletion
@@ -865,7 +865,7 @@ class RYML_EXPORT Tree
 * you can overload c4::to_chars(substr, T const&)
 *
 * @note To customize how the type gets serialized to the arena,
-* you can overload @ref c4::yml::serialize_scalar(substr, T const&)
+* you can overload @ref serialize_scalar()
 *
 * @note Growing the arena may cause relocation of the entire
 * existing arena, and thus change the contents of individual
