Skip to content

Commit 7a356ec

Browse files
Mingundralley
andcommitted
seq: Allow overlapping between list items and other items
Example of such XML: <item/> <another-item/> <item/> <item/> Here we need to skip `<another-item/>` in order to collect all `<item/>`s. So ability to skip events and replay them later was added This fixes all remaining tests Co-authored-by: Daniel Alley <[email protected]>
1 parent 1142da4 commit 7a356ec

File tree

7 files changed

+805
-108
lines changed

7 files changed

+805
-108
lines changed

.github/workflows/rust.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,10 @@ jobs:
4040
env:
4141
LLVM_PROFILE_FILE: coverage/serialize-escape-html-%p-%m.profraw
4242
run: cargo test --features serialize,escape-html
43+
- name: Run tests (all features)
44+
env:
45+
LLVM_PROFILE_FILE: coverage/all-features-%p-%m.profraw
46+
run: cargo test --all-features
4347
- name: Prepare coverage information for upload
4448
if: runner.os == 'Linux'
4549
# --token is required by grcov, but not required by coveralls.io, so pass

Cargo.toml

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,46 @@ default = []
4242
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding
4343
encoding = ["encoding_rs"]
4444

45+
## This feature enables support for deserializing lists where tags are overlapped
46+
## with tags that do not correspond to the list.
47+
##
48+
## When this feature is enabled, the XML:
49+
## ```xml
50+
## <any-name>
51+
## <item/>
52+
## <another-item/>
53+
## <item/>
54+
## <item/>
55+
## </any-name>
56+
## ```
57+
## could be deserialized to a struct:
58+
## ```ignore
59+
## #[derive(Deserialize)]
60+
## #[serde(rename_all = "kebab-case")]
61+
## struct AnyName {
62+
## item: Vec<()>,
63+
## another_item: (),
64+
## }
65+
## ```
66+
##
67+
## When this feature is not enabled (default), only the first element will be
68+
## associated with the field, and the deserialized type will report an error
69+
## (duplicated field) when the deserializer encounters a second `<item/>`.
70+
##
71+
## Note, that enabling this feature can lead to high and even unlimited memory
72+
## consumption, because deserializer should check all events up to the end of a
73+
## container tag (`</any-name>` in that example) to figure out that there are no
74+
## more items for a field. If `</any-name>` or even EOF is not encountered, the
75+
## parsing will never end which can lead to a denial-of-service (DoS) scenario.
76+
##
77+
## Having several lists and overlapped elements for them in XML could also lead
78+
## to quadratic parsing time, because the deserializer must check the list of
79+
## events as many times as the number of sequence fields present in the schema.
80+
##
81+
## This feature works only with `serialize` feature and has no effect if `serialize`
82+
## is not enabled.
83+
overlapped-lists = []
84+
4585
## Enables support for [`serde`] serialization and deserialization
4686
serialize = ["serde"]
4787

Changelog.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,11 @@
1010

1111
## Unreleased
1212

13+
### New Features
14+
15+
- [#387]: Allow overlapping between elements of sequence and other elements
16+
(using new feature `overlapped-lists`)
17+
1318
### Bug Fixes
1419

1520
- [#9]: Deserialization erroneously was successful in some cases where error is expected.

src/de/map.rs

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,9 @@ enum ValueSource {
105105
/// [list of known fields]: MapAccess::fields
106106
Content,
107107
/// Next value should be deserialized from an element with a dedicated name.
108+
/// If deserialized type is a sequence, then that sequence will collect all
109+
/// elements with the same name until it will be filled. If not all elements
110+
/// would be consumed, the rest will be ignored.
108111
///
109112
/// That state is set when call to [`peek()`] returns a [`Start`] event, which
110113
/// [`name()`] represents a field name. That name will be deserialized as a key.
@@ -569,7 +572,11 @@ where
569572
map: &'m mut MapAccess<'de, 'a, R>,
570573
/// Filter that determines whether a tag is a part of this sequence.
571574
///
572-
/// Iteration will stop when found a tag that does not pass this filter.
575+
/// When feature `overlapped-lists` is not activated, iteration will stop
576+
/// when found a tag that does not pass this filter.
577+
///
578+
/// When feature `overlapped-lists` is activated, all tags, that not pass
579+
/// this check, will be skipped.
573580
filter: TagFilter<'de>,
574581
}
575582

@@ -584,20 +591,29 @@ where
584591
T: DeserializeSeed<'de>,
585592
{
586593
let decoder = self.map.de.reader.decoder();
587-
match self.map.de.peek()? {
588-
// Stop iteration when list elements ends
589-
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => Ok(None),
594+
loop {
595+
break match self.map.de.peek()? {
596+
// If we see a tag that we not interested, skip it
597+
#[cfg(feature = "overlapped-lists")]
598+
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => {
599+
self.map.de.skip()?;
600+
continue;
601+
}
602+
// Stop iteration when list elements ends
603+
#[cfg(not(feature = "overlapped-lists"))]
604+
DeEvent::Start(e) if !self.filter.is_suitable(&e, decoder)? => Ok(None),
590605

591-
// Stop iteration after reaching a closing tag
592-
DeEvent::End(e) if e.name() == self.map.start.name() => Ok(None),
593-
// This is a unmatched closing tag, so the XML is invalid
594-
DeEvent::End(e) => Err(DeError::UnexpectedEnd(e.name().to_owned())),
595-
// We cannot get `Eof` legally, because we always inside of the
596-
// opened tag `self.map.start`
597-
DeEvent::Eof => Err(DeError::UnexpectedEof),
606+
// Stop iteration after reaching a closing tag
607+
DeEvent::End(e) if e.name() == self.map.start.name() => Ok(None),
608+
// This is a unmatched closing tag, so the XML is invalid
609+
DeEvent::End(e) => Err(DeError::UnexpectedEnd(e.name().to_owned())),
610+
// We cannot get `Eof` legally, because we always inside of the
611+
// opened tag `self.map.start`
612+
DeEvent::Eof => Err(DeError::UnexpectedEof),
598613

599-
// Start(tag), Text, CData
600-
_ => seed.deserialize(&mut *self.map.de).map(Some),
614+
// Start(tag), Text, CData
615+
_ => seed.deserialize(&mut *self.map.de).map(Some),
616+
};
601617
}
602618
}
603619
}

0 commit comments

Comments
 (0)