Deserialize custom entities inside attributes#951
Deserialize custom entities inside attributes#951mmirate wants to merge 4 commits intotafia:masterfrom
Conversation
89fc19f to
88ba6fc
Compare
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #951 +/- ##
==========================================
+ Coverage 55.08% 56.44% +1.35%
==========================================
Files 44 44
Lines 16911 17627 +716
==========================================
+ Hits 9316 9949 +633
- Misses 7595 7678 +83
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Mingun
left a comment
There was a problem hiding this comment.
It seems to me that current implementation will apply entity resolving twice for the text case:
#[test]
fn need_to_give_reasonable_name() {
let value: String = from_str("<root>&lt;</root>").unwrap();
assert_eq!(value, "<");
}That test should be added.
Probably it would be better to create separate deserializer for attribute values rather that reuse SimpleTypeDeserializer. It is hard to say without trying.
| pub fn from_text(text: Cow<'de, str>, entity_resolver: &'a E) -> Self { | ||
| let content = match text { | ||
| Cow::Borrowed(slice) => CowRef::Input(slice.as_bytes()), | ||
| Cow::Owned(content) => CowRef::Owned(content.into_bytes()), | ||
| }; | ||
| Self::new(content, false, XmlVersion::V1_0, Decoder::utf8()) | ||
| Self::new( | ||
| content, | ||
| false, | ||
| XmlVersion::V1_0, | ||
| Decoder::utf8(), | ||
| entity_resolver, | ||
| ) | ||
| } | ||
| /// Creates a deserializer from an XML text node, that possible borrowed from input. | ||
| /// | ||
| /// It is assumed that `text` does not have entities. | ||
| /// | ||
| /// This constructor used internally to deserialize from text nodes. | ||
| pub fn from_text_content(value: Text<'de>) -> Self { | ||
| Self::from_text(value.text) | ||
| pub fn from_text_content(value: Text<'de>, entity_resolver: &'a E) -> Self { | ||
| Self::from_text(value.text, entity_resolver) | ||
| } |
There was a problem hiding this comment.
SimpleTypeDeserializer created from text content cannot contain entities, as noted in the docs, so it is redundant to specify entity resolver here. Text contains already merged text and expanded entities:
Lines 2382 to 2434 in 6238d8a
Please add the following test:
#[test]
fn need_to_give_reasonable_name() {
let value: String = from_str("<root>&lt;</root>").unwrap();
assert_eq!(value, "<");
}There was a problem hiding this comment.
The abovementioned test will pass.
This test will also pass:
#[test]
fn need_to_give_reasonable_name() {
let value: BTreeMap<String, String> = from_str(r#"<root attr="&lt;" />"#).unwrap();
assert_eq!(value, BTreeMap::from_iter([
(String::from("@attr"), String::from("<"))
]));
}The tests where instead of & there is a custom entity resolver that supports a custom analogue of & are more longwinded. These tests pass for &lt; in element text but produce an UnterminatedEntity error for &lt; as attribute content. I'm not sure whether that is a bug or an inevitable consequence of the recursive nature of "attribute normalization".
| /// to return an [`EscapeError::UnrecognizedEntity`] error. | ||
| /// | ||
| /// [`EscapeError::UnrecognizedEntity`]: crate::escape::EscapeError::UnrecognizedEntity | ||
| entity_resolver: &'a E, |
There was a problem hiding this comment.
entity_resolver should be stored by value.
There was a problem hiding this comment.
This would require EntityResolver to have Clone as a supertrait - is this desirable?
| }; | ||
| let mut de = Deserializer::with_resolver( | ||
| br#" | ||
| <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]> |
There was a problem hiding this comment.
To keep as least one validity constraint of XML, use the name as the root tag:
| <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]> | |
| <!DOCTYPE root[ <!ENTITY unc "unclassified"> ]> |
| }; | ||
| let mut de = Deserializer::with_resolver( | ||
| br#" | ||
| <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]> |
There was a problem hiding this comment.
And here
| <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]> | |
| <!DOCTYPE root[ <!ENTITY unc "unclassified"> ]> |
| pub const fn into_map_access<E: EntityResolver>( | ||
| self, | ||
| version: XmlVersion, | ||
| prefix: &'static str, | ||
| ) -> AttributesDeserializer<'i> { | ||
| entity_resolver: &'i E, | ||
| ) -> AttributesDeserializer<'i, E> { | ||
| AttributesDeserializer { | ||
| iter: self, | ||
| value: None, | ||
| prefix, | ||
| key_buf: String::new(), | ||
| version, | ||
| entity_resolver, | ||
| } | ||
| } |
There was a problem hiding this comment.
Probably
impl AttributesDeserializer {
fn with_resolver<E: EntityResolver>(self, resolver) -> AttributesDeserializer<E> {
// replace entity_resolver with new one
}
}would be more practical, because I think, in most cases PredefinedEntityResolver will be used. So if you need use specific resolver, you may write a chain:
attributes.into_map_access("@").with_resolver(my_resolver)
As shown by the new tests, the result is that deserialization will process custom entities in attributes as well as in text content.
This seems to require some breaking changes to the public API.