Deserialize custom entities inside attributes #951

Open
mmirate wants to merge 4 commits into tafia:master from mmirate:mmirate-patch-1

Conversation

@mmirate mmirate commented Apr 24, 2026

As shown by the new tests, the result is that deserialization will process custom entities in attributes as well as in text content.

This seems to require some breaking changes to the public API.

@Mingun added the labels enhancement and serde (Issues related to mapping from Rust types to XML) on Apr 24, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 81.25000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.44%. Comparing base (a759d65) to head (88ba6fc).
⚠️ Report is 12 commits behind head on master.

Files with missing lines    Patch %    Lines
src/de/attributes.rs          0.00%    5 Missing ⚠️
src/de/map.rs                84.00%    4 Missing ⚠️
src/de/mod.rs                80.00%    2 Missing ⚠️
src/de/text.rs                0.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #951      +/-   ##
==========================================
+ Coverage   55.08%   56.44%   +1.35%     
==========================================
  Files          44       44              
  Lines       16911    17627     +716     
==========================================
+ Hits         9316     9949     +633     
- Misses       7595     7678      +83     
Flag         Coverage Δ
unittests    56.44% <81.25%> (+1.35%) ⬆️

Collaborator

@Mingun Mingun left a comment

It seems to me that the current implementation will apply entity resolution twice in the text case:

#[test]
fn need_to_give_reasonable_name() {
    let value: String = from_str("<root>&amp;lt;</root>").unwrap();
    assert_eq!(value, "&lt;");
}

That test should be added.
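
To make the concern concrete, here is a minimal, self-contained sketch of what "resolving twice" would do to that input. The `unescape_once` helper is hypothetical and handles only the five predefined entities; it is not quick-xml's unescaping code, only an illustration of the single-pass rule the test is meant to pin down.

```rust
/// Hypothetical helper (not part of quick-xml): replaces the five predefined
/// XML entities in one left-to-right pass. Characters produced by a
/// replacement are never scanned again within the same pass.
fn unescape_once(input: &str) -> String {
    const KNOWN: [(&str, char); 5] = [
        ("&amp;", '&'),
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&apos;", '\''),
        ("&quot;", '"'),
    ];

    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the ampersand unchanged.
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        match KNOWN.iter().find(|(name, _)| tail.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &tail[name.len()..];
            }
            None => {
                // Unknown or malformed reference: keep the ampersand as is.
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Under this sketch, one pass over `&amp;lt;` yields `&lt;`, which is the value the proposed test expects. Feeding that result through a second pass yields `<`, which is the failure mode the test would catch.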

It would probably be better to create a separate deserializer for attribute values rather than reuse SimpleTypeDeserializer. It is hard to say without trying.

Comment thread src/de/simple_type.rs Outdated
Comment on lines 520 to 540
  pub fn from_text(text: Cow<'de, str>, entity_resolver: &'a E) -> Self {
      let content = match text {
          Cow::Borrowed(slice) => CowRef::Input(slice.as_bytes()),
          Cow::Owned(content) => CowRef::Owned(content.into_bytes()),
      };
-     Self::new(content, false, XmlVersion::V1_0, Decoder::utf8())
+     Self::new(
+         content,
+         false,
+         XmlVersion::V1_0,
+         Decoder::utf8(),
+         entity_resolver,
+     )
  }
  /// Creates a deserializer from an XML text node, that possible borrowed from input.
  ///
  /// It is assumed that `text` does not have entities.
  ///
  /// This constructor used internally to deserialize from text nodes.
- pub fn from_text_content(value: Text<'de>) -> Self {
-     Self::from_text(value.text)
+ pub fn from_text_content(value: Text<'de>, entity_resolver: &'a E) -> Self {
+     Self::from_text(value.text, entity_resolver)
  }
Collaborator

A SimpleTypeDeserializer created from text content cannot contain entities, as noted in the docs, so specifying an entity resolver here is redundant. Text already holds the merged text with entities expanded:

quick-xml/src/de/mod.rs

Lines 2382 to 2434 in 6238d8a

/// Read all consequent [`Text`] and [`CData`] events until non-text event
/// occurs. Content of all events would be appended to `result` and returned
/// as [`DeEvent::Text`].
///
/// [`Text`]: PayloadEvent::Text
/// [`CData`]: PayloadEvent::CData
fn drain_text(&mut self, mut result: Cow<'i, str>) -> Result<DeEvent<'i>, DeError> {
    loop {
        if self.current_event_is_last_text() {
            break;
        }
        match self.next_impl()? {
            PayloadEvent::Text(e) => result
                .to_mut()
                .push_str(&e.xml_content(self.reader.xml_version())?),
            PayloadEvent::CData(e) => result
                .to_mut()
                .push_str(&e.xml_content(self.reader.xml_version())?),
            PayloadEvent::GeneralRef(e) => self.resolve_reference(result.to_mut(), e)?,
            // SAFETY: current_event_is_last_text checks that event is Text, CData or GeneralRef
            _ => unreachable!("Only `Text`, `CData` or `GeneralRef` events can come here"),
        }
    }
    Ok(DeEvent::Text(Text::new(result)))
}
/// Return an input-borrowing event.
fn next(&mut self) -> Result<DeEvent<'i>, DeError> {
    loop {
        return match self.next_impl()? {
            PayloadEvent::Start(e) => Ok(DeEvent::Start(e)),
            PayloadEvent::End(e) => Ok(DeEvent::End(e)),
            PayloadEvent::Text(e) => self.drain_text(e.xml_content(self.reader.xml_version())?),
            PayloadEvent::CData(e) => {
                self.drain_text(e.xml_content(self.reader.xml_version())?)
            }
            PayloadEvent::DocType(e) => {
                self.entity_resolver
                    .capture(e)
                    .map_err(|err| DeError::Custom(format!("cannot parse DTD: {}", err)))?;
                continue;
            }
            PayloadEvent::GeneralRef(e) => {
                let mut text = String::new();
                self.resolve_reference(&mut text, e)?;
                self.drain_text(text.into())
            }
            PayloadEvent::Eof => Ok(DeEvent::Eof),
        };
    }
}

Please add the following test:

#[test]
fn need_to_give_reasonable_name() {
    let value: String = from_str("<root>&amp;lt;</root>").unwrap();
    assert_eq!(value, "&lt;");
}

Author

The abovementioned test will pass.

This test will also pass:

#[test]
fn need_to_give_reasonable_name() {
    let value: BTreeMap<String, String> = from_str(r#"<root attr="&amp;lt;" />"#).unwrap();
    assert_eq!(value, BTreeMap::from_iter([
        (String::from("@attr"), String::from("&lt;"))
    ]));
}

The corresponding tests, in which a custom entity resolver supplies a custom analogue of &amp; in place of &amp; itself, are more long-winded. They pass for &amp;lt; in element text but produce an UnterminatedEntity error for &amp;lt; as attribute content. I'm not sure whether that is a bug or an inevitable consequence of the recursive nature of "attribute normalization".
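
For reference, a toy model of the recursion in question. XML 1.0 attribute-value normalization appends the character for a character reference directly, but recursively processes the replacement text of an entity reference; the spec therefore requires that `amp` and `lt`, if declared, use a character reference as their replacement text. The sketch below follows that rule under loud assumptions: `normalize`, `NormalizeError`, and the `myamp` entity are hypothetical names, whitespace normalization and recursion-depth limits are omitted, and none of this is quick-xml's code.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum NormalizeError {
    /// An ampersand was not followed by a terminating semicolon.
    Unterminated,
    /// The reference names an undeclared entity or an invalid code point.
    Unknown(String),
}

/// Simplified attribute-value normalization: character references append
/// their character without rescanning; entity references are replaced by the
/// recursively normalized replacement text. No guard against self-referencing
/// entities is included.
fn normalize(input: &str, entities: &HashMap<&str, &str>) -> Result<String, NormalizeError> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after.find(';').ok_or(NormalizeError::Unterminated)?;
        let name = &after[..semi];
        if let Some(num) = name.strip_prefix('#') {
            // Character reference, decimal or hexadecimal.
            let code = match num.strip_prefix('x') {
                Some(hex) => u32::from_str_radix(hex, 16),
                None => num.parse::<u32>(),
            };
            let ch = code
                .ok()
                .and_then(char::from_u32)
                .ok_or_else(|| NormalizeError::Unknown(name.to_string()))?;
            out.push(ch);
        } else {
            // Entity reference: recurse into the replacement text.
            let replacement = entities
                .get(name)
                .ok_or_else(|| NormalizeError::Unknown(name.to_string()))?;
            out.push_str(&normalize(replacement, entities)?);
        }
        rest = &after[semi + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Entity table for the sketch: `amp` and `lt` carry character references as
/// replacement text, while the hypothetical `myamp` carries a bare ampersand.
fn sample_entities() -> HashMap<&'static str, &'static str> {
    HashMap::from([("amp", "&#38;"), ("lt", "&#60;"), ("myamp", "&")])
}
```

In this model `&amp;lt;` normalizes to `&lt;`, while `&myamp;lt;` fails with `Unterminated`, because the recursive pass sees a lone `&` with no closing semicolon. If the custom entity in the failing tests has a bare ampersand as its replacement text, an error of this kind would be consistent with the recursive rule rather than a bug, but that depends on how the entity is declared in those tests.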

Comment thread src/de/simple_type.rs Outdated
/// to return an [`EscapeError::UnrecognizedEntity`] error.
///
/// [`EscapeError::UnrecognizedEntity`]: crate::escape::EscapeError::UnrecognizedEntity
entity_resolver: &'a E,
Collaborator

entity_resolver should be stored by value.

Author

This would require EntityResolver to have Clone as a supertrait - is this desirable?
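
One possible way to store the resolver by value without a `Clone` bound is sketched below, purely as an illustration. It assumes a simplified, hypothetical `Resolver` trait with a single `&self` method; the real `EntityResolver` trait has a different shape, so whether a blanket impl for references fits it is an open question. `ValueDeserializer` and `Dict` are likewise made-up names.

```rust
/// Hypothetical, simplified stand-in for an entity resolver trait.
trait Resolver {
    fn resolve(&self, name: &str) -> Option<&str>;
}

/// Blanket impl: a shared reference to a resolver is itself a resolver,
/// so a caller can pass `&resolver` wherever a resolver is taken by value.
impl<R: Resolver + ?Sized> Resolver for &R {
    fn resolve(&self, name: &str) -> Option<&str> {
        (**self).resolve(name)
    }
}

/// Hypothetical deserializer that stores its resolver by value.
struct ValueDeserializer<R> {
    name: String,
    resolver: R,
}

impl<R: Resolver> ValueDeserializer<R> {
    fn new(name: &str, resolver: R) -> Self {
        Self { name: name.to_owned(), resolver }
    }

    /// Looks up the stored entity name through the owned resolver.
    fn value(&self) -> Option<String> {
        self.resolver.resolve(&self.name).map(str::to_owned)
    }
}

/// Toy resolver that knows a single entity.
struct Dict;

impl Resolver for Dict {
    fn resolve(&self, name: &str) -> Option<&str> {
        match name {
            "unc" => Some("unclassified"),
            _ => None,
        }
    }
}
```

With this shape, several `ValueDeserializer`s can share one `Dict` by each receiving `&dict`, and no `Clone` supertrait is needed; a caller that owns its resolver can still pass it by value.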

Comment thread tests/serde-de.rs Outdated
};
let mut de = Deserializer::with_resolver(
br#"
<!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]>
Collaborator

To satisfy at least one XML validity constraint, use the root tag's name as the DOCTYPE name:

Suggested change
- <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]>
+ <!DOCTYPE root[ <!ENTITY unc "unclassified"> ]>

Comment thread tests/serde-de.rs Outdated
};
let mut de = Deserializer::with_resolver(
br#"
<!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]>
Collaborator

And here

Suggested change
- <!DOCTYPE dict[ <!ENTITY unc "unclassified"> ]>
+ <!DOCTYPE root[ <!ENTITY unc "unclassified"> ]>

Comment thread src/de/attributes.rs Outdated
Comment on lines 72 to 86
  pub const fn into_map_access<E: EntityResolver>(
      self,
      version: XmlVersion,
      prefix: &'static str,
- ) -> AttributesDeserializer<'i> {
+     entity_resolver: &'i E,
+ ) -> AttributesDeserializer<'i, E> {
      AttributesDeserializer {
          iter: self,
          value: None,
          prefix,
          key_buf: String::new(),
          version,
+         entity_resolver,
      }
  }
Collaborator

Probably

impl<'i, R> AttributesDeserializer<'i, R> {
    fn with_resolver<E: EntityResolver>(self, resolver: E) -> AttributesDeserializer<'i, E> {
        // replace entity_resolver with the new one
    }
}

would be more practical because, I think, PredefinedEntityResolver will be used in most cases. If you need a specific resolver, you can chain the call:

attributes.into_map_access("@").with_resolver(my_resolver)
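
A self-contained sketch of that builder-style chain, to show how the type parameter could change when the resolver is swapped. Every name here (`Resolver`, `Predefined`, `Dict`, `AttrsDeserializer`, `lookup`, `key`) is hypothetical and much simpler than the real quick-xml types; the point is only the shape of `with_resolver`, which consumes the deserializer and returns one with a different resolver type.

```rust
/// Hypothetical, simplified stand-in for an entity resolver trait.
trait Resolver {
    fn resolve(&self, name: &str) -> Option<&str>;
}

/// Default resolver: knows only the five predefined XML entities.
struct Predefined;

impl Resolver for Predefined {
    fn resolve(&self, name: &str) -> Option<&str> {
        match name {
            "lt" => Some("<"),
            "gt" => Some(">"),
            "amp" => Some("&"),
            "apos" => Some("'"),
            "quot" => Some("\""),
            _ => None,
        }
    }
}

/// Custom resolver that knows a single document-specific entity.
struct Dict;

impl Resolver for Dict {
    fn resolve(&self, name: &str) -> Option<&str> {
        match name {
            "unc" => Some("unclassified"),
            _ => None,
        }
    }
}

/// Hypothetical attributes deserializer; the resolver defaults to `Predefined`.
struct AttrsDeserializer<R = Predefined> {
    prefix: &'static str,
    resolver: R,
}

impl AttrsDeserializer<Predefined> {
    /// The common case: no resolver argument is needed.
    fn new(prefix: &'static str) -> Self {
        Self { prefix, resolver: Predefined }
    }
}

impl<R: Resolver> AttrsDeserializer<R> {
    /// Consumes `self` and returns a deserializer using the new resolver.
    fn with_resolver<N: Resolver>(self, resolver: N) -> AttrsDeserializer<N> {
        AttrsDeserializer { prefix: self.prefix, resolver }
    }

    /// Map key for an attribute name, e.g. "@attr".
    fn key(&self, name: &str) -> String {
        format!("{}{}", self.prefix, name)
    }

    /// Resolves an entity name through the current resolver.
    fn lookup(&self, name: &str) -> Option<&str> {
        self.resolver.resolve(name)
    }
}
```

In this sketch, `AttrsDeserializer::new("@")` covers the default case, and `AttrsDeserializer::new("@").with_resolver(Dict)` switches to the custom resolver while keeping the prefix.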

Labels

enhancement serde Issues related to mapping from Rust types to XML

3 participants