I want to use parse_document to create DOM/VDOM patches, but parse_document(...) keeps adding <html> and <body>. Is there an option to fine-tune the error-correction level? I like that it adds a </title> in the example below, but for creating a virtual-DOM patch on a <div id="here"> it is bad to have to filter the html tags out afterwards.
use html5ever::{parse_document, tendril::TendrilSink};
use markup5ever_rcdom::{Handle, NodeData, RcDom};
use std::{cell::Cell, rc::Rc};

/// Parse non-escaped HTML strings such as "Hello world!" into a node tree (see also raw_html(...))
pub fn parse_html<MSG>(html: &str) -> Result<Option<Node<MSG>>, ParseError> {
    let dom: RcDom = parse_document(RcDom::default(), Default::default()).one(html);
    if let Some(body) = find_body(&dom.document) {
        // Re-root the children of <body> under a fresh document node,
        // which drops the <html>, <head>, and <body> wrappers.
        let new_document = Rc::new(markup5ever_rcdom::Node {
            data: NodeData::Document,
            parent: Cell::new(None),
            children: body.children.clone(),
        });
        process_handle(&new_document)
    } else {
        Err(ParseError::NoBodyInParsedHtml)
    }
}

// Recursively find the <body> element
fn find_body(handle: &Handle) -> Option<Handle> {
    match &handle.data {
        NodeData::Element { name, .. } if name.local.as_ref() == "body" => Some(handle.clone()),
        _ => {
            for child in handle.children.borrow().iter() {
                if let Some(body) = find_body(child) {
                    return Some(body);
                }
            }
            None
        }
    }
}
However, my problem is that I also want to parse HTML that contains an explicit <html>...</html> element, and with this workaround that element gets removed.
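One possible way to decide whether the wrappers should be kept is to check the input text for an explicit opening tag before parsing. The following is only a minimal sketch under stated assumptions: has_explicit_tag is a hypothetical helper, not part of html5ever, and it is a plain text scan that does not skip comments, <script> bodies, or CDATA sections.

```rust
/// Returns true if `src` appears to contain an explicit opening tag named `tag`
/// (for example "html" or "body"). The match is ASCII case-insensitive.
/// Hypothetical helper for illustration; not an html5ever API.
fn has_explicit_tag(src: &str, tag: &str) -> bool {
    let lower = src.to_ascii_lowercase();
    let needle = format!("<{}", tag.to_ascii_lowercase());
    let mut start = 0;
    while let Some(pos) = lower[start..].find(&needle) {
        let after = start + pos + needle.len();
        // The tag name must end here: the next byte is '>', '/',
        // ASCII whitespace, or the input ends.
        match lower.as_bytes().get(after).copied() {
            None | Some(b'>') | Some(b'/') => return true,
            Some(c) if c.is_ascii_whitespace() => return true,
            // Something like "<htmlfoo": keep searching after this hit.
            _ => start = after,
        }
    }
    false
}
```

With such a check, the body-only extraction could be applied only when has_explicit_tag(html, "html") is false, and the full document kept otherwise.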
html-driver.rs test
#[test]
fn from_utf8() {
    let dom = driver::parse_document(RcDom::default(), Default::default())
        .from_utf8()
        .one("<title>Test".as_bytes());
    let mut serialized = Vec::new();
    let document: SerializableHandle = dom.document.clone().into();
    serialize::serialize(&mut serialized, &document, Default::default()).unwrap();
    assert_eq!(
        String::from_utf8(serialized).unwrap().replace(' ', ""),
        "<html><head><title>Test</title></head><body></body></html>"
    );
}
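Note that in the serialized output of this test the <title> ends up inside <head>, so an extraction that keeps only the children of <body> would drop it. The idea of lifting the children of all three wrapper elements can be sketched on a toy tree. Everything here is an assumption for illustration: ToyNode and unwrap_wrappers are hypothetical names, and a real implementation would walk markup5ever_rcdom handles instead.

```rust
/// Minimal stand-in for a parsed tree (hypothetical; the real tree would be
/// built from markup5ever_rcdom nodes).
#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    Element { name: String, children: Vec<ToyNode> },
    Text(String),
}

/// Lifts the children of top-level <html>, <head>, and <body> wrappers,
/// leaving every other node (and anything nested below it) untouched.
fn unwrap_wrappers(nodes: Vec<ToyNode>) -> Vec<ToyNode> {
    let mut out = Vec::new();
    for node in nodes {
        match node {
            ToyNode::Element { name, children }
                if matches!(name.as_str(), "html" | "head" | "body") =>
            {
                // Replace the wrapper with its (recursively unwrapped) children.
                out.extend(unwrap_wrappers(children));
            }
            other => out.push(other),
        }
    }
    out
}
```

Applied to a toy tree shaped like the serialized output above, this leaves just the <title> element with its text child.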
Update:
parse_fragment is also adding unwanted html.