Fix parsing when element is between buffers#1

Open
hanneskuettner wants to merge 1 commit into faruktoptas:master from hanneskuettner:dev

@hanneskuettner

This PR fixes an issue where the characters method in the XMLParser is called with element text that has been split in two because of the buffer size of the underlying parser.
We cannot stop collecting the value we are interested in after the first characters call; we have to keep collecting until we encounter the end tag for that element.
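
The idea can be sketched roughly as follows. This is a minimal illustration, not the actual patch: it assumes the parser is a standard SAX handler, and the class name PubDateHandler and its getPubDate method are hypothetical. Text is appended on every characters call and only finalized when the end tag arrives.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; not the library's XMLParser.
class PubDateHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inPubDate = false;
    private String pubDate;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("pubDate".equals(qName)) {
            inPubDate = true;
            text.setLength(0); // start a fresh value for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, e.g. when the
        // element text straddles the parser's internal buffer boundary.
        if (inPubDate) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("pubDate".equals(qName)) {
            // Only at the end tag is the accumulated text known to be complete.
            pubDate = text.toString().trim();
            inPubDate = false;
        }
    }

    String getPubDate() {
        return pubDate;
    }
}
```

With this approach, two consecutive characters calls carrying "21 April" and " 2017 10:00:00 +0000" (made-up values) produce the single value "21 April 2017 10:00:00 +0000" once endElement is reached.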

@faruktoptas
Owner

Thank you for your contribution. Can you provide an RSS URL or a sample XML file, please? I will create a unit test for this.

@hanneskuettner
Author

Sure thing!
Here are two files. In one, the pubDate sits right at the 8096-byte buffer boundary; it fails on the original version because the pubDate is cut in half (it stops after April).
In the other file, all the fields fit in one buffer and are not cut.

test_fails_on_old.txt
test_succeeds_on_old.txt

@faruktoptas
Owner

faruktoptas commented Apr 21, 2017

This works, but when I try to parse this XML:
onediorss.txt
the first item's title is "". It works without your modifications.

@hanneskuettner
Author

I will look into that.
Can you provide me with more example files that you use for testing?

@faruktoptas
Owner

I will add more samples. Now I am working on unit testing. It is almost done.

@faruktoptas
Owner

I added unit tests to the dev branch. You can continue working.
