Parsing feed fails if it has html encoded characters #204

@shtolik

Describe the bug
I tried to parse the feed https://myrskyla.fi/feed/, but a title tag contains the HTML entity &Auml; instead of the literal character Ä. XML does not define that entity, so the parser throws an exception and the feed fails to parse on both Android and iOS.
Android:

RssParsingException(message=Something went wrong during the parsing of the feed. Please check if the XML is valid, cause=org.xmlpull.v1.XmlPullParserException: unresolved: &auml; (position:TEXT @11:22 in java.io.InputStreamReader@4290534) )
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:67)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693)
Caused by: org.xmlpull.v1.XmlPullParserException: unresolved: &auml; (position:TEXT @11:22 in java.io.InputStreamReader@4290534)
at com.android.org.kxml2.io.KXmlParser.checkRelaxed(KXmlParser.java:305)
at com.android.org.kxml2.io.KXmlParser.readEntity(KXmlParser.java:1285)
at com.android.org.kxml2.io.KXmlParser.readValue(KXmlParser.java:1402)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:393)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:313)
at com.android.org.kxml2.io.KXmlParser.nextText(KXmlParser.java:2077)
at com.prof18.rssparser.internal.XmlPullParser_Kt.nextTrimmedText(XmlPullParser+.kt:5)
at com.prof18.rssparser.internal.rss.RssParserKt.extractRSSContent(RssParser.kt:289)
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:54)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) 
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) 
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111) 
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99) 
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706) 
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693) 

iOS:

0   composeui                           0x10e50c5d7        kfun:kotlin.Throwable#<init>(){} + 95 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Throwable.kt:32:28)
1   composeui                           0x10e50589f        kfun:kotlin.Exception#<init>(){} + 87 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Exceptions.kt:21:35)
2   composeui                           0x110063c33        kfun:com.prof18.rssparser.exception.RssParsingException#<init>(kotlin.String?;kotlin.Throwable?){} + 107 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/commonMain/kotlin/com/prof18/rssparser/exception/RssParsingException.kt:12:5)
3   composeui                           0x11008ed37        kfun:com.prof18.rssparser.internal.IosXmlParser.parseXML$lambda$3$lambda$1#internal + 299 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:32:33)
4   composeui                           0x11008fc37        kfun:com.prof18.rssparser.internal.IosXmlParser.$parseXML$lambda$3$lambda$1$FUNCTION_REFERENCE$2.invoke#internal + 103 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:26:13)

The link of the RSS Feed
https://myrskyla.fi/feed/

I was able to work around it by manually replacing this entity (and probably other offending ones, see http://www.javascripter.net/faq/accentedcharacters.htm):

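// Fetch the raw XML ourselves; the library's built-in XmlFetcher is internal.
val feedString = xmlFetcher.fetchXmlAsString(url)
// Replace HTML named entities, which XML does not define, with literal characters.
val feedStringFixed = feedString
    .replace("&auml;", "ä")
    .replace("&Ouml;", "Ö")
val channel = parser.parse(feedStringFixed)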

However, I had to fetch the feed myself because the built-in XmlFetcher is an internal class.
So it would be good to:

  1. try unescaping such characters if parsing fails, and/or make the XmlFetcher interface accessible
  2. add the possibility to override or reuse XmlFetcher.
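
For illustration, the unescaping step could be generalized roughly as below. This is only a sketch, not library code: `unescapeHtmlEntities`, `htmlEntities` and `entityPattern` are hypothetical names, and the entity table is a small sample that would need extending for real feeds. It deliberately leaves the five entities XML itself predefines (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric references untouched, since the XML parser already handles those.

```kotlin
// Hypothetical helper, not part of RSS-Parser.
// Sample table of HTML named entities that XML does not predefine.
private val htmlEntities = mapOf(
    "auml" to "ä", "Auml" to "Ä",
    "ouml" to "ö", "Ouml" to "Ö",
    "aring" to "å", "Aring" to "Å",
    "nbsp" to "\u00A0",
)

// Matches named references such as &auml; but not numeric ones such as &#228;.
private val entityPattern = Regex("&([A-Za-z][A-Za-z0-9]*);")

fun unescapeHtmlEntities(xml: String): String =
    entityPattern.replace(xml) { match ->
        // Names missing from the table (including XML's own &amp;, &lt;, ...)
        // are returned unchanged so the XML parser can still resolve them.
        htmlEntities[match.groupValues[1]] ?: match.value
    }
```

With such a helper, the workaround above would become `parser.parse(unescapeHtmlEntities(feedString))`. Note that this also rewrites entities inside CDATA sections, which may or may not be desired.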
