Description
File parsing fails with http://www.aluesarjat.fi/database/aluesarjat_kaupunkiverkko/vaesto_sal/vaestoennusteet_sal/b01esps_vaestoennuste.px
The problem is caused by the NOTE field, which is split across several quoted statements in this format:
NOTE="Some stuff";
"Some stuff more";
"and even more stuff";
From the second statement the reader tries to create a new attribute named 'Some stuff more', but fails because the text contains non-ASCII characters. Fixing this would require rewriting the whole metadata tokenizing code.
It seems the separate "statements" are meant to split the note into paragraphs. I don't know whether the "spec" (if there is one) allows this ugliness. At least PX-Web seems to accept it: http://www.aluesarjat.fi/graph/Footnote.aspx?File=B01ESPS_Vaestoennuste&path=..%2fDATABASE%2fALUESARJAT_KAUPUNKIVERKKO%2fVAESTO_SAL%2fVAESTOENNUSTEET_SAL%2f&ti=Espoon+v%C3%A4est%C3%B6+1.1.1999-2013+ja+v%C3%A4est%C3%B6ennuste+1.1.2014+-+2023&case=db&ssid=1403051945183&Gedit=false
Parsing fails with both ";-tokenizer" implementations.
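For discussion, here is a rough sketch of how a rewritten tokenizer might cope with the NOTE format described above. It is not the reader's current code: the function names `split_statements` and `parse_metadata` are made up for illustration, and it assumes that a statement consisting only of a quoted string continues the previous keyword's value as a new paragraph, which may or may not match what the format intends.

```python
def split_statements(text):
    """Split metadata text on ';' characters that fall outside double quotes."""
    statements, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ';' and not in_quotes:
            stmt = ''.join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
    tail = ''.join(buf).strip()
    if tail:
        statements.append(tail)
    return statements


def parse_metadata(text):
    """Map each keyword to a list of paragraphs (hypothetical helper).

    Assumption: a statement that starts with a quote and has no keyword
    belongs to the previous keyword. Multi-valued fields such as "a","b"
    are not split here; this only illustrates the continuation handling.
    """
    meta = {}
    last_key = None
    for stmt in split_statements(text):
        if stmt.startswith('"') and last_key is not None:
            # Bare quoted string: treat it as another paragraph of the last key.
            meta[last_key].append(stmt.strip('"'))
            continue
        key, _, value = stmt.partition('=')
        last_key = key.strip()
        meta[last_key] = [value.strip().strip('"')]
    return meta


sample = 'NOTE="Some stuff";\n"Some stuff more";\n"and even more stuff";\nTITLE="Väestö; ennuste";'
meta = parse_metadata(sample)
```

With this approach the three NOTE statements end up as three paragraphs under a single `NOTE` key, and a semicolon inside quotes (as in the made-up `TITLE` line) does not end the statement.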