It looks like Wilbur fails on certain Unicode characters (Japanese katakana and Korean Hangul in the errors below) when the input stream is opened with an explicit external format.
Code to reproduce:
- download RDF/XML data from DBPedia:
wget http://dbpedia.org/data/Semantic_Web.rdf
- parse with external format explicitly defined:
(defvar stream (open #P"Semantic_Web.rdf"
                     :direction :input
                     :external-format :utf-8))
(setf wilbur:*db*
      (wilbur:parse-db-from-stream stream "http://dbpedia.org/page/Semantic_Web"))
This produces an error on both CCL and SBCL:
> Error: Cannot decode this: (#\U+30BB #\U+30DE #\U+30F3 #\U+30C6 #\U+30A3 #\U+30C3 #\U+30AF #\U+30FB #\U+30A6 #\U+30A7 #\U+30D6)
> While executing: (:INTERNAL WILBUR::COLLAPSE WILBUR:COLLAPSE-WHITESPACE), in process listener(1).
debugger invoked on a SIMPLE-ERROR in thread
#<THREAD "main thread" RUNNING {AB2F861}>:
Cannot decode this: (#\HANGUL_SYLLABLE_U #\HANGUL_SYLLABLE_KEU
#\HANGUL_SYLLABLE_RA #\HANGUL_SYLLABLE_I
#\HANGUL_SYLLABLE_NA)
(WILBUR:COLLAPSE-WHITESPACE "우크라이나")
However, parsing succeeds if the external format is not specified:
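;; DEFPARAMETER rather than DEFVAR, so STREAM is rebound even if it was
;; already defined earlier in the same session.
(defparameter stream (open #P"Semantic_Web.rdf"
                           :direction :input))
(setf wilbur:*db*
      (wilbur:parse-db-from-stream stream "http://dbpedia.org/page/Semantic_Web"))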
This produces:
#<TEMPORARY-PARSER-DB size 157 #x1862A5C6>
The resulting database can then be queried successfully.
The problem is even more evident when using flexi-streams.
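For convenience, the two cases above can be compared in one session with a rough sketch like the following. It assumes Wilbur is already loaded and that Semantic_Web.rdf is in the current directory; `try-parse` is a hypothetical helper written for this report, not part of Wilbur.

```lisp
;; Sketch only: TRY-PARSE is a hypothetical helper, not a Wilbur function.
;; Assumes Wilbur is loaded and Semantic_Web.rdf is in the current directory.
(defun try-parse (path &rest open-args)
  "Open PATH with OPEN-ARGS and parse it with Wilbur.
Returns the parsed db, or the condition object if an error is signaled."
  (handler-case
      ;; WITH-OPEN-STREAM closes the stream whether or not parsing succeeds.
      (with-open-stream (stream (apply #'open path :direction :input open-args))
        (wilbur:parse-db-from-stream stream "http://dbpedia.org/page/Semantic_Web"))
    (error (e) e)))

;; Explicit external format: the failing case in this report.
(print (try-parse #P"Semantic_Web.rdf" :external-format :utf-8))

;; Implementation default external format: the working case in this report.
(print (try-parse #P"Semantic_Web.rdf"))
```

Returning the condition instead of entering the debugger makes it easier to paste both results side by side.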