add timeout support for sbcl & Add support to encode unicode characters in uri path #132

Open
jingtaozf wants to merge 6 commits into edicl:master from jingtaozf:master

@jingtaozf

(with-output-to-string (*standard-output*)
(loop for c across uri-string
if (> (char-code c) 255)
;; It's not a latin-1 character, so we need to encode it.
Member

URLs must only contain US-ASCII characters, everything else must be encoded.

Author

Drakma raised an exception when it encountered URLs with Unicode characters in the path or the query parameters. To make URLs more accessible to non-English users, many websites now use Unicode characters in these parts of the URL, even though the HTTP protocol says a URL may only contain US-ASCII characters.

I wonder whether we need to support this inside Drakma. If not, I'll revert the related code change.

Member

Hello @jingtaozf,

I understand what you're trying to accomplish. What I mean is that only US-ASCII characters are permitted in the encoded URL, but you're checking for (> (char-code c) 255), which lets non-US-ASCII characters in the range 128 to 255 through unencoded. There is also the issue of determining the correct encoding for those characters. Nowadays, UTF-8 can mostly be assumed, but some web servers may actually try to use the Content-Type to determine the encoding. Some experimentation will be needed, I think.

In any case, I'd recommend that you check for (> (char-code c) 126) and percent-encode those characters as UTF-8.

-Hans
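
For illustration, the recommendation above could be sketched roughly as follows. This is not the code from the pull request: the function names `utf-8-octets` and `percent-encode-non-ascii` are hypothetical, and the sketch assumes an implementation where `char-code` returns Unicode code points (as it does on SBCL with Unicode support). The UTF-8 conversion is written out by hand only to keep the example free of library dependencies.

```lisp
;; Hypothetical helper, not part of Drakma: UTF-8 octets for one code point.
(defun utf-8-octets (code)
  "Return the UTF-8 octets for the code point CODE as a list."
  (cond ((< code #x80) (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; Hypothetical helper, not part of Drakma.
;; Assumption: CHAR-CODE yields Unicode code points.
(defun percent-encode-non-ascii (uri-string)
  "Percent-encode every character of URI-STRING whose code is above 126,
using UTF-8. All other characters are copied unchanged."
  (with-output-to-string (out)
    (loop for c across uri-string
          for code = (char-code c)
          do (if (> code 126)
                 ;; One %XX triplet per UTF-8 octet of the character.
                 (dolist (octet (utf-8-octets code))
                   (format out "%~2,'0X" octet))
                 (write-char c out)))))
```

With these definitions, `(percent-encode-non-ascii "/café")` returns `"/caf%C3%A9"`. The sketch only covers the check discussed here: it leaves spaces, reserved ASCII characters, and existing `%` signs untouched, so it is not a complete URL encoder.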
