add timeout support for sbcl & Add support to encode unicode characters in uri path #132
jingtaozf wants to merge 6 commits into edicl:master
jingtaozf commented May 31, 2023
- Add timeout support for SBCL.
- Add support for encoding Unicode characters in the URI path.
Merge latest code from edicl/drakma
(with-output-to-string (*standard-output*)
  (loop for c across uri-string
        if (> (char-code c) 255)
          ;; It's not a latin-1 character, so we need to encode it.
URLs must only contain US-ASCII characters, everything else must be encoded.
Drakma raises an exception when it encounters a URL with Unicode characters in the path or the query parameters. To make URLs more accessible to non-English users, many websites now use Unicode characters in these parts of the URL, even though the HTTP protocol says a URL may contain only US-ASCII characters.
I wonder whether we need to support this inside Drakma. If not, I'll revert the related code changes.
Hello @jingtaozf,
I understand what you're trying to accomplish. What I mean to say is that only US-ASCII characters are permitted in the encoded URL, but you're checking for (> (char-code c) 255), which lets non-US-ASCII characters with codes from 128 to 255 through unencoded. There is also the issue of determining the correct encoding for those characters. Nowadays, UTF-8 can mostly be assumed, but some web servers may actually try to use the Content-Type to determine the encoding. Some experimentation will be needed, I think.
In any case, I'd recommend that you check for (> (char-code c) 126) and percent-encode the character's UTF-8 bytes.
-Hans
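
To make the suggestion above concrete, here is a minimal sketch of the check with UTF-8 percent-encoding. It is not the code from this PR and not part of Drakma's API: the function names `utf-8-octets` and `percent-encode-non-ascii` are hypothetical, and the UTF-8 encoder is hand-rolled only so the example has no library dependencies. It assumes UTF-8 is the right encoding for the target server, which the comment above notes is not guaranteed.

```lisp
;; Hypothetical helpers for illustration; neither is part of Drakma.

(defun utf-8-octets (code)
  "Return the UTF-8 encoding of the code point CODE as a list of octets."
  (cond ((< code #x80) (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun percent-encode-non-ascii (uri-string)
  "Percent-encode every character of URI-STRING whose code is above 126.
Characters with codes up to 126 are copied through unchanged."
  (with-output-to-string (out)
    (loop for c across uri-string
          for code = (char-code c)
          do (if (> code 126)
                 ;; Emit one %XX escape per UTF-8 octet.
                 (dolist (octet (utf-8-octets code))
                   (format out "%~2,'0X" octet))
                 (write-char c out)))))

;; Usage: U+00E9 is built with CODE-CHAR so the source file stays ASCII.
(percent-encode-non-ascii (format nil "/caf~C" (code-char #xE9)))
```

The sketch only covers the threshold discussed here. It leaves existing percent signs, spaces, and other reserved ASCII characters untouched, so a real implementation would still need to decide how to treat those.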