@@ -125,6 +125,11 @@ The :mod:`urlparse` module defines the following functions:
125125 decomposed before parsing, or is not a Unicode string, no error will be
126126 raised.
127127
128+ .. warning::
129+
130+ :func:`urlparse` does not perform validation. See :ref:`URL parsing
131+ security <url-parsing-security>` for details.
132+
128133 .. versionchanged:: 2.5
129134 Added attributes to return value.
130135
@@ -248,6 +253,10 @@ The :mod:`urlparse` module defines the following functions:
248253 decomposed before parsing, or is not a Unicode string, no error will be
249254 raised.
250255
256+ Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
257+ control and space characters are stripped from the URL. ``\n``,
258+ ``\r`` and tab ``\t`` characters are removed from the URL at any position.
259+
251260 .. versionadded:: 2.2
252261
253262 .. versionchanged:: 2.5
@@ -257,6 +266,9 @@ The :mod:`urlparse` module defines the following functions:
257266 Characters that affect netloc parsing under NFKC normalization will
258267 now raise :exc:`ValueError`.
259268
269+ .. versionchanged:: 2.7.17.8
270+ Leading WHATWG C0 control and space characters are stripped from the URL.
271+
260272
261273 .. function:: urlunsplit(parts)
262274
@@ -378,3 +390,14 @@ The following classes provide the implementations of the parse results:
378390
379391 Concrete class for :func:`urlsplit` results.
380392
393+ .. _url-parsing-security:
394+
395+ URL parsing security
396+ --------------------
397+
398+ The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
399+ inputs. They may not raise errors on inputs that other applications consider
400+ invalid. They may also succeed on some inputs that might not be considered
401+ URLs elsewhere. Their purpose is for practical functionality rather than
402+ purity.
403+
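Because the parsing functions do not validate their input, callers that need a trustworthy URL have to check the parsed result themselves. The following is a minimal sketch of one way to do that, not part of the patch above. The helper name ``is_acceptable_url`` and the ``ALLOWED_SCHEMES`` allowlist are hypothetical and chosen only for illustration. The import fallback lets the same snippet run on Python 3, where the module is ``urllib.parse``.

```python
try:
    from urlparse import urlsplit  # Python 2, the module documented here
except ImportError:
    from urllib.parse import urlsplit  # Python 3 equivalent

# Hypothetical allowlist for this sketch; adjust it to the application.
ALLOWED_SCHEMES = ("http", "https")


def is_acceptable_url(url):
    """Return True if url parses to an allowed scheme with a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects a few inputs, e.g. an unbalanced IPv6 bracket.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


# urlsplit accepts this input without raising; everything lands in .path.
print(urlsplit("not a url").path)
print(is_acceptable_url("not a url"))
print(is_acceptable_url("https://example.com/index.html"))
print(is_acceptable_url("javascript:alert(1)"))
```

The check runs on the parsed components rather than on the raw string, so it applies to the URL as the parser interprets it. Applications with stricter needs would likely add further checks, such as on the host name or port.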