@@ -1197,48 +1197,49 @@ In addition to the examples below, more examples are given in
11971197:ref:`urllib-howto`.
11981198
11991199This example gets the python.org main page and displays the first 300 bytes of
1200- it. ::
1200+ it::
12011201
12021202 >>> import urllib.request
12031203 >>> with urllib.request.urlopen('http://www.python.org/') as f:
12041204 ... print(f.read(300))
12051205 ...
1206- b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1207- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
1208- xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
1209- <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
1210- <title>Python Programming '
1206+ b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">
12111207
12121208Note that urlopen returns a bytes object. This is because there is no way
12131209for urlopen to automatically determine the encoding of the byte stream
12141210it receives from the HTTP server. In general, a program will decode
12151211the returned bytes object to string once it determines or guesses
12161212the appropriate encoding.
12171213
1218- The following W3C document, https://www.w3.org/International/O-charset\ , lists
1219- the various ways in which an (X)HTML or an XML document could have specified its
1214+ The following HTML spec document, https://html.spec.whatwg.org/#charset, lists
1215+ the various ways in which an HTML or an XML document could have specified its
12201216encoding information.
12211217
1218+ For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations.
1219+
12221220As the python.org website uses *utf-8* encoding as specified in its meta tag, we
1223- will use the same for decoding the bytes object. ::
1221+ will use the same for decoding the bytes object::
12241222
12251223 >>> with urllib.request.urlopen('http://www.python.org/') as f:
12261224 ... print(f.read(100).decode('utf-8'))
12271225 ...
1228- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1229- "http://www.w3.org/TR/xhtml1/DTD/xhtm
1226+ <!doctype html>
1227+ <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
1228+ <!-
12301229
12311230It is also possible to achieve the same result without using the
1232- :term:`context manager` approach. ::
1231+ :term:`context manager` approach::
12331232
12341233 >>> import urllib.request
12351234 >>> f = urllib.request.urlopen('http://www.python.org/')
12361235 >>> try:
12371236 ... print(f.read(100).decode('utf-8'))
12381237 ... finally:
12391238 ... f.close()
1240- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1241- "http://www.w3.org/TR/xhtml1/DTD/xhtm
1239+ ...
1240+ <!doctype html>
1241+ <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
1242+ <!--
12421243
12431244In the following example, we are sending a data-stream to the stdin of a CGI
12441245and reading the data it returns to us. Note that this example will only work
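The changed text says a program decodes the returned bytes "once it determines or guesses the appropriate encoding", while the examples hard-code `'utf-8'`. One common way to do the determining step is to read the charset from the response's `Content-Type` header and fall back to a default. The sketch below is an illustration only, not part of this change: `decode_body` and its `fallback` parameter are hypothetical names, and it assumes the server declares its charset in the HTTP header rather than only in a meta tag.

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset from Content-Type, if any.

    Hypothetical helper for illustration; not part of urllib.request.
    """
    # get_content_charset() returns the lowercased charset parameter of the
    # Content-Type header, or None when the header or parameter is absent.
    charset = headers.get_content_charset() or fallback
    try:
        return body.decode(charset)
    except LookupError:
        # The server named an encoding Python does not know; use the
        # fallback and replace any bytes that cannot be decoded.
        return body.decode(fallback, errors="replace")


# Usage with a live response (requires network access):
# import urllib.request
# with urllib.request.urlopen('http://www.python.org/') as f:
#     print(decode_body(f.read(300), f.headers))
```

The helper accepts any `email.message.Message`, which is the base class of the `headers` object on the response returned by `urlopen`, so it can be exercised without a network connection.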