Commit 96b47e9

feat: cut down on explaining HTTP
1 parent 61b2158 commit 96b47e9

1 file changed, +7 -11 lines changed

sources/academy/webscraping/scraping_basics_python/04_downloading_html.md

Lines changed: 7 additions & 11 deletions
@@ -53,7 +53,7 @@ If you see errors or for any other reason cannot run the code above, it means th
 
 ## Downloading product listing
 
-Now onto coding! Let's change our code so it downloads HTML of the product listing instead of printing OK. The [documentation of the HTTPX library](https://www.python-httpx.org/) provides us with examples how to use it. Inspired by those, our code will look like this:
+Now onto coding! Let's change our code so it downloads HTML of the product listing instead of printing `OK`. The [documentation of the HTTPX library](https://www.python-httpx.org/) provides us with examples how to use it. Inspired by those, our code will look like this:
 
 ```py
 import httpx
@@ -81,19 +81,15 @@ $ python main.py
 </html>
 ```
 
-And that's it! It's not particularly useful yet, but it's a good start of our scraper.
+Running `httpx.get(url)`, we made a HTTP request and received a response. It's not particularly useful yet, but it's a good start of our scraper.
 
-## About HTTP
+:::tip Client and server, request and response
 
-Running `httpx.get(url)`, we made our first HTTP request and received our first response. HTTP is a network protocol powering most of the internet. Understanding it well is an important foundation for successful scraping, but for now it's enough to know the basic flow and terminology.
+HTTP is a network protocol powering the internet. Understanding it well is an important foundation for successful scraping, but for this course, it's enough to know just the basic flow and terminology:
 
-HTTP is an exchange of two participants. The _client_ sends a _request_ to the _server_, which replies with a _response_. In our case, `main.py` is the client, and the technology running at `warehouse-theme-metal.myshopify.com` replies to our request as the server.
-
-<!-- TODO image basic HTTP chart -->
-
-:::tip Deep dive to HTTP
-
-The HTTP protocol is defined by several documents called RFCs, such as [RFC 7230: HTTP Message Syntax and Routing](https://www.rfc-editor.org/rfc/rfc7230) or [RFC 7231: HTTP Semantics and Content](https://www.rfc-editor.org/rfc/rfc7231). While these technical specifications are surprisingly digestible, you may also like [HTTP tutorials by MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP).
+- HTTP is an exchange between two participants.
+- The _client_ sends a _request_ to the _server_, which replies with a _response_.
+- In our case, `main.py` is the client, and the technology running at `warehouse-theme-metal.myshopify.com` replies to our request as the server.
 
 :::
 

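The client and server roles described in the new tip can be tried out without touching the network. The following is a minimal sketch and not part of the commit: it assumes Python's standard library (`http.server` and `urllib.request`) instead of HTTPX so that it runs with no extra installs, and the names `HelloHandler` and `fetch_once` are hypothetical, chosen only for illustration.

```py
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: replies to every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to keep the output clean.
        pass


def fetch_once():
    """Start a local server, send one request as the client, return (status, html)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # The client sends a request; the server replies with a response.
        with urllib.request.urlopen(url) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


status, html = fetch_once()
print(status)
print(html)
```

Here the script plays both roles: `HelloHandler` stands in for the technology running at the shop's domain, while the `urlopen` call stands in for `main.py`. In the course code, `httpx.get(url)` takes the client role instead.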