Changed file: sources/academy/webscraping/scraping_basics_python/04_downloading_html.md (7 additions, 7 deletions)
@@ -30,7 +30,7 @@ Being comfortable around Python project setup and installing packages is a prere
 
 Now let's test that everything works. Inside the project directory create a new file called `main.py` with the following code:
 
-```python
+```py
 import httpx
 
 print("OK")
@@ -53,7 +53,7 @@ If you see errors or for any other reason cannot run the code above, we're sorry
 
 Now onto coding! Let's change our code so it downloads the HTML of the product listing instead of printing OK. The [documentation of the HTTPX library](https://www.python-httpx.org/) provides us with examples of how to use it. Inspired by those, our code will look like this:
@@ -106,7 +106,7 @@ Sometimes websites return all kinds of errors. Most often because:
 
 In HTTP, each response has a three-digit _status code_, which tells us whether it's an error or a success. Let's change the last line of our program to print the status code of the response we get:
 
-```python
+```py
 print(response.status_code)
 ```
@@ -140,7 +140,7 @@ A robust scraper skips or retries requests when errors occur, but let's start si
 
 We also want to play along with the conventions of the operating system, so we'll print to the [standard error output](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)) and exit our program with a non-zero [status code](https://en.wikipedia.org/wiki/Exit_status):
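A minimal sketch of those two conventions, using a hypothetical `fail()` helper:

```python
import sys

def fail(message: str) -> None:
    # Error reports belong on standard error, not standard output
    print(message, file=sys.stderr)
    # A non-zero status code tells the operating system the program failed
    sys.exit(1)

# Calling fail("Download failed!") would end the program with exit code 1
```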
If you want to use Python instead, it offers several ways to create files. The solution below uses [pathlib](https://docs.python.org/3/library/pathlib.html):
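For illustration, writing a downloaded page to disk with pathlib could look like this (the `products.html` filename and the placeholder HTML are made up):

```python
from pathlib import Path

html_code = "<!DOCTYPE html><html><body>...</body></html>"  # placeholder content

# write_text() creates (or overwrites) the file in a single call
Path("products.html").write_text(html_code, encoding="utf-8")
```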
Changed file: sources/academy/webscraping/scraping_basics_python/05_parsing_html.md (6 additions, 6 deletions)
@@ -37,7 +37,7 @@ At first sight, counting `product-item` occurrences wouldn't match only products,
 
 We could try looking for `<div class="product-item`, a substring which represents the entire beginning of each product tag, but that would also count `<div class="product-item__info`! We'll need to add a space after the class name to avoid matching those. Replace your program with the following code:
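The effect of that trailing space can be sketched with a made-up HTML fragment:

```python
# Made-up fragment with one product card and one nested info element
html_code = """
<div class="product-item product-item--vertical">
  <div class="product-item__info">...</div>
</div>
"""

# Without the space, both divs match; with it, only the product card does
print(html_code.count('<div class="product-item'))   # → 2
print(html_code.count('<div class="product-item '))  # → 1
```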
@@ -92,7 +92,7 @@ Now let's use it for parsing the HTML. Unlike plain string, the `BeautifulSoup`
 
 Update your code to the following:
 
-```python
+```py
 import httpx
 from bs4 import BeautifulSoup
@@ -114,7 +114,7 @@ $ python main.py
 
 Our code lists all `<h1>` tags it can find on the page. There happens to be just one, so the result is a list with a single item. What if we want to print just the text? Let's change the end of the program to the following:
 
-```python
+```py
 headings = soup.select("h1")
 first_heading = headings[0]
 print(first_heading.text)
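The same mechanics can be tried on a tiny made-up document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><h1>Sales</h1></body></html>", "html.parser")

headings = soup.select("h1")  # always a list, even for a single match
print(headings)               # → [<h1>Sales</h1>]
print(headings[0].text)       # → Sales
```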
@@ -133,7 +133,7 @@ Beautiful Soup's `.select()` method runs a _CSS selector_ against a parsed HTML
 
 Scanning through [usage examples](https://beautiful-soup-4.readthedocs.io/en/latest/#css-selectors) will help us figure out the code for counting the product cards:
 In the previous lesson we've managed to print the text of the page's main heading or count how many products are in the listing. Let's combine those two: what happens if we print `.text` for each product card?
 
-```python
+```py
 import httpx
 from bs4 import BeautifulSoup
@@ -62,7 +62,7 @@ As in the browser DevTools lessons, we need to change the code so that it locate
 
 We should be looking for elements which have the `product-item__title` and `price` classes. We already know how that translates to CSS selectors:
 
-```python
+```py
 import httpx
 from bs4 import BeautifulSoup
@@ -73,10 +73,10 @@ response.raise_for_status()
 html_code = response.text
 soup = BeautifulSoup(html_code, "html.parser")
 for product in soup.select(".product-item"):
-    titles = product.select('.product-item__title')
+    titles = product.select(".product-item__title")
     first_title = titles[0].text
 
-    prices = product.select('.price')
+    prices = product.select(".price")
     first_price = prices[0].text
 
     print(first_title, first_price)
@@ -103,7 +103,7 @@ There's still some room for improvement, but it's already much better!
 
 Often, we want to assume in our code that a certain element exists only once. It's a bit tedious to work with lists when you know you're looking for a single element. For this purpose, Beautiful Soup offers a `.select_one()` method. Like `document.querySelector()` in browser DevTools, it returns just one result or none. Let's simplify our code!
 
-```python
+```py
 import httpx
 from bs4 import BeautifulSoup
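The difference between the two methods can be sketched on made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="price">$10</p><p class="price">$20</p>', "html.parser")

print(soup.select(".price"))           # a list of all matches
print(soup.select_one(".price").text)  # → $10 (first match only)
print(soup.select_one(".nope"))        # → None (no match)
```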
@@ -114,8 +114,8 @@ response.raise_for_status()
 html_code = response.text
 soup = BeautifulSoup(html_code, "html.parser")
 for product in soup.select(".product-item"):
-    title = product.select_one('.product-item__title').text
-    price = product.select_one('.price').text
+    title = product.select_one(".product-item__title").text
+    price = product.select_one(".price").text
     print(title, price)
 ```
@@ -131,7 +131,7 @@ In the output we can see that the price isn't located precisely. For each produc
     $74.95
   </span>
 ```
-When translated to a tree of Python objects, the element with class `price` will contain several nodes:
+When translated to a tree of Python objects, the element with class `price` will contain several _nodes_:
 
 - Textual node with white space,
 - a `span` HTML element,
@@ -140,12 +140,12 @@ When translated to a tree of Python objects, the element with class `price` will
 
 We can use Beautiful Soup's `.contents` property to access individual nodes. It returns a list of nodes like this:
 
 It seems like we can read the last element to get the actual amount from a list like the above. Let's fix our program:
 
-```python
+```py
 import httpx
 from bs4 import BeautifulSoup
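What `.contents` returns can be illustrated with simplified markup mimicking the price element (the exact structure here is an assumption):

```python
from bs4 import BeautifulSoup

html = '<div class="price"><span class="visually-hidden">Sale price</span>$74.95</div>'
soup = BeautifulSoup(html, "html.parser")

price = soup.select_one(".price")
print(price.contents)      # a list: the <span> element, then a text node
print(price.contents[-1])  # → $74.95
```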
@@ -156,12 +156,12 @@ response.raise_for_status()
 html_code = response.text
 soup = BeautifulSoup(html_code, "html.parser")
 for product in soup.select(".product-item"):
-    title = product.select_one('.product-item__title').text
-    price = product.select_one('.price').contents[-1]
+    title = product.select_one(".product-item__title").text
+    price = product.select_one(".price").contents[-1]
     print(title, price)
 ```
-If we run our program now, it should print prices just as the actual amounts:
+If we run the scraper now, it should print prices as only amounts:
 
 ```text
 $ python main.py
@@ -173,3 +173,11 @@ Sony PS-HX500 Hi-Res USB Turntable $398.00
 ```
 
 Great! We have managed to use CSS selectors and walk the HTML tree to get a list of product titles and prices. But wait a second—what's `From $1,398.00`? One does not simply scrape a price! We'll need to clean that. But that's a job for the next lesson, which is about extracting data.
+
+---
+
+## Exercises
+
+These challenges are here to help you test what you’ve learned in this lesson. Try to resist the urge to peek at the solutions right away. Remember, the best learning happens when you dive in and do it yourself!