
Commit f5f14e0

feat: scrape prices as cents, avoid Decimal (#1861)
This makes the code simpler and brings the lessons closer to their JavaScript counterparts. Storing cents also matches real-world practice, e.g. the Stripe API and others.
1 parent 271e15a commit f5f14e0

File tree: 7 files changed, +90 −125 lines changed


sources/academy/webscraping/scraping_basics_python/07_extracting_data.md

Lines changed: 21 additions & 6 deletions

@@ -159,12 +159,26 @@ Great! Only if we didn't overlook an important pitfall called [floating-point er
 0.30000000000000004
 ```
 
-These errors are small and usually don't matter, but sometimes they can add up and cause unpleasant discrepancies. That's why it's typically best to avoid floating point numbers when working with money. Let's instead use Python's built-in [`Decimal()`](https://docs.python.org/3/library/decimal.html) type:
+These errors are small and usually don't matter, but sometimes they can add up and cause unpleasant discrepancies. That's why it's typically best to avoid floating point numbers when working with money. We won't store dollars, but cents:
+
+```py
+price_text = (
+    product
+    .select_one(".price")
+    .contents[-1]
+    .strip()
+    .replace("$", "")
+    # highlight-next-line
+    .replace(".", "")
+    .replace(",", "")
+)
+```
+
+In this case, removing the dot from the price text is the same as if we multiplied all the numbers by 100, effectively converting dollars to cents. To convert the text to a number, we'll use `int()` instead of `float()`. This is what the whole program looks like now:
 
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 
 url = "https://warehouse-theme-metal.myshopify.com/collections/sales"
 response = httpx.get(url)
@@ -182,13 +196,14 @@ for product in soup.select(".product-item"):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     print(title, min_price, price, sep=" | ")
@@ -198,8 +213,8 @@ If we run the code above, we have nice, clean data about all the products!
 
 ```text
 $ python main.py
-JBL Flip 4 Waterproof Portable Bluetooth Speaker | 74.95 | 74.95
-Sony XBR-950G BRAVIA 4K HDR Ultra HD TV | 1398.00 | None
+JBL Flip 4 Waterproof Portable Bluetooth Speaker | 7495 | 7495
+Sony XBR-950G BRAVIA 4K HDR Ultra HD TV | 139800 | None
 ...
 ```
 
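As a side note, the parsing approach introduced in the diff above can be sketched as a standalone function. The helper name `parse_price_cents` and the sample strings are illustrative, not taken from the commit, and the trick of dropping the dot assumes the shop always renders exactly two decimal places:

```python
def parse_price_cents(text: str) -> int:
    """Convert a price string like '$1,398.00' to integer cents.

    Assumes the input always has exactly two decimal places, so that
    deleting the dot is equivalent to multiplying by 100.
    """
    cleaned = (
        text
        .strip()
        .replace("$", "")
        .replace(".", "")  # dollars -> cents, given two decimal places
        .replace(",", "")
    )
    return int(cleaned)

# Floating-point arithmetic is why the lesson avoids float for money:
print(0.1 + 0.2)                       # 0.30000000000000004
print(parse_price_cents("$1,398.00"))  # 139800
print(parse_price_cents("74.95"))      # 7495
```

Integer cents stay exact under addition and comparison, which is the property the lesson wants for money.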
sources/academy/webscraping/scraping_basics_python/08_saving_data.md

Lines changed: 9 additions & 39 deletions

@@ -29,7 +29,6 @@ Producing results line by line is an efficient approach to handling large datase
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 
 url = "https://warehouse-theme-metal.myshopify.com/collections/sales"
 response = httpx.get(url)
@@ -49,13 +48,14 @@ for product in soup.select(".product-item"):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     # highlight-next-line
@@ -69,7 +69,7 @@ Before looping over the products, we prepare an empty list. Then, instead of pri
 
 ```text
 $ python main.py
-[{'title': 'JBL Flip 4 Waterproof Portable Bluetooth Speaker', 'min_price': Decimal('74.95'), 'price': Decimal('74.95')}, {'title': 'Sony XBR-950G BRAVIA 4K HDR Ultra HD TV', 'min_price': Decimal('1398.00'), 'price': None}, ...]
+[{'title': 'JBL Flip 4 Waterproof Portable Bluetooth Speaker', 'min_price': 7495, 'price': 7495}, {'title': 'Sony XBR-950G BRAVIA 4K HDR Ultra HD TV', 'min_price': 139800, 'price': None}, ...]
 ```
 
 :::tip Pretty print
@@ -87,7 +87,6 @@ In Python, we can read and write JSON using the [`json`](https://docs.python.org
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 # highlight-next-line
 import json
 ```
@@ -99,39 +98,17 @@ with open("products.json", "w") as file:
     json.dump(data, file)
 ```
 
-That's it! If we run the program now, it should also create a `products.json` file in the current working directory:
-
-```text
-$ python main.py
-Traceback (most recent call last):
-  ...
-  raise TypeError(f'Object of type {o.__class__.__name__} '
-TypeError: Object of type Decimal is not JSON serializable
-```
-
-Ouch! JSON supports integers and floating-point numbers, but there's no guidance on how to handle `Decimal`. To maintain precision, it's common to store monetary values as strings in JSON files. But this is a convention, not a standard, so we need to handle it manually. We'll pass a custom function to `json.dump()` to serialize objects that it can't handle directly:
-
-```py
-def serialize(obj):
-    if isinstance(obj, Decimal):
-        return str(obj)
-    raise TypeError("Object not JSON serializable")
-
-with open("products.json", "w") as file:
-    json.dump(data, file, default=serialize)
-```
-
-If we run our scraper now, it won't display any output, but it will create a `products.json` file in the current working directory, which contains all the data about the listed products:
+That's it! If we run our scraper now, it won't display any output, but it will create a `products.json` file in the current working directory, which contains all the data about the listed products:
 
 <!-- eslint-skip -->
 ```json title=products.json
-[{"title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker", "min_price": "74.95", "price": "74.95"}, {"title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV", "min_price": "1398.00", "price": null}, ...]
+[{"title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker", "min_price": "7495", "price": "7495"}, {"title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV", "min_price": "139800", "price": null}, ...]
 ```
 
 If you skim through the data, you'll notice that the `json.dump()` function handled some potential issues, such as escaping double quotes found in one of the titles by adding a backslash:
 
 ```json
-{"title": "Sony SACS9 10\" Active Subwoofer", "min_price": "158.00", "price": "158.00"}
+{"title": "Sony SACS9 10\" Active Subwoofer", "min_price": "15800", "price": "15800"}
 ```
 
 :::tip Pretty JSON
@@ -177,7 +154,6 @@ Now that's nice, but we didn't want Alice, Bob, kickbox, or TypeScript. What we
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 import json
 # highlight-next-line
 import csv
@@ -186,13 +162,8 @@ import csv
 Next, let's add one more data export to the end of the source code of our scraper:
 
 ```py
-def serialize(obj):
-    if isinstance(obj, Decimal):
-        return str(obj)
-    raise TypeError("Object not JSON serializable")
-
 with open("products.json", "w") as file:
-    json.dump(data, file, default=serialize)
+    json.dump(data, file)
 
 with open("products.csv", "w") as file:
     writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
@@ -223,13 +194,12 @@ Write a new Python program that reads `products.json`, finds all products with a
 ```py
 import json
 from pprint import pp
-from decimal import Decimal
 
 with open("products.json", "r") as file:
     products = json.load(file)
 
 for product in products:
-    if Decimal(product["min_price"]) > 500:
+    if int(product["min_price"]) > 500:
        pp(product)
 ```
 
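The deleted `serialize()` helper existed only because `json` refuses `Decimal` values; plain integers need no custom `default`. A minimal sketch of the difference (the sample dicts are hypothetical):

```python
import json
from decimal import Decimal

# Integers serialize out of the box, which is why the commit can
# drop the custom default= serializer:
print(json.dumps({"min_price": 7495}))  # {"min_price": 7495}

# Decimal does not, which is what the removed traceback showed:
try:
    json.dumps({"min_price": Decimal("74.95")})
except TypeError as error:
    print(error)  # Object of type Decimal is not JSON serializable
```

This is the whole motivation for the diff in `08_saving_data.md`: with cents as `int`, the JSON export shrinks to a single `json.dump(data, file)` call.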
sources/academy/webscraping/scraping_basics_python/09_getting_links.md

Lines changed: 18 additions & 33 deletions

@@ -33,7 +33,6 @@ Over the course of the previous lessons, the code of our program grew to almost
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 import json
 import csv
 
@@ -54,24 +53,20 @@ for product in soup.select(".product-item"):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     data.append({"title": title, "min_price": min_price, "price": price})
 
-def serialize(obj):
-    if isinstance(obj, Decimal):
-        return str(obj)
-    raise TypeError("Object not JSON serializable")
-
 with open("products.json", "w") as file:
-    json.dump(data, file, default=serialize)
+    json.dump(data, file)
 
 with open("products.csv", "w") as file:
     writer = csv.DictWriter(file, fieldnames=["title", "min_price", "price"])
@@ -103,13 +98,14 @@ def parse_product(product):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     return {"title": title, "min_price": min_price, "price": price}
@@ -119,13 +115,8 @@ Now the JSON export. For better readability of it, let's make a small change her
 
 ```py
 def export_json(file, data):
-    def serialize(obj):
-        if isinstance(obj, Decimal):
-            return str(obj)
-        raise TypeError("Object not JSON serializable")
-
     # highlight-next-line
-    json.dump(data, file, default=serialize, indent=2)
+    json.dump(data, file, indent=2)
 ```
 
 The last function we'll add will take care of the CSV export. We'll make a small change here as well. Having to specify the field names is not ideal. What if we add more field names in the parsing function? We'd always have to remember to go and edit the export function as well. If we could figure out the field names in place, we'd remove this dependency. One way would be to infer the field names from the dictionary keys of the first row:
@@ -151,7 +142,6 @@ Now let's put it all together:
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 import json
 import csv
 
@@ -171,24 +161,20 @@ def parse_product(product):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     return {"title": title, "min_price": min_price, "price": price}
 
 def export_json(file, data):
-    def serialize(obj):
-        if isinstance(obj, Decimal):
-            return str(obj)
-        raise TypeError("Object not JSON serializable")
-
-    json.dump(data, file, default=serialize, indent=2)
+    json.dump(data, file, indent=2)
 
 def export_csv(file, data):
     fieldnames = list(data[0].keys())
@@ -254,13 +240,13 @@ In the previous code example, we've also added the URL to the dictionary returne
 [
   {
     "title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker",
-    "min_price": "74.95",
-    "price": "74.95",
+    "min_price": "7495",
+    "price": "7495",
     "url": "/products/jbl-flip-4-waterproof-portable-bluetooth-speaker"
   },
   {
     "title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV",
-    "min_price": "1398.00",
+    "min_price": "139800",
     "price": null,
     "url": "/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv"
   },
@@ -277,7 +263,6 @@ Browsers reading the HTML know the base address and automatically resolve such l
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 import json
 import csv
 # highlight-next-line
@@ -319,13 +304,13 @@ When we run the scraper now, we should see full URLs in our exports:
 [
   {
     "title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker",
-    "min_price": "74.95",
-    "price": "74.95",
+    "min_price": "7495",
+    "price": "7495",
     "url": "https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker"
   },
   {
     "title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV",
-    "min_price": "1398.00",
+    "min_price": "139800",
     "price": null,
     "url": "https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv"
   },
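The lesson's `export_csv` refactor above infers the CSV columns from the first row's dictionary keys instead of hard-coding them. A standalone sketch of that idea, using hypothetical sample rows rather than scraped data:

```python
import csv
import io

def export_csv(file, data):
    # Infer column names from the first row so new keys added in the
    # parsing function automatically become CSV columns.
    fieldnames = list(data[0].keys())
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)

# Hypothetical sample rows (prices already in cents):
data = [
    {"title": "Speaker", "min_price": 7495, "price": 7495},
    {"title": "TV", "min_price": 139800, "price": None},
]
buffer = io.StringIO()
export_csv(buffer, data)
print(buffer.getvalue())
```

One caveat of this approach: it assumes every row has the same keys as the first one, which holds here because all rows come from the same `parse_product` function.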

sources/academy/webscraping/scraping_basics_python/10_crawling.md

Lines changed: 7 additions & 12 deletions

@@ -18,7 +18,6 @@ Thanks to the refactoring, we have functions ready for each of the tasks, so we
 ```py
 import httpx
 from bs4 import BeautifulSoup
-from decimal import Decimal
 import json
 import csv
 from urllib.parse import urljoin
@@ -41,24 +40,20 @@ def parse_product(product, base_url):
         .contents[-1]
         .strip()
         .replace("$", "")
+        .replace(".", "")
         .replace(",", "")
     )
     if price_text.startswith("From "):
-        min_price = Decimal(price_text.removeprefix("From "))
+        min_price = int(price_text.removeprefix("From "))
         price = None
     else:
-        min_price = Decimal(price_text)
+        min_price = int(price_text)
         price = min_price
 
     return {"title": title, "min_price": min_price, "price": price, "url": url}
 
 def export_json(file, data):
-    def serialize(obj):
-        if isinstance(obj, Decimal):
-            return str(obj)
-        raise TypeError("Object not JSON serializable")
-
-    json.dump(data, file, default=serialize, indent=2)
+    json.dump(data, file, indent=2)
 
 def export_csv(file, data):
     fieldnames = list(data[0].keys())
@@ -159,14 +154,14 @@ If we run the program now, it'll take longer to finish since it's making 24 more
 [
   {
     "title": "JBL Flip 4 Waterproof Portable Bluetooth Speaker",
-    "min_price": "74.95",
-    "price": "74.95",
+    "min_price": "7495",
+    "price": "7495",
     "url": "https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker",
     "vendor": "JBL"
   },
   {
     "title": "Sony XBR-950G BRAVIA 4K HDR Ultra HD TV",
-    "min_price": "1398.00",
+    "min_price": "139800",
     "price": null,
     "url": "https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv",
     "vendor": "Sony"
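The crawling lesson above keeps `from urllib.parse import urljoin` in its imports to turn the shop's relative product links into the absolute URLs seen in the export. A quick sketch of what that call does (the URLs mirror the ones in the diff):

```python
from urllib.parse import urljoin

# A relative link scraped from the listing page is resolved against
# the page's own URL, the same way a browser would resolve it:
base_url = "https://warehouse-theme-metal.myshopify.com/collections/sales"
relative = "/products/jbl-flip-4-waterproof-portable-bluetooth-speaker"
print(urljoin(base_url, relative))
# https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker
```

Because the relative link starts with `/`, `urljoin()` keeps only the scheme and host from the base URL and replaces the path.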