Commit 82bf3ee

fix: change order of JSON and CSV, fix some small errors
1 parent 861c613 commit 82bf3ee

1 file changed: 17 additions, 17 deletions

sources/academy/webscraping/scraping_basics_python/09_getting_links.md

Lines changed: 17 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -115,7 +115,20 @@ def parse_product(product):
     return {"title": title, "min_price": min_price, "price": price}
 ```
 
-Now the CSV export. We'll make a small change here. Having to specify the field names is not ideal. What if we add more field names in the parsing function? We'd always have to remember to go and edit the export function as well. If we could figure out the field names in place, we'd remove this dependency. One way would be to infer the field names from the dictionary keys of the first row:
+Now the JSON export. For better readability of it, let's make a small change here and set the indentation level to two spaces:
+
+```py
+def export_json(file, data):
+    def serialize(obj):
+        if isinstance(obj, Decimal):
+            return str(obj)
+        raise TypeError("Object not JSON serializable")
+
+    # highlight-next-line
+    json.dump(data, file, default=serialize, indent=2)
+```
+
+The last function we'll add will take care of the CSV export. We'll make a small change here as well. Having to specify the field names is not ideal. What if we add more field names in the parsing function? We'd always have to remember to go and edit the export function as well. If we could figure out the field names in place, we'd remove this dependency. One way would be to infer the field names from the dictionary keys of the first row:
 
 ```py
 def export_csv(file, data):
@@ -133,19 +146,6 @@ The code above assumes the `data` variable contains at least one item, and that
 
 :::
 
-The last function we'll add will take care of the JSON export. For better readability of the JSON export, let's make a small change here too and set the indentation level to two spaces:
-
-```py
-def export_json(file, data):
-    def serialize(obj):
-        if isinstance(obj, Decimal):
-            return str(obj)
-        raise TypeError("Object not JSON serializable")
-
-    # highlight-next-line
-    json.dump(data, file, default=serialize, indent=2)
-```
-
 Now let's put it all together:
 
 ```py
@@ -406,16 +406,16 @@ https://www.theguardian.com/sport/article/2024/sep/02/max-verstappen-damns-his-u
 from bs4 import BeautifulSoup
 from urllib.parse import urljoin
 
-url = "https://www.theguardian.com/sport/formulaone"
-response = httpx.get(url)
+listing_url = "https://www.theguardian.com/sport/formulaone"
+response = httpx.get(listing_url)
 response.raise_for_status()
 
 html_code = response.text
 soup = BeautifulSoup(html_code, "html.parser")
 
 for item in soup.select("#maincontent ul li"):
     link = item.select_one("a")
-    url = urljoin(url, link["href"])
+    url = urljoin(listing_url, link["href"])
     print(url)
 ```
 
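Note on the last hunk: the old code reassigned `url` inside the loop, so each iteration used the previously joined URL as the base for the next `urljoin` call. With absolute `href` values this goes unnoticed, but with relative ones the results would drift. The sketch below illustrates the difference without any network access. The `example.com` listing URL, the relative `hrefs`, and the helper names `join_shadowed` and `join_fixed` are hypothetical and not part of the commit.

```python
from urllib.parse import urljoin

# Hypothetical inputs, chosen only to make the effect visible.
LISTING_URL = "https://example.com/sport/formulaone"
HREFS = ["article/one", "article/two"]


def join_shadowed(base, hrefs):
    """Mimics the old code: the base variable is overwritten on every iteration."""
    url = base
    results = []
    for href in hrefs:
        url = urljoin(url, href)  # the base is now the previous result
        results.append(url)
    return results


def join_fixed(base, hrefs):
    """Mimics the new code: the listing URL stays the base for every link."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == "__main__":
    print(join_shadowed(LISTING_URL, HREFS))
    print(join_fixed(LISTING_URL, HREFS))
```

Both variants agree on the first link. They diverge from the second link onward, which is why keeping the listing URL in its own variable is the safer choice.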
