Commit 712c20e

Merge pull request #195 from realpython/fix_bs4
Fix bs4
2 parents 7010df1 + b2656a6 commit 712c20e

File tree

4 files changed: +25 −118 lines changed


web-scraping-bs4/README.md

Lines changed: 1 addition & 6 deletions

```diff
@@ -1,8 +1,3 @@
 # Build a Web Scraper With Requests and Beautiful Soup
 
-This repository contains code relating to the Real Python tutorial on how to [Build a Web Scraper With Requests and Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/).
-
-There are two available scripts:
-
-1. **[`scrape_jobs.py`](https://github.com/realpython/materials/blob/master/web-scraping-bs4/scrape_jobs.py):** The sample script that you build throughout the tutorial
-2. **[`job_search.py`](https://github.com/realpython/materials/blob/master/web-scraping-bs4/job_search.py):** The final code expanded as a command-line-interface app
+This repository contains [`scrape_jobs.py`](https://github.com/realpython/materials/blob/master/web-scraping-bs4/scrape_jobs.py), which is the sample script built in the Real Python tutorial on how to [Build a Web Scraper With Requests and Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/).
```

web-scraping-bs4/job_search.py

Lines changed: 0 additions & 94 deletions
This file was deleted.

web-scraping-bs4/requirements.txt

Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+beautifulsoup4==4.9.3
+certifi==2020.12.5
+chardet==4.0.0
+idna==2.10
+requests==2.25.1
+soupsieve==2.2.1
+urllib3==1.26.4
```

web-scraping-bs4/scrape_jobs.py

Lines changed: 17 additions & 18 deletions
```diff
@@ -2,29 +2,28 @@
 from bs4 import BeautifulSoup
 
 
-URL = "https://www.monster.com/jobs/search/?q=Software-Developer\
-&where=Australia"
+URL = "https://realpython.github.io/fake-jobs/"
 page = requests.get(URL)
 
 soup = BeautifulSoup(page.content, "html.parser")
 results = soup.find(id="ResultsContainer")
 
 # Look for Python jobs
-python_jobs = results.find_all("h2", string=lambda t: "python" in t.lower())
-for p_job in python_jobs:
-    link = p_job.find("a")["href"]
-    print(p_job.text.strip())
-    print(f"Apply here: {link}\n")
+print("PYTHON JOBS\n==============================\n")
+python_jobs = results.find_all(
+    "h2", string=lambda text: "python" in text.lower()
+)
+python_job_elements = [
+    h2_element.parent.parent.parent for h2_element in python_jobs
+]
 
-# Print out all available jobs from the scraped webpage
-job_elems = results.find_all("section", class_="card-content")
-for job_elem in job_elems:
-    title_elem = job_elem.find("h2", class_="title")
-    company_elem = job_elem.find("div", class_="company")
-    location_elem = job_elem.find("div", class_="location")
-    if None in (title_elem, company_elem, location_elem):
-        continue
-    print(title_elem.text.strip())
-    print(company_elem.text.strip())
-    print(location_elem.text.strip())
+for job_element in python_job_elements:
+    title_element = job_element.find("h2", class_="title")
+    company_element = job_element.find("h3", class_="company")
+    location_element = job_element.find("p", class_="location")
+    print(title_element.text.strip())
+    print(company_element.text.strip())
+    print(location_element.text.strip())
+    link_url = job_element.find_all("a")[1]["href"]
+    print(f"Apply here: {link_url}\n")
     print()
```
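The new selection logic in `scrape_jobs.py` can be exercised offline against a small HTML fragment that mimics the structure of the fake-jobs page. The markup below is an illustrative assumption, not the real page source; the `.parent.parent.parent` chain and the second-link lookup match the nesting shown here:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the fake-jobs card layout:
# the <h2> title sits three levels below the card element that
# also holds the company, location, and the "Apply" link.
HTML = """
<div id="ResultsContainer">
  <div class="card-content">
    <div class="media">
      <div class="media-content">
        <h2 class="title">Senior Python Developer</h2>
        <h3 class="company">Payne, Roberts and Davis</h3>
      </div>
    </div>
    <p class="location">Stewartbury, AA</p>
    <footer>
      <a href="https://example.com/learn">Learn</a>
      <a href="https://example.com/apply">Apply</a>
    </footer>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
results = soup.find(id="ResultsContainer")

# Match <h2> titles containing "python", case-insensitively.
python_jobs = results.find_all(
    "h2", string=lambda text: "python" in text.lower()
)

# Walk up from each matching <h2> to the surrounding card element.
python_job_elements = [
    h2_element.parent.parent.parent for h2_element in python_jobs
]

for job_element in python_job_elements:
    title = job_element.find("h2", class_="title").text.strip()
    company = job_element.find("h3", class_="company").text.strip()
    location = job_element.find("p", class_="location").text.strip()
    # The card's second <a> element is the "Apply" link.
    link_url = job_element.find_all("a")[1]["href"]
    print(title, company, location, link_url)
```

This is why the commit replaces the per-`<h2>` `find("a")["href"]` lookup: the apply link lives on the enclosing card, not inside the title element, so the code first climbs to the card and then picks the second link.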
