Commit a6d38bb

fix: improve wording
1 parent 4dc5517 commit a6d38bb

1 file changed: +26 -27 lines changed

sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md

Lines changed: 26 additions & 27 deletions
@@ -77,48 +77,47 @@ In the next lesson, we'll start with our Python project. First we'll be figuring
7777

7878
<Exercises />
7979

80-
### Extract the name of the top wiki on Fandom Movies
80+
### Extract the price of IKEA's most expensive artificial plant
8181

82-
On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selectors and HTML elements manipulation in the **Console** to extract the name of the top wiki. Use JavaScript's [`trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) method to remove white space from around the name.
83-
84-
![Fandom's Movies page](./images/devtools-exercise-fandom.png)
82+
At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML element manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number.
8583

8684
<details>
8785
<summary>Solution</summary>
8886

89-
1. Open the [Movies page](https://www.fandom.com/topics/movies).
87+
1. Open the [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/).
88+
1. Sort the products by price, from high to low, so the most expensive plant appears first in the listing.
9089
1. Activate the element selection tool in your DevTools.
91-
1. Click on the list item for the top Fandom wiki in the category.
92-
1. Notice that it has a class `topic_explore-wikis__link`.
93-
1. In the **Console**, execute `document.querySelector('.topic_explore-wikis__link')`. It returns element representing the top list item. The selector is apparently used only for the **Top Wikis** list, and because `document.querySelector()` returns the first matching element, we're almost done.
94-
1. In the **Console**, execute `item = document.querySelector('.topic_explore-wikis__link')` to save the element in a variable.
95-
1. In the **Console**, execute `item.textContent.trim()` to get the element's text without white space.
96-
1. At the time of writing, this returns `"Pixar Wiki"`.
90+
1. Click on the price of the first and most expensive plant.
91+
1. Notice that the price is structured into two elements, with the integer separated from the currency, under a class named `plp-price__integer`. This structure is convenient for extracting the value.
92+
1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. This returns the element representing the first price in the listing. Since `document.querySelector()` returns the first matching element, it directly selects the most expensive plant's price.
93+
1. Save the element in a variable by executing `price = document.querySelector('.plp-price__integer')`.
94+
1. Convert the price text into a number by executing `parseInt(price.textContent)`.
95+
1. At the time of writing, this returns `699`, meaning [699 SEK](https://www.google.com/search?q=699%20sek).
9796

9897
</details>
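
The IKEA solution steps can be condensed into one small function. This is only a sketch: `extractFirstPrice` and `fakeIkeaPage` are hypothetical names that are not part of the lesson, and the stub object stands in for the browser's `document` so the snippet also runs outside DevTools. The class name `plp-price__integer` is the one noted in the solution and may change whenever IKEA updates its markup.

```javascript
// Hypothetical helper: `root` is anything exposing querySelector(),
// such as the `document` object available in the browser Console.
function extractFirstPrice(root) {
  const price = root.querySelector('.plp-price__integer');
  if (price === null) {
    // The class name changed or the listing is empty.
    return null;
  }
  // parseInt() reads leading digits and stops at the first other character.
  return parseInt(price.textContent, 10);
}

// Stand-in for the real page (an assumption made for illustration only).
const fakeIkeaPage = {
  querySelector: (selector) =>
    selector === '.plp-price__integer' ? { textContent: '699' } : null,
};

console.log(extractFirstPrice(fakeIkeaPage)); // prints 699
```

In the **Console**, the same function could be called as `extractFirstPrice(document)`. Because `parseInt()` stops at the first character that is not part of a number, a price rendered with a thousands separator (for example `1 299`) would be cut short, so the separator would need to be removed first.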
9998

100-
### Extract the price of IKEA's most expensive artificial plant
99+
### Extract the name of the top wiki on Fandom Movies
101100

102-
At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors, and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as we'll be browsing their Swedish offer). Before opening DevTools, use your wits to set the page to a state which is most favorable for you to complete the task with the least effort. In the end, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) to turn the text to a number.
101+
On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selectors and HTML element manipulation in the **Console** to extract the name of the top wiki. Use JavaScript's [`trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) method to remove white space around the name.
102+
103+
![Fandom's Movies page](./images/devtools-exercise-fandom.png)
103104

104105
<details>
105106
<summary>Solution</summary>
106107

107-
1. Open the [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/).
108-
1. Sort the products by price, high to low, so that the most expensive plant appears first in the listing.
108+
1. Open the [Movies page](https://www.fandom.com/topics/movies).
109109
1. Activate the element selection tool in your DevTools.
110-
1. Click on the price of the first and most expensive plant.
111-
1. Notice that it has a class `plp-price__integer`. In the markup the price is already structured into two elements, with the integer separate from the currency, which is convenient.
112-
1. In the **Console**, execute `document.querySelector('.plp-price__integer')`. It returns element representing the first list item. The selector is apparently used only inside product cards, and because `document.querySelector()` returns the first matching element, we're almost done.
113-
1. In the **Console**, execute `price = document.querySelector('.plp-price__integer')` to save the element in a variable.
114-
1. In the **Console**, execute `parseInt(price.textContent)` to get the price as a number.
115-
1. At the time of writing, this returns `699`, as in [699 SEK](https://www.google.com/search?q=699%20sek).
110+
1. Click on the list item for the top Fandom wiki in the category.
111+
1. Notice that it has a class `topic_explore-wikis__link`.
112+
1. In the **Console**, execute `document.querySelector('.topic_explore-wikis__link')`. This returns the element representing the top list item. The selector appears to be used only for the **Top Wikis** list, and because `document.querySelector()` returns the first matching element, you're almost done.
113+
1. Save the element in a variable by executing `item = document.querySelector('.topic_explore-wikis__link')`.
114+
1. Get the element's text without extra white space by executing `item.textContent.trim()`. At the time of writing, this returns `"Pixar Wiki"`.
116115

117116
</details>
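
The Fandom steps follow the same pattern and can be sketched as a single function. `extractTopWikiName` and `fakeFandomPage` are hypothetical names introduced here for illustration, and the stub replaces the real `document` so the snippet runs outside a browser. The padded text in the stub is an assumed example of the white space that `trim()` removes.

```javascript
// Hypothetical helper: `root` is anything exposing querySelector(),
// such as the `document` object available in the browser Console.
function extractTopWikiName(root) {
  const item = root.querySelector('.topic_explore-wikis__link');
  if (item === null) {
    // The class name changed or the list is missing.
    return null;
  }
  // trim() removes white space from both ends of the string.
  return item.textContent.trim();
}

// Stand-in for the real page (an assumption made for illustration only).
const fakeFandomPage = {
  querySelector: (selector) =>
    selector === '.topic_explore-wikis__link'
      ? { textContent: '\n      Pixar Wiki\n    ' }
      : null,
};

console.log(extractTopWikiName(fakeFandomPage)); // prints Pixar Wiki
```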
118117

119118
### Extract details about the first post on Guardian's F1 news
120119

121-
At Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Extract its title, lead paragraph, and URL of the photo.
120+
On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph, and URL of the associated photo.
122121

123122
![F1 news page](./images/devtools-exercise-guardian2.png)
124123

@@ -128,10 +127,10 @@ At Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use
128127
1. Open the [F1 news page](https://www.theguardian.com/sport/formulaone).
129128
1. Activate the element selection tool in your DevTools.
130129
1. Click on the first post.
131-
1. Notice that there are no good classes to go by. The markup uses generic tags and randomized classes. We must rely on the hierarchy and order of the elements instead.
132-
1. In the **Console**, execute `post = document.querySelector('#maincontent ul li')`. It returns element representing the first post.
133-
1. In the **Console**, execute `post.querySelector('h3').textContent` to extract the title.
134-
1. In the **Console**, execute `post.querySelector('span div').textContent` to extract the lead paragraph.
135-
1. In the **Console**, execute `post.querySelector('img').src` to extract the photo URL.
130+
1. Notice that the markup does not provide clear, reusable class names for this task. The structure uses generic tags and randomized classes, requiring you to rely on the element hierarchy and order instead.
131+
1. In the **Console**, execute `post = document.querySelector('#maincontent ul li')`. This returns the element representing the first post.
132+
1. Extract the post's title by executing `post.querySelector('h3').textContent`.
133+
1. Extract the lead paragraph by executing `post.querySelector('span div').textContent`.
134+
1. Extract the photo URL by executing `post.querySelector('img').src`.
136135

137136
</details>
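
The Guardian solution first selects the post and then queries inside it, which can be sketched as one function returning all three values. `extractFirstPost` and the `fake...` objects are hypothetical and exist only so the snippet runs outside a browser. The selectors are the ones from the solution steps, and they depend on element hierarchy, so they may stop matching when the page layout changes.

```javascript
// Hypothetical helper: `root` is anything exposing querySelector(),
// such as the `document` object available in the browser Console.
function extractFirstPost(root) {
  const post = root.querySelector('#maincontent ul li');
  if (post === null) {
    return null;
  }
  // Query inside the post so that only its own descendants can match.
  const title = post.querySelector('h3');
  const lead = post.querySelector('span div');
  const photo = post.querySelector('img');
  return {
    title: title === null ? null : title.textContent,
    lead: lead === null ? null : lead.textContent,
    photoUrl: photo === null ? null : photo.src,
  };
}

// Stand-ins for the real page (assumptions made for illustration only).
const fakePostElements = {
  'h3': { textContent: 'Example title' },
  'span div': { textContent: 'Example lead paragraph' },
  'img': { src: 'https://example.com/photo.jpg' },
};
const fakePost = {
  querySelector: (selector) => fakePostElements[selector] || null,
};
const fakeGuardianPage = {
  querySelector: (selector) =>
    selector === '#maincontent ul li' ? fakePost : null,
};

console.log(extractFirstPost(fakeGuardianPage));
```

Returning `null` for a missing piece, rather than letting `.textContent` throw, is a design choice of this sketch; the Console steps above assume every element is present.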
