File: sources/academy/webscraping/scraping_basics_python/03_devtools_extracting_data.md
Lines changed: 20 additions & 1 deletion
@@ -97,7 +97,6 @@ On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selecto
</details>
### Extract the price of IKEA's most expensive artificial plant
At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML element manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as we'll be browsing their Swedish offer). Before opening DevTools, use your wits to set the page to the state that lets you complete the task with the least effort. Finally, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) to turn the text into a number.
1. At the time of writing, this returns `699`, as in [699 SEK](https://www.google.com/search?q=699%20sek).
</details>
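
To illustrate the `parseInt()` step, here is a minimal sketch of how it treats price-like text. The sample strings and the `toNumber` helper are hypothetical; the text you actually get from the page may be formatted differently.

```javascript
// Hypothetical price strings; the real text on the page may differ.
const plain = parseInt('699', 10);        // digits only
const withUnit = parseInt('699 SEK', 10); // parsing stops at the first non-digit
const spaced = parseInt('1 299', 10);     // stops at the space, so only the leading "1" is read

// Hypothetical helper: strip all whitespace first so that
// space-separated thousands are parsed as one number.
const toNumber = (text) => parseInt(text.replace(/\s/g, ''), 10);

console.log(plain, withUnit, spaced, toNumber('1 299 SEK'));
```

Passing the radix `10` explicitly keeps the parsing base unambiguous. If the price text contains spaces as thousands separators, removing whitespace before parsing avoids a silently truncated result.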
### Extract details about the first post on Guardian's F1 news
At Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Extract its title, lead paragraph, and URL of the photo.
1. Open the [F1 news page](https://www.theguardian.com/sport/formulaone).
1. Activate the element selection tool in your DevTools.
1. Click on the first post.
1. Notice that there are no good classes to go by. The markup uses generic tags and randomized classes. We must rely on the hierarchy and order of the elements instead.
1. In the **Console**, execute `post = document.querySelector('#maincontent ul li')`. It returns the element representing the first post.
1. In the **Console**, execute `post.querySelector('h3').textContent` to extract the title.
1. In the **Console**, execute `post.querySelector('span div').textContent` to extract the lead paragraph.
1. In the **Console**, execute `post.querySelector('img').src` to extract the photo URL.
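
The steps above can be combined into one function, sketched below. The name `extractFirstPost` and the null checks are illustrative additions, not part of the original solution. The selectors are the ones from the steps and depend on the page's markup at the time of writing, so they may stop matching if the site changes.

```javascript
// Hypothetical helper that gathers the values from the steps above.
// `root` is any object with querySelector(), such as `document` in the browser.
function extractFirstPost(root) {
  // The first list item inside the main content represents the first post.
  const post = root.querySelector('#maincontent ul li');
  if (!post) {
    return null; // the hierarchy-based selector no longer matches
  }

  // Read trimmed text from an element, or null if the element is missing.
  const text = (element) => (element ? element.textContent.trim() : null);
  const image = post.querySelector('img');

  return {
    title: text(post.querySelector('h3')),
    lead: text(post.querySelector('span div')),
    photoUrl: image ? image.src : null,
  };
}
```

In the **Console**, calling `extractFirstPost(document)` should return an object with the title, lead paragraph, and photo URL. A `null` field suggests that the corresponding selector needs adjusting.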