Commit e0e3f26

committed
feat: finding a product card
1 parent f34de7f commit e0e3f26

4 files changed: +44 -6 lines changed

sources/academy/webscraping/scraping_basics_python/02_devtools_locating_elements.md

Lines changed: 44 additions & 6 deletions
@@ -6,15 +6,17 @@ sidebar_position: 2
66
slug: /scraping-basics-python/devtools-locating-elements
77
---
88

9+
import Exercises from './_exercises.mdx';
10+
911
**In this lesson we'll use the browser tools for developers to manually find products on an e-commerce website.**
1012

1113
---
1214

13-
In this course, we'll build an app to track prices. It'll scrape product pages from an e-commerce site and record the prices. First, let's check out the site we'll be working with.
15+
Inspecting Wikipedia and changing its subtitle is cool, but we should focus on building an app to track prices on an e-commerce site. As part of the groundwork, let's check out the site we'll be working with.
1416

1517
## Meeting the Warehouse store
1618

17-
To keep things practical, we won't use artificial scraping playgrounds or sandboxes. Instead, we'll scrape a real e-commerce site. Shopify, a major e-commerce platform, has a demo store at [warehouse-theme-metal.myshopify.com](https://warehouse-theme-metal.myshopify.com/). It strikes a good balance between being realistic and stable enough for a tutorial.
19+
To keep things practical, we won't use artificial scraping playgrounds or sandboxes. Instead, we'll scrape a real e-commerce site. Shopify, a major e-commerce platform, has a demo store at [warehouse-theme-metal.myshopify.com](https://warehouse-theme-metal.myshopify.com/). It strikes a good balance between being realistic and stable enough for a tutorial. The scraper we're about to build will watch prices of all the products listed on the [Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales).
1820

1921
:::info Balancing authenticity and stability
2022

@@ -24,20 +26,56 @@ However, we deliberately designed all the exercises to work with live websites,
2426

2527
:::
2628

27-
Now let's extract some data about the products listed!
29+
## Finding a product card
30+
31+
As mentioned in the previous lesson, before we build a scraper, we need to have an idea about how the target page is structured and what elements exactly our program should be looking for. So let's figure out how it could select details for each of the products on the [Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales).
32+
33+
![Warehouse store with DevTools open](./images/devtools-warehouse.png)
34+
35+
On the page, there is a grid of product cards with names and pictures of products. Open DevTools and select the name of the **Sony SACS9 Active Subwoofer**. Highlight it in the **Elements** tab by clicking on it.
36+
37+
![Selecting an element with DevTools](./images/devtools-product-name.png)
38+
39+
Now we'll find all elements that contain details about this subwoofer: price, number of reviews, image, and everything else.
40+
41+
In the **Elements** tab, move your cursor up from the `a` element containing the subwoofer's name, hovering over each element on the way, until you find the one that highlights the entire product card. You can also do this by repeatedly pressing the up arrow key on your keyboard. The `div` element we just found is a **parent element**, and all the elements nested inside it are its **child elements**.
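
To make the parent and child relationship more concrete, here is a minimal Python sketch of the same upward walk. It assumes a heavily simplified card: the markup and the `product-item__info` and `product-item__title` class names are hypothetical stand-ins, not the store's actual HTML. It uses the standard library's `xml.etree.ElementTree` only because the sample is well-formed; it is not meant as a way to parse real web pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a product card.
# The real page has far more nesting, and its class names may differ.
CARD_MARKUP = """
<div class="product-item">
  <div class="product-item__info">
    <a class="product-item__title">Sony SACS9 Active Subwoofer</a>
  </div>
</div>
"""

root = ET.fromstring(CARD_MARKUP.strip())

# ElementTree elements don't know their parents, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Start at the product name and walk up, like moving up in the Elements tab.
element = root.find(".//a")
path = [element.tag]
while element in parent_of:
    element = parent_of[element]
    path.append(element.tag)

print(path)                   # tags visited from the name up to the card
print(element.get("class"))   # class attribute of the outermost parent
```

The walk ends at the outermost `div`, which plays the role of the product card here, with every element it passed on the way being one of that card's nested children.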
42+
43+
![Selecting an element with hover](./images/devtools-product-hover.png)
44+
45+
At this point, we could use **Store as global variable** to send the element to the **Console**. This option is useful when inspecting the page manually, but it's not something a program can do.
46+
47+
Most often, scrapers use [CSS selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors) to locate elements on the page. And most often, CSS selectors find HTML elements according to what their `class` attributes contain. The product card element we highlighted has the following markup:
48+
49+
```html
50+
<div class="product-item product-item--vertical 1/3--tablet-and-up 1/4--desk">
51+
...
52+
</div>
53+
```
2854

29-
## Navigating the element tree
55+
The `class` attribute can contain several values separated by whitespace. This element thus has four classes. Let's go to the **Console** and try to get hold of the element using a CSS selector.
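
As a quick check of the whitespace rule, the following Python sketch splits the card's `class` attribute into individual class names. It uses only the standard library's `html.parser`. The `ClassCollector` helper is a hypothetical name introduced for this illustration, and the card's children are omitted from the sample markup.

```python
from html.parser import HTMLParser

# The product card's opening tag from the lesson, with its children left out.
CARD_HTML = (
    '<div class="product-item product-item--vertical '
    '1/3--tablet-and-up 1/4--desk"></div>'
)


class ClassCollector(HTMLParser):
    """Hypothetical helper: collects class names from every start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # Class names are separated by whitespace.
                self.classes.extend(value.split())


collector = ClassCollector()
collector.feed(CARD_HTML)
print(collector.classes)
# ['product-item', 'product-item--vertical', '1/3--tablet-and-up', '1/4--desk']
```

Any one of these four names could serve as the basis for a selector, although names that start with a digit or contain a slash, such as `1/3--tablet-and-up`, would need escaping in CSS.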
3056

31-
## Selecting elements programmatically
57+
## Locating product cards programmatically
58+
59+
:::danger Work in Progress
60+
61+
Under development.
62+
63+
:::
3264

3365
## Choosing good selectors
3466

67+
:::danger Work in Progress
68+
69+
Under development.
70+
71+
:::
72+
3573
---
3674

3775
<Exercises />
3876

3977
:::danger Work in Progress
4078

41-
This lesson is under development. Please read [Extracting data with DevTools](../scraping_basics_javascript/data_extraction/devtools_continued.md) in the meantime so you can follow the upcoming lessons.
79+
Under development.
4280

4381
:::