Commit ad36d5c

authored
Merge branch 'master' into renovate/apify-ui-library-0.x
2 parents 45a3acc + 6e133dc commit ad36d5c

31 files changed

+4427
-137
lines changed

apify-api/openapi/paths/datasets/datasets@{datasetId}@items.yaml

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -350,6 +350,15 @@ get:
350350
schema:
351351
type: boolean
352352
example: false
353+
- name: view
354+
in: query
355+
description: |
356+
Defines the view configuration for dataset items based on the schema definition.
357+
This parameter determines how the data will be filtered and presented.
358+
For complete specification details, see the [dataset schema documentation](/platform/actors/development/actor-definition/dataset-schema).
359+
schema:
360+
type: string
361+
example: overview
353362
- name: skipFailedPages
354363
in: query
355364
description: |
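For readers who want to try the new `view` parameter, here is a minimal sketch of how a client might build the request URL. The base URL `https://api.apify.com/v2` and the helper name `buildDatasetItemsUrl` are assumptions made for illustration and are not part of this commit. The path is inferred from the spec file name.

```javascript
// Sketch only: builds a "get dataset items" URL with the optional `view` parameter.
// Assumptions: API base URL and the /datasets/{datasetId}/items path.
const API_BASE_URL = 'https://api.apify.com/v2';

function buildDatasetItemsUrl(datasetId, view) {
  const url = new URL(`${API_BASE_URL}/datasets/${encodeURIComponent(datasetId)}/items`);
  // Only add the parameter when a view name was given.
  if (view) {
    url.searchParams.set('view', view);
  }
  return url.toString();
}

console.log(buildDatasetItemsUrl('WkzbQMuFYuamGv3YF', 'overview'));
// https://api.apify.com/v2/datasets/WkzbQMuFYuamGv3YF/items?view=overview
```

The view name (`overview` here, taken from the example in the spec) has to match a view defined in the dataset schema.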

apify-api/openapi/paths/key-value-stores/key-value-stores@{storeId}@records@{recordKey}.yaml

Lines changed: 46 additions & 5 deletions
@@ -34,7 +34,7 @@ get:
3434
style: simple
3535
schema:
3636
type: string
37-
example: some key
37+
example: someKey
3838
responses:
3939
'200':
4040
description: ''
@@ -65,11 +65,52 @@ get:
6565
- https://docs.apify.com/api/v2#/reference/key-value-stores/get-record
6666
- https://docs.apify.com/api/v2#tag/Key-value-storesRecord/operation/keyValueStore_record_get
6767
x-js-parent: KeyValueStoreClient
68+
x-js-name: getRecord
69+
x-js-doc-url: https://docs.apify.com/api/client/js/reference/class/KeyValueStoreClient#getRecord
70+
x-py-parent: KeyValueStoreClientAsync
71+
x-py-name: get_record
72+
x-py-doc-url: https://docs.apify.com/api/client/python/reference/class/KeyValueStoreClientAsync#get_record
73+
head:
74+
tags:
75+
- Storage/Key-value stores
76+
summary: Check if a record exists
77+
description: |
78+
Check if a value is stored in the key-value store under a specific key.
79+
operationId: keyValueStore_record_head
80+
security:
81+
- apiKeyStoreId: []
82+
- httpBearerStoreId: []
83+
parameters:
84+
- name: storeId
85+
in: path
86+
description: Key-value store ID or `username~store-name`.
87+
required: true
88+
style: simple
89+
schema:
90+
type: string
91+
example: WkzbQMuFYuamGv3YF
92+
- name: recordKey
93+
in: path
94+
description: Key of the record.
95+
required: true
96+
style: simple
97+
schema:
98+
type: string
99+
example: someKey
100+
responses:
101+
'200':
102+
description: 'The record exists'
103+
headers: {}
104+
'404':
105+
description: 'The record does not exist'
106+
headers: {}
107+
deprecated: false
108+
x-js-parent: KeyValueStoreClient
68109
x-js-name: recordExists
69110
x-js-doc-url: https://docs.apify.com/api/client/js/reference/class/KeyValueStoreClient#recordExists
70111
x-py-parent: KeyValueStoreClientAsync
71-
x-py-name: stream_record
72-
x-py-doc-url: https://docs.apify.com/api/client/python/reference/class/KeyValueStoreClientAsync#stream_record
112+
x-py-name: record_exists
113+
x-py-doc-url: https://docs.apify.com/api/client/python/reference/class/KeyValueStoreClientAsync#record_exists
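The new `head` operation could be called roughly as follows. This is a sketch under stated assumptions: the base URL `https://api.apify.com/v2`, the bearer-token header, and the helper names `statusToExists` and `recordExists` are illustrative and are not taken from this commit or from the Apify clients. Only the `200` and `404` meanings come from the spec above.

```javascript
// Sketch only: map the documented HEAD responses to a boolean.
function statusToExists(status) {
  if (status === 200) return true; // "The record exists"
  if (status === 404) return false; // "The record does not exist"
  throw new Error(`Unexpected status code: ${status}`);
}

// Assumptions: base URL, path layout, and Bearer authentication.
// `fetchImpl` is injectable so the function can be tried without network access.
async function recordExists(storeId, recordKey, { token, fetchImpl = globalThis.fetch } = {}) {
  const url = 'https://api.apify.com/v2/key-value-stores/'
    + `${encodeURIComponent(storeId)}/records/${encodeURIComponent(recordKey)}`;
  const headers = token ? { Authorization: `Bearer ${token}` } : {};
  // HEAD returns only the status line and headers, so no record body is downloaded.
  const response = await fetchImpl(url, { method: 'HEAD', headers });
  return statusToExists(response.status);
}
```

Calling `await recordExists('WkzbQMuFYuamGv3YF', 'someKey', { token })` would then resolve to `true` or `false`, and reject on any other status code.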
73114
put:
74115
tags:
75116
- Storage/Key-value stores
@@ -108,7 +149,7 @@ put:
108149
style: simple
109150
schema:
110151
type: string
111-
example: some key
152+
example: someKey
112153
- name: Content-Encoding
113154
in: header
114155
description: ''
@@ -177,7 +218,7 @@ delete:
177218
style: simple
178219
schema:
179220
type: string
180-
example: some key
221+
example: someKey
181222
responses:
182223
'204':
183224
description: ''
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
title: Inspecting web pages with browser DevTools
3+
sidebar_label: "DevTools: Inspecting"
4+
description: Lesson about using the browser tools for developers to inspect and manipulate the structure of a website.
5+
slug: /scraping-basics-javascript2/devtools-inspecting
6+
unlisted: true
7+
---
8+
9+
import Exercises from './_exercises.mdx';
10+
11+
**In this lesson we'll use the browser tools for developers to inspect and manipulate the structure of a website.**
12+
13+
---
14+
15+
A browser is the most complete tool for navigating websites. Scrapers are like automated browsers—and sometimes, they actually are automated browsers. The key difference? There's no user to decide where to go or eyes to see what's displayed. Everything has to be pre-programmed.
16+
17+
All modern browsers provide developer tools, or _DevTools_, for website developers to debug their work. We'll use them to understand how websites are structured and identify the behavior our scraper needs to mimic. Here's the typical workflow for creating a scraper:
18+
19+
1. Inspect the target website in DevTools to understand its structure and determine how to extract the required data.
20+
1. Translate those findings into code.
21+
1. If the scraper fails due to overlooked edge cases or, over time, due to website changes, go back to step 1.
22+
23+
Now let's spend some time figuring out what the detective work in step 1 is about.
24+
25+
## Opening DevTools
26+
27+
Google Chrome is currently the most popular browser, and many others use the same core. That's why we'll focus on [Chrome DevTools](https://developer.chrome.com/docs/devtools) here. However, the steps are similar in other browsers, as Safari has its [Web Inspector](https://developer.apple.com/documentation/safari-developer-tools/web-inspector) and Firefox also has [DevTools](https://firefox-source-docs.mozilla.org/devtools-user/).
28+
29+
Now let's peek behind the scenes of a real-world website—say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**.
30+
31+
![Wikipedia with Chrome DevTools open](./images/devtools-wikipedia.png)
32+
33+
Websites are built with three main technologies: HTML, CSS, and JavaScript. In the **Elements** tab, DevTools shows the HTML and CSS of the current page:
34+
35+
![Elements tab in Chrome DevTools](./images/devtools-elements-tab.png)
36+
37+
:::warning Screen adaptations
38+
39+
DevTools may appear differently depending on your screen size. For instance, on smaller screens, the CSS panel might move below the HTML elements panel instead of appearing in the right pane.
40+
41+
:::
42+
43+
Think of [HTML](https://developer.mozilla.org/en-US/docs/Learn/HTML) elements as the frame that defines a page's structure. A basic HTML element includes an opening tag, a closing tag, and attributes. Here's an `article` element with an `id` attribute. It wraps `h1` and `p` elements, both containing text. Some text is emphasized using `em`.
44+
45+
```html
46+
<article id="article-123">
47+
<h1 class="heading">First Level Heading</h1>
48+
<p>Paragraph with <em>emphasized text</em>.</p>
49+
</article>
50+
```
51+
52+
HTML, a markup language, describes how everything on a page is organized, how elements relate to each other, and what they mean. It doesn't define how elements should look—that's where [CSS](https://developer.mozilla.org/en-US/docs/Learn/CSS) comes in. CSS is like the velvet covering the frame. Using styles, we can select elements and assign rules that tell the browser how they should appear. For instance, we can style all elements with `heading` in their `class` attribute to make the text blue and uppercase.
53+
54+
```css
55+
.heading {
56+
color: blue;
57+
text-transform: uppercase;
58+
}
59+
```
60+
61+
While HTML and CSS describe what the browser should display, [JavaScript](https://developer.mozilla.org/en-US/docs/Learn/JavaScript) is a general-purpose programming language that adds interaction to the page.
62+
63+
In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript. If you don't see it, press **ESC** to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly.
64+
65+
![Console in Chrome DevTools](./images/devtools-console.png)
66+
67+
## Selecting an element
68+
69+
In the top-left corner of DevTools, let's find the icon with an arrow pointing to a square.
70+
71+
![Chrome DevTools element selection tool](./images/devtools-element-selection.png)
72+
73+
We'll click the icon and hover our cursor over Wikipedia's subtitle, **The Free Encyclopedia**. As we move our cursor, DevTools will display information about the HTML element under it. We'll click on the subtitle. In the **Elements** tab, DevTools will highlight the HTML element that represents the subtitle.
74+
75+
![Chrome DevTools element hover](./images/devtools-hover.png)
76+
77+
The highlighted section should look something like this:
78+
79+
```html
80+
<strong class="jsl10n localized-slogan" data-jsl10n="portal.slogan">
81+
The Free Encyclopedia
82+
</strong>
83+
```
84+
85+
If we were experienced creators of scrapers, our eyes would immediately spot what's needed to make a program that fetches Wikipedia's subtitle. The program would need to download the page's source code, find a `strong` element with `localized-slogan` in its `class` attribute, and extract its text.
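As a rough preview of the "find and extract" step, here is a naive sketch that pulls the subtitle out of an HTML string with a regular expression. The function name `extractSlogan` is made up for illustration, and a regular expression is only a stand-in here: a real scraper would use a proper HTML parser, and downloading the page is left out entirely.

```javascript
// Sketch only: find a <strong> element whose class attribute contains
// "localized-slogan" and return its text. Not a robust way to parse HTML.
function extractSlogan(html) {
  const pattern = /<strong[^>]*class="[^"]*\blocalized-slogan\b[^"]*"[^>]*>([\s\S]*?)<\/strong>/;
  const match = html.match(pattern);
  // Trim the surrounding whitespace that comes from the source formatting.
  return match ? match[1].trim() : null;
}

const html = `
<strong class="jsl10n localized-slogan" data-jsl10n="portal.slogan">
  The Free Encyclopedia
</strong>
`;
console.log(extractSlogan(html)); // The Free Encyclopedia
```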
86+
87+
:::tip HTML and whitespace
88+
89+
In HTML, whitespace isn't significant, i.e., it only makes the code readable. The following code snippets are equivalent:
90+
91+
```html
92+
<strong>
93+
The Free Encyclopedia
94+
</strong>
95+
```
96+
97+
```html
98+
<strong>The Free
99+
Encyclopedia
100+
</strong>
101+
```
102+
103+
:::
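One practical consequence: `textContent` returns the text with its whitespace exactly as written in the source, so code that compares or stores extracted text often collapses whitespace first. A minimal sketch, with `normalizeWhitespace` being a hypothetical helper name:

```javascript
// Sketch only: collapse every run of whitespace into a single space.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Raw text as it would appear inside the two equivalent snippets above.
const first = '\n  The Free Encyclopedia\n';
const second = 'The Free\n  Encyclopedia\n';

console.log(normalizeWhitespace(first) === normalizeWhitespace(second)); // true
```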
104+
105+
## Interacting with an element
106+
107+
We won't be creating JavaScript scrapers just yet. Let's first get familiar with what we can do in the JavaScript console and how we can further interact with HTML elements on the page.
108+
109+
In the **Elements** tab, with the subtitle element highlighted, let's right-click the element to open the context menu. There, we'll choose **Store as global variable**. The **Console** should appear, with a `temp1` variable ready.
110+
111+
![Global variable in Chrome DevTools Console](./images/devtools-console-variable.png)
112+
113+
The Console allows us to run JavaScript in the context of the loaded page, similar to Python's [interactive REPL](https://realpython.com/interacting-with-python/). We can use it to play around with elements.
114+
115+
For a start, let's access some of the subtitle's properties. One such property is `textContent`, which contains the text inside the HTML element. The last line in the Console is where our cursor is. We'll type the following and hit **Enter**:
116+
117+
```js
118+
temp1.textContent;
119+
```
120+
121+
The result should be `'The Free Encyclopedia'`. Now let's try this:
122+
123+
```js
124+
temp1.outerHTML;
125+
```
126+
127+
This should return the element's HTML, including its tags and content, as a string. Finally, we'll run the next line to change the text of the element:
128+
129+
```js
130+
temp1.textContent = 'Hello World!';
131+
```
132+
133+
When we change elements in the Console, those changes reflect immediately on the page!
134+
135+
![Changing textContent in Chrome DevTools Console](./images/devtools-console-textcontent.png)
136+
137+
But don't worry—we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. That's why screenshots shouldn't be trusted as evidence.
138+
139+
We're not here for playing around with elements, though—we want to create a scraper for an e-commerce website to watch prices. In the next lesson, we'll examine the website and use CSS selectors to locate HTML elements containing the data we need.
140+
141+
---
142+
143+
<Exercises />
144+
145+
### Find FIFA logo
146+
147+
Open the [FIFA website](https://www.fifa.com/) and use the DevTools to figure out the URL of FIFA's logo image file. Hint: You're looking for an [`img`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img) element with a `src` attribute.
148+
149+
<details>
150+
<summary>Solution</summary>
151+
152+
1. Go to [fifa.com](https://www.fifa.com/).
153+
1. Activate the element selection tool.
154+
1. Click on the logo.
155+
1. Send the highlighted element to the **Console** using the **Store as global variable** option from the context menu.
156+
1. In the console, type `temp1.src` and hit **Enter**.
157+
158+
![DevTools exercise result](./images/devtools-exercise-fifa.png)
159+
160+
</details>
161+
162+
### Make your own news
163+
164+
Open a news website, such as [CNN](https://cnn.com). Use the Console to change the headings of some articles.
165+
166+
<details>
167+
<summary>Solution</summary>
168+
169+
1. Go to [cnn.com](https://cnn.com).
170+
1. Activate the element selection tool.
171+
1. Click on a heading.
172+
1. Send the highlighted element to the **Console** using the **Store as global variable** option from the context menu.
173+
1. In the console, type `temp1.textContent = 'Something something'` and hit **Enter**.
174+
175+
![DevTools exercise result](./images/devtools-exercise-cnn.png)
176+
177+
</details>
