fix: link MDN on first mention of regular expressions
Also replace one external resource with MDN. Let's be consistent
in what we recommend. Even though the tool looks great, I think
it's better to link to an explanation of the concept.
MDN has a section 'Tools', which lists similar tools, and they'll
maintain it for us if new tools appear or old ones disappear.
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#tools
sources/academy/tutorials/apify_scrapers/getting_started.md
+1 -1 (1 addition & 1 deletion)
@@ -152,7 +152,7 @@ In the structures, only the `OWNER` and `NAME` change. We can leverage this in a

#### Making a pseudo URL

-**Pseudo URL**s are really just URLs with some variable parts in them. Those variable parts are represented by [regular expressions](https://regexone.com/) enclosed in brackets `[]`.
+**Pseudo URL**s are really just URLs with some variable parts in them. Those variable parts are represented by [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) enclosed in brackets `[]`.
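
To make the idea of "a URL with regular expressions in brackets" more concrete, here is a minimal sketch of how such a pattern could be turned into an ordinary `RegExp`. The helper name `pseudoUrlToRegExp` and the `example.com` URL are hypothetical, chosen only for illustration. This is not how any particular library implements pseudo URLs, and it assumes the bracketed parts contain no nested brackets.

```javascript
// Hypothetical helper, not part of any library: converts a pseudo URL
// into a RegExp. Assumption: bracketed parts never contain nested brackets.
function pseudoUrlToRegExp(pseudoUrl) {
    // Escape regex metacharacters in the literal (non-bracketed) parts.
    const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    let pattern = '';
    let lastIndex = 0;
    // Find every [ ... ] section; the capture group holds the embedded regex.
    for (const match of pseudoUrl.matchAll(/\[([^\]]*)\]/g)) {
        pattern += escapeLiteral(pseudoUrl.slice(lastIndex, match.index));
        pattern += `(?:${match[1]})`;
        lastIndex = match.index + match[0].length;
    }
    // Append whatever literal text follows the last bracketed section.
    pattern += escapeLiteral(pseudoUrl.slice(lastIndex));
    return new RegExp(`^${pattern}$`);
}

// Illustrative pseudo URL with two variable path segments.
const matcher = pseudoUrlToRegExp('https://example.com/[.+]/[.+]');
```

With this sketch, `matcher.test(url)` tells you whether a given URL fits the pattern. Note that `.+` also matches slashes, so deeper paths would match as well.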
Working with our actor details example, we could produce a **Pseudo URL** like this:
sources/academy/webscraping/web_scraping_for_beginners/data_extraction/using_devtools.md
+3 -5 (3 additions & 5 deletions)
@@ -83,9 +83,7 @@ The `product-item` class is simple, human-readable, and semantically connected w

## Extracting data from elements {#extraction-from-elements}

-Now that we found the element, we can start poking into it to extract data. First, let's save the element to a variable so that we can work with it repeatedly.
-
-Run the commands in the Console:
+Now that we found the element, we can start poking into it to extract data. First, let's save the element to a variable so that we can work with it repeatedly. Run these commands in the Console:
> If you're wondering what an array is or what `products[2]` means, read the [JavaScript arrays basics](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/First_steps/Arrays).

-Now that we have the subwoofer saved into a variable, run another command in the Console to print its text:
+Now that we have the subwoofer saved in a variable, run another command in the Console to print its text:

```js
subwoofer.textContent;
@@ -150,7 +148,7 @@ It worked, but the price was not alone in the result. We extracted it together w
When it comes to data cleaning, there are two main approaches you can take. It's beneficial to understand both, as one approach may be feasible in a given situation while the other is not.

1. Remove the elements that add noise to your data from the selection. Then extract the pre-cleaned data.
-2. Extract the data with noise. Use regular expressions or other text manipulation techniques to parse the data and keep only the parts we're interested in.
+2. Extract the data with noise. Use [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) or other text manipulation techniques to parse the data and keep only the parts we're interested in.
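
The second approach in the list above can be sketched in a few lines of plain JavaScript. The `noisyText` value below is a made-up stand-in for what an element's text might look like, and `extractPrice` is a hypothetical helper name. Treat this as a rough illustration under those assumptions, not as the tutorial's own solution.

```javascript
// Hypothetical noisy text, similar in shape to what an element's
// textContent might contain. The actual page output may differ.
const noisyText = '\n  Sale price\n  $74.95\n';

// Hypothetical helper: keeps the first run of digits with an optional
// decimal part. It does not handle thousands separators such as "1,398.00".
function extractPrice(text) {
    const match = text.match(/\d+(?:\.\d+)?/);
    return match ? Number(match[0]) : null;
}

const price = extractPrice(noisyText);
```

Here `price` ends up as a number rather than a string, which is usually more convenient for later processing. When the text contains no digits, the helper returns `null` instead of throwing.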
First, let's look at **removing the noise before extraction**. When you look closely at the element that contains the price, you'll see that it includes another `<span>` element with the text **Sale price**. This `<span>` is what adds noise to our data, and we have to get rid of it.