Commit e383a5d

fix: link MDN on first mention of regular expressions
Also replace one external resource with MDN. Let's be consistent in what we recommend. Even though the tool looks great, I think it's better to link to an explanation of the concept. MDN has a section 'Tools', which lists similar tools, and they'll maintain it for us if new tools appear or old ones disappear. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#tools
1 parent f2771b0 commit e383a5d

2 files changed

+4
-6
lines changed

sources/academy/tutorials/apify_scrapers/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -152,7 +152,7 @@ In the structures, only the `OWNER` and `NAME` change. We can leverage this in a

 #### Making a pseudo URL

-**Pseudo URL**s are really just URLs with some variable parts in them. Those variable parts are represented by [regular expressions](https://regexone.com/) enclosed in brackets `[]`.
+**Pseudo URL**s are really just URLs with some variable parts in them. Those variable parts are represented by [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) enclosed in brackets `[]`.

 Working with our actor details example, we could produce a **Pseudo URL** like this:

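For readers of this diff, the bracket notation described in the changed line can be illustrated with a minimal sketch. The helper `pseudoUrlToRegExp` and the `example.com` URLs are hypothetical and are not taken from the tutorial or from any Apify library. The sketch assumes brackets are never nested, so a bracketed part cannot itself contain `[` or `]`.

```javascript
// Hypothetical helper (an assumption for illustration, not Apify's implementation).
// Converts a pseudo URL such as 'https://example.com/[.+]/[.+]' into a RegExp.
function pseudoUrlToRegExp(pseudoUrl) {
    // Split into literal parts and bracketed parts. The capture group makes
    // split() keep the bracketed parts in the resulting array.
    const parts = pseudoUrl.split(/(\[[^\]]*\])/);

    const source = parts
        .map((part) => {
            if (part.startsWith('[') && part.endsWith(']')) {
                // Bracketed part: use its content as a regular expression.
                return `(?:${part.slice(1, -1)})`;
            }
            // Literal part: escape characters that are special in a RegExp.
            return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        })
        .join('');

    // Anchor the pattern so that the whole URL has to match.
    return new RegExp(`^${source}$`);
}

const pattern = pseudoUrlToRegExp('https://example.com/[.+]/[.+]');
console.log(pattern.test('https://example.com/apify/web-scraper'));
console.log(pattern.test('https://example.com/apify'));
```

The first call should report a match, because both variable parts are filled in, while the second URL has only one path segment and should not match.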
sources/academy/webscraping/web_scraping_for_beginners/data_extraction/using_devtools.md

Lines changed: 3 additions & 5 deletions
@@ -83,9 +83,7 @@ The `product-item` class is simple, human-readable, and semantically connected w

 ## Extracting data from elements {#extraction-from-elements}

-Now that we found the element, we can start poking into it to extract data. First, let's save the element to a variable so that we can work with it repeatedly.
-
-Run the commands in the Console:
+Now that we found the element, we can start poking into it to extract data. First, let's save the element to a variable so that we can work with it repeatedly. Run these commands in the Console:

 ```js
 const products = document.querySelectorAll('.product-item');
@@ -94,7 +92,7 @@ const subwoofer = products[2];

 > If you're wondering what an array is or what `products[2]` means, read the [JavaScript arrays basics](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/First_steps/Arrays).

-Now that we have the subwoofer saved into a variable, run another command in the Console to print its text:
+Now that we have the subwoofer saved in a variable, run another command in the Console to print its text:

 ```js
 subwoofer.textContent;
@@ -150,7 +148,7 @@ It worked, but the price was not alone in the result. We extracted it together w
 When it comes to data cleaning, there are two main approaches you can take. It's beneficial to understand both, as one approach may be feasible in a given situation while the other is not.

 1. Remove the elements that add noise to your data from the selection. Then extract the pre-cleaned data.
-2. Extract the data with noise. Use regular expressions or other text manipulation techniques to parse the data and keep only the parts we're interested in.
+2. Extract the data with noise. Use [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) or other text manipulation techniques to parse the data and keep only the parts we're interested in.

 First, let's look at **removing the noise before extraction**. When you look closely at the element that contains the price, you'll see that it includes another `<span>` element with the text **Sale price**. This `<span>` is what adds noise to our data, and we have to get rid of it.

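The second approach in the changed list item (extract the noisy text, then clean it with a regular expression) could look roughly like the sketch below. The sample string and the `extractPrice` helper are assumptions made for illustration; the actual text on the tutorial's page and its price format may differ.

```javascript
// Hypothetical noisy text, standing in for what textContent might return
// when the "Sale price" label is extracted together with the price.
const noisyText = '\n  Sale price$1,398.00\n';

// Hypothetical helper: returns the first dollar amount as a number, or null.
function extractPrice(text) {
    // A dollar sign, optional whitespace, then digits with optional
    // thousands separators and an optional decimal part.
    const match = text.match(/\$\s*([\d,]+(?:\.\d+)?)/);
    if (!match) return null;

    // Drop the thousands separators before converting to a number.
    return Number(match[1].replace(/,/g, ''));
}

console.log(extractPrice(noisyText));
```

Returning `null` when nothing matches keeps a missing price distinguishable from a price of zero.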