Commit c7c90c1

fix: improve Vale configuration (#1430)
1 parent 324d0a8 commit c7c90c1

2 files changed: 7 additions, 7 deletions

.vale.ini

Lines changed: 4 additions & 2 deletions
@@ -11,8 +11,10 @@ mdx = md
 
 [*.md]
 BasedOnStyles = Vale, Apify, write-good, Microsoft
-# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ) & email addresses
-TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), (`[^`]*`), (`\w+`)
+# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ), email addresses, inline code
+TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), `[^`]+`
+# Ignore HTML comments and code blocks
+BlockIgnores = (?s) (<!--.*?-->)|(```.*?```)
 Vale.Spelling = YES
 
 
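Note on the new ignore patterns: the sketch below is one way to see roughly what the added `TokenIgnores` inline-code pattern and the `BlockIgnores` pattern are meant to skip. It is only an approximation under stated assumptions: Vale is written in Go and applies these patterns with its own regex engine and scoping rules, whereas this uses Python's `re`. The space after `(?s)` is dropped here for simplicity, and the `strip_ignored` helper and the sample text are hypothetical, not part of the commit.

```python
import re

# Patterns adapted from the new .vale.ini lines. Python's `re` stands in for
# Vale's own engine, so treat the results as illustrative only.
INLINE_CODE = re.compile(r"`[^`]+`")
# Assumption: the space after (?s) in the config is omitted in this sketch.
BLOCKS = re.compile(r"(?s)(<!--.*?-->)|(```.*?```)")


def strip_ignored(markdown: str) -> str:
    """Drop HTML comments and fenced code blocks first, then inline code spans."""
    without_blocks = BLOCKS.sub("", markdown)
    return INLINE_CODE.sub("", without_blocks)


# Hypothetical sample input, not taken from the repository.
sample = "Run `main.py` now.\n<!-- note -->\n```py\nprint('hi')\n```\nDone."
print(strip_ignored(sample))
```

Removing blocks before inline spans matters in this sketch: otherwise the inline pattern could pair up backticks that belong to a fence.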
sources/academy/webscraping/scraping_basics_python/12_framework.md

Lines changed: 3 additions & 5 deletions
@@ -46,7 +46,6 @@ Successfully installed Jinja2-0.0.0 ... ... ... crawlee-0.0.0 ... ... ...
 
 Now let's use the framework to create a new version of our scraper. In the same project directory where our `main.py` file lives, create a file `newmain.py`. This way, we can keep peeking at the original implementation while working on the new one. The initial content will look like this:
 
-<!-- vale off -->
 ```py title="newmain.py"
 import asyncio
 from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
@@ -63,7 +62,6 @@ async def main():
 if __name__ == '__main__':
     asyncio.run(main())
 ```
-<!-- vale on -->
 
 In the code, we do the following:
 
@@ -427,15 +425,15 @@ If you export the dataset as JSON, it should look something like this:
 {
 "url": "https://www.f1academy.com/Racing-Series/Drivers/29/Emely-De-Heus",
 "name": "Emely De Heus",
-"team": "MP Motorsport"
+"team": "MP Motorsport",
 "nationality": "Dutch",
 "dob": "2003-02-10",
 "instagram_url": "https://www.instagram.com/emely.de.heus/",
 },
 {
 "url": "https://www.f1academy.com/Racing-Series/Drivers/28/Hamda-Al-Qubaisi",
 "name": "Hamda Al Qubaisi",
-"team": "MP Motorsport"
+"team": "MP Motorsport",
 "nationality": "Emirati",
 "dob": "2002-08-08",
 "instagram_url": "https://www.instagram.com/hamdaalqubaisi_official/",
@@ -501,7 +499,7 @@ Hints:
 
 The [Global Top 10](https://www.netflix.com/tudum/top10) page has a table listing the most popular Netflix films worldwide. Scrape the movie names from this page, then search for each movie on [IMDb](https://www.imdb.com/). Assume the first search result is correct and retrieve the film's rating. Each item you push to Crawlee's default dataset should include the following data:
 
-- URL of the film's imdb.com page
+- URL of the film's IMDb page
 - Title
 - Rating
 