fix: improve Vale configuration (#1430)

honzajavorek · web-flow · commit c7c90c1cc6eb · 2025-01-24T16:11:25.000+01:00
diff --git a/.vale.ini b/.vale.ini
@@ -11,8 +11,10 @@ mdx = md
 
 [*.md]
 BasedOnStyles = Vale, Apify, write-good, Microsoft
-# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ) & email addresses
-TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), (`[^`]*`), (`\w+`)
+# Ignore URLs, HTML/XML tags starting with capital letter, lines containing = sign, http & https URL ending with ] or ), email addresses, inline code
+TokenIgnores = (<\/?[A-Z].+>), ([^\n]+=[^\n]*), (\[[^\]]+\]\([^\)]+\)), ([^\n]+@[^\n]+\.[^\n]), ({[^}]*}), `[^`]+`
+# Ignore HTML comments and code blocks
+BlockIgnores = (?s) (<!--.*?-->)|(```.*?```)
 Vale.Spelling = YES
 
 
diff --git a/sources/academy/webscraping/scraping_basics_python/12_framework.md b/sources/academy/webscraping/scraping_basics_python/12_framework.md
@@ -46,7 +46,6 @@ Successfully installed Jinja2-0.0.0 ... ... ... crawlee-0.0.0 ... ... ...
 
 Now let's use the framework to create a new version of our scraper. In the same project directory where our `main.py` file lives, create a file `newmain.py`. This way, we can keep peeking at the original implementation while working on the new one. The initial content will look like this:
 
-<!-- vale off -->
 ```py title="newmain.py"
 import asyncio
 from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
@@ -63,7 +62,6 @@ async def main():
 if __name__ == '__main__':
     asyncio.run(main())
 ```
-<!-- vale on -->
 
 In the code, we do the following:
 
@@ -427,15 +425,15 @@ If you export the dataset as JSON, it should look something like this:
   {
     "url": "https://www.f1academy.com/Racing-Series/Drivers/29/Emely-De-Heus",
     "name": "Emely De Heus",
-    "team": "MP Motorsport"
+    "team": "MP Motorsport",
     "nationality": "Dutch",
     "dob": "2003-02-10",
     "instagram_url": "https://www.instagram.com/emely.de.heus/",
   },
   {
     "url": "https://www.f1academy.com/Racing-Series/Drivers/28/Hamda-Al-Qubaisi",
     "name": "Hamda Al Qubaisi",
-    "team": "MP Motorsport"
+    "team": "MP Motorsport",
     "nationality": "Emirati",
     "dob": "2002-08-08",
     "instagram_url": "https://www.instagram.com/hamdaalqubaisi_official/",
@@ -501,7 +499,7 @@ Hints:
 
 The [Global Top 10](https://www.netflix.com/tudum/top10) page has a table listing the most popular Netflix films worldwide. Scrape the movie names from this page, then search for each movie on [IMDb](https://www.imdb.com/). Assume the first search result is correct and retrieve the film's rating. Each item you push to Crawlee's default dataset should include the following data:
 
-- URL of the film's imdb.com page
+- URL of the film's IMDb page
 - Title
 - Rating
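
Note on the `.vale.ini` change: the new inline-code `TokenIgnores` pattern and the `BlockIgnores` pattern can be tried on sample Markdown with a short script. This is only a rough sketch. It uses Python's `re` module as a stand-in, so it may not behave exactly like Vale's own regex engine or scoping rules. The `strip_ignored` helper and the sample text are hypothetical and exist only for illustration.

```python
import re

# Patterns copied verbatim from the .vale.ini diff above.
# Assumption: Python's `re` treats them closely enough to Vale's engine for a quick check.
INLINE_CODE = re.compile(r"`[^`]+`")
BLOCK_IGNORES = re.compile(r"(?s) (<!--.*?-->)|(```.*?```)")


def strip_ignored(text: str) -> str:
    """Hypothetical helper: drop block-level matches first, then inline code."""
    # Blocks go first; otherwise the inline pattern could match across a fenced block.
    without_blocks = BLOCK_IGNORES.sub("", text)
    return INLINE_CODE.sub("", without_blocks)


sample = "Create `newmain.py` first.\n```py\nimport asyncio\n```\nDone."
print(repr(strip_ignored(sample)))
```

If the remaining text no longer contains the file name or the fenced code, the patterns cover what the removed `<!-- vale off -->` and `<!-- vale on -->` markers used to protect in this sample.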