Commit 4ec6687

style: make Vale happy about Gzip
1 parent 10da313 commit 4ec6687

1 file changed: +1 -1 lines changed

sources/academy/webscraping/advanced_web_scraping/crawling/sitemaps-vs-search.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Sitemap is usually a simple XML file that contains a list of all pages on the we
 - _Does not directly reflect the website_ - There is no way you can ensure that all pages on the website are in the sitemap. The sitemap also can contain pages that were already removed and will return 404s. This is a major downside of sitemaps which prevents us from using them as the only source of URLs.
 - _Updated in intervals_ - Sitemaps are usually not updated in real-time. This means that you might miss some pages if you scrape them too soon after they were added to the website. Common update intervals are 1 day or 1 week.
 - _Hard to find or unavailable_ - Sitemaps are not always trivial to locate. They can be deployed on a CDN with unpredictable URLs. Sometimes they are not available at all.
-- _Streamed, compressed, and archived_ - Sitemaps are often streamed and archived with .tgz extensions and compressed with gzip. This means that you cannot use default HTTP client settings and must handle these cases with extra code or use a scraping framework.
+- _Streamed, compressed, and archived_ - Sitemaps are often streamed and archived with .tgz extensions and compressed with Gzip. This means that you cannot use default HTTP client settings and must handle these cases with extra code or use a scraping framework.

 ## Pros and cons of categories, search, and filters
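
For context on the changed bullet, the "extra code" it mentions for compressed and archived sitemaps could look roughly like the sketch below. It is not part of this commit or of the course text. The helper names `extract_urls` and `parse_sitemap_payload` are hypothetical, and the sketch assumes the sitemap body has already been downloaded as raw bytes and is small enough to hold in memory (so it does not cover the streaming case).

```python
import gzip
import io
import tarfile
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(xml_bytes: bytes) -> list[str]:
    """Return the text of every <loc> element in a sitemap XML document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def parse_sitemap_payload(payload: bytes) -> list[str]:
    """Handle plain XML, gzip-compressed XML, and .tgz archives of XML files."""
    # Step 1: undo gzip compression if the magic bytes are present.
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)

    # Step 2: if what remains is a tar archive (the .tgz case), read each file in it.
    try:
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:") as archive:
            urls: list[str] = []
            for member in archive.getmembers():
                if member.isfile():
                    urls.extend(extract_urls(archive.extractfile(member).read()))
            return urls
    except tarfile.ReadError:
        # Not a tar archive, so treat the payload as a single XML document.
        return extract_urls(payload)
```

Checking the magic bytes rather than the file extension is a deliberate choice here, since a URL ending in `.xml` can still deliver a gzip body. A scraping framework would typically cover these cases, which is the alternative the bullet suggests.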
