
Commit 0426110

update nav
1 parent 2fa60fe commit 0426110

24 files changed (+264 / -35 lines)

docs/Crawl4AI/01_asynccrawlerstrategy.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "AsyncCrawlerStrategy"
+parent: "Crawl4AI"
+nav_order: 1
+---
+
 # Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy
 
 Welcome to the Crawl4AI tutorial series! Our goal is to build intelligent agents that can understand and extract information from the web. The very first step in this process is actually *getting* the content from a webpage. This chapter explains how Crawl4AI handles that fundamental task.
```
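Each chapter file in this commit gets the same six-line front-matter header, with only `title` and `nav_order` changing; that is the entire "update nav" change. The `parent`/`nav_order` keys suggest a Just the Docs-style Jekyll theme, under which the chapters would nest beneath a parent page carrying front matter roughly like this sketch (an assumption; no parent page appears in this diff):

```yaml
---
# Hypothetical section landing page, e.g. docs/Crawl4AI/index.md (not part of this commit).
# has_children is the Just the Docs convention for a page that groups child pages.
layout: default
title: "Crawl4AI"
has_children: true
---
```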

docs/Crawl4AI/02_asyncwebcrawler.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "AsyncWebCrawler"
+parent: "Crawl4AI"
+nav_order: 2
+---
+
 # Chapter 2: Meet the General Manager - AsyncWebCrawler
 
 In [Chapter 1: How We Fetch Webpages - AsyncCrawlerStrategy](01_asynccrawlerstrategy.md), we learned about the different ways Crawl4AI can fetch the raw content of a webpage, like choosing between a fast drone (`AsyncHTTPCrawlerStrategy`) or a versatile delivery truck (`AsyncPlaywrightCrawlerStrategy`).
```

docs/Crawl4AI/03_crawlerrunconfig.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "CrawlerRunConfig"
+parent: "Crawl4AI"
+nav_order: 3
+---
+
 # Chapter 3: Giving Instructions - CrawlerRunConfig
 
 In [Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md), we met the `AsyncWebCrawler`, the central coordinator for our web crawling tasks. We saw how to tell it *what* URL to crawl using the `arun` method.
```
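For readers skimming the diff, the `arun` method referenced in this intro is the crawler's single-URL entry point. A minimal usage sketch, assuming Crawl4AI's commonly documented public API (none of this code is part of the commit):

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # AsyncWebCrawler is an async context manager; arun() fetches and
    # processes one URL and returns a CrawlResult (see Chapter 7).
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)

asyncio.run(main())
```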

docs/Crawl4AI/04_contentscrapingstrategy.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "ContentScrapingStrategy"
+parent: "Crawl4AI"
+nav_order: 4
+---
+
 # Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy
 
 In [Chapter 3: Giving Instructions - CrawlerRunConfig](03_crawlerrunconfig.md), we learned how to give specific instructions to our `AsyncWebCrawler` using `CrawlerRunConfig`. This included telling it *how* to fetch the page and potentially take screenshots or PDFs.
```

docs/Crawl4AI/05_relevantcontentfilter.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "RelevantContentFilter"
+parent: "Crawl4AI"
+nav_order: 5
+---
+
 # Chapter 5: Focusing on What Matters - RelevantContentFilter
 
 In [Chapter 4: Cleaning Up the Mess - ContentScrapingStrategy](04_contentscrapingstrategy.md), we learned how Crawl4AI takes the raw, messy HTML from a webpage and cleans it up using a `ContentScrapingStrategy`. This gives us a tidier version of the HTML (`cleaned_html`) and extracts basic elements like links and images.
```

docs/Crawl4AI/06_extractionstrategy.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "ExtractionStrategy"
+parent: "Crawl4AI"
+nav_order: 6
+---
+
 # Chapter 6: Getting Specific Data - ExtractionStrategy
 
 In the previous chapter, [Chapter 5: Focusing on What Matters - RelevantContentFilter](05_relevantcontentfilter.md), we learned how to sift through the cleaned webpage content to keep only the parts relevant to our query or goal, producing a focused `fit_markdown`. This is great for tasks like summarization or getting the main gist of an article.
```

docs/Crawl4AI/07_crawlresult.md

Lines changed: 8 additions & 1 deletion
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "CrawlResult"
+parent: "Crawl4AI"
+nav_order: 7
+---
+
 # Chapter 7: Understanding the Results - CrawlResult
 
 In the previous chapter, [Chapter 6: Getting Specific Data - ExtractionStrategy](06_extractionstrategy.md), we learned how to teach Crawl4AI to act like an analyst, extracting specific, structured data points from a webpage using an `ExtractionStrategy`. We've seen how Crawl4AI can fetch pages, clean them, filter them, and even extract precise information.
@@ -247,7 +254,7 @@ if __name__ == "__main__":
 
 You don't interact with the `CrawlResult` constructor directly. The `AsyncWebCrawler` creates it for you at the very end of the `arun` process, typically inside its internal `aprocess_html` method (or just before returning if fetching from cache).
 
-Heres a simplified sequence:
+Here's a simplified sequence:
 
 1. **Fetch:** `AsyncWebCrawler` calls the [AsyncCrawlerStrategy](01_asynccrawlerstrategy.md) to get the raw `html`, `status_code`, `response_headers`, etc.
 2. **Scrape:** It passes the `html` to the [ContentScrapingStrategy](04_contentscrapingstrategy.md) to get `cleaned_html`, `links`, `media`, `metadata`.
```

docs/Crawl4AI/08_deepcrawlstrategy.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "DeepCrawlStrategy"
+parent: "Crawl4AI"
+nav_order: 8
+---
+
 # Chapter 8: Exploring Websites - DeepCrawlStrategy
 
 In [Chapter 7: Understanding the Results - CrawlResult](07_crawlresult.md), we saw the final report (`CrawlResult`) that Crawl4AI gives us after processing a single URL. This report contains cleaned content, links, metadata, and maybe even extracted data.
```

docs/Crawl4AI/09_cachecontext___cachemode.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "CacheContext & CacheMode"
+parent: "Crawl4AI"
+nav_order: 9
+---
+
 # Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode
 
 In the previous chapter, [Chapter 8: Exploring Websites - DeepCrawlStrategy](08_deepcrawlstrategy.md), we saw how Crawl4AI can explore websites by following links, potentially visiting many pages. During such explorations, or even when you run the same crawl multiple times, the crawler might try to fetch the exact same webpage again and again. This can be slow and might unnecessarily put a load on the website you're crawling. Wouldn't it be smarter to remember the result from the first time and just reuse it?
```
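As a companion to that intro, a minimal sketch of how the caching knob is typically passed in, again assuming the commonly documented `CrawlerRunConfig`/`CacheMode` API rather than anything shown in this diff:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

async def main():
    # BYPASS skips the cache for this run; the default behavior
    # reads from and writes to the local cache instead.
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        print(result.success)

asyncio.run(main())
```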

docs/Crawl4AI/10_basedispatcher.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -1,3 +1,10 @@
+---
+layout: default
+title: "BaseDispatcher"
+parent: "Crawl4AI"
+nav_order: 10
+---
+
 # Chapter 10: Orchestrating the Crawl - BaseDispatcher
 
 In [Chapter 9: Smart Fetching with Caching - CacheContext / CacheMode](09_cachecontext___cachemode.md), we learned how Crawl4AI uses caching to cleverly avoid re-fetching the same webpage multiple times, which is especially helpful when crawling many URLs. We've also seen how methods like `arun_many()` ([Chapter 2: Meet the General Manager - AsyncWebCrawler](02_asyncwebcrawler.md)) or strategies like [DeepCrawlStrategy](08_deepcrawlstrategy.md) can lead to potentially hundreds or thousands of individual URLs needing to be crawled.
```
