Merged
Commits
58 commits
5e5f937
feat(website): header and footer design uplift (#2862)
webrdaniel Mar 5, 2025
67a75d2
Navbar dropdown small fixes
webrdaniel Mar 5, 2025
2c8d966
feat(website): search modal design uplift (#2867)
webrdaniel Mar 5, 2025
89e31d2
feat(website): redesign homepages (#2871)
baldasseva Mar 6, 2025
5e4eedc
add decorations to cta section
baldasseva Mar 6, 2025
97357bc
add decorative circle to cta section
baldasseva Mar 6, 2025
c676a32
LanguageInfoWidget improvements
webrdaniel Mar 7, 2025
c70965c
languageGetStartedContainer fix alignment
webrdaniel Mar 7, 2025
c79c6ab
codeblock fixes
webrdaniel Mar 7, 2025
70db3b0
cli example improvements
webrdaniel Mar 7, 2025
a99f877
fix ctaImage
webrdaniel Mar 7, 2025
7c32c7a
cta logo alignment
webrdaniel Mar 7, 2025
11331ec
small fixes
webrdaniel Mar 7, 2025
3893ff3
Merge branch 'master' of github.com:apify/crawlee into feat/design-up…
webrdaniel Mar 7, 2025
f1871cf
yarn lock fix
webrdaniel Mar 7, 2025
bcac624
yarn.lock fix
webrdaniel Mar 7, 2025
d8c8c8a
code blocks smaller padding
webrdaniel Mar 7, 2025
0c28c83
codeblock fix
webrdaniel Mar 7, 2025
0787a67
buttons fix
webrdaniel Mar 7, 2025
fbb52e1
fix blinking logo
webrdaniel Mar 7, 2025
f2078b3
navigation hiding versions fix
webrdaniel Mar 7, 2025
083e8aa
python codeblock fixes
webrdaniel Mar 7, 2025
722b423
navbar button hover
webrdaniel Mar 7, 2025
d90ed0e
logo loading fix
webrdaniel Mar 7, 2025
05b08c1
navbar logo not clickable
webrdaniel Mar 7, 2025
6ae3c43
code examples update
webrdaniel Mar 10, 2025
162ceae
remove comment
webrdaniel Mar 10, 2025
7a9e56c
updated code example
webrdaniel Mar 11, 2025
c76831b
CR fixes
webrdaniel Mar 12, 2025
b826212
update title
webrdaniel Mar 12, 2025
ad5046e
navigation hover
webrdaniel Mar 12, 2025
eac42f0
fix paddings
baldasseva Mar 13, 2025
4cd595f
fix search bar in header in tablet
baldasseva Mar 13, 2025
73c1bec
animated logo
webrdaniel Mar 13, 2025
7da5e44
Merge branch 'feat/design-uplift-2025' of github.com:apify/crawlee in…
webrdaniel Mar 13, 2025
2d54cac
align code centrally
baldasseva Mar 13, 2025
82f051c
Merge branch 'feat/design-uplift-2025' of https://github.com/apify/cr…
baldasseva Mar 13, 2025
9de6e99
optimized logo
webrdaniel Mar 14, 2025
fb33013
switch cli example in homepage
baldasseva Mar 17, 2025
ec23097
river codeblock fix
webrdaniel Mar 17, 2025
b1a1648
fix
webrdaniel Mar 17, 2025
7795a49
firefox logo fix
webrdaniel Mar 18, 2025
78051df
update theme switcher
webrdaniel Mar 18, 2025
57e6edc
added links to homepage
webrdaniel Mar 18, 2025
9c15daf
deploy to cloud button fix
webrdaniel Mar 18, 2025
c0536f8
link fix
webrdaniel Mar 18, 2025
c1d45d2
cards hover on light mode
webrdaniel Mar 18, 2025
293acb2
dynamic year in footer
baldasseva Mar 18, 2025
a1e0db2
remove unused import
baldasseva Mar 18, 2025
bb381fa
format Navbar/Content
baldasseva Mar 18, 2025
8109fa2
Apify platform casing
webrdaniel Mar 18, 2025
3e1784f
Merge branch 'feat/design-uplift-2025' of github.com:apify/crawlee in…
webrdaniel Mar 18, 2025
6f7ed9c
content fixes
webrdaniel Mar 18, 2025
82ef086
refactor: move `/docs` and `/api` routes behind `/js` prefix
B4nan Mar 19, 2025
d79eae1
refactor: use links with `/js` prefix internally
B4nan Mar 20, 2025
b0a7d03
chore: fix broken links in readmes and changelogs
barjin Mar 20, 2025
5cf4f40
chore: fix broken link in cards section
barjin Mar 20, 2025
2e9d5cf
small fixes
webrdaniel Mar 20, 2025
8 changes: 4 additions & 4 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1256,7 +1256,7 @@ const crawler = new BasicCrawler({

#### How to use `sendRequest()`?

-See [the Got Scraping guide](https://crawlee.dev/docs/guides/got-scraping).
+See [the Got Scraping guide](https://crawlee.dev/js/docs/guides/got-scraping).

#### Removed options

@@ -1381,7 +1381,7 @@ Previously, you were able to have a browser pool that would mix Puppeteer and Pl

One small feature worth mentioning is the ability to handle requests with browser crawlers outside the browser. To do that, we can use a combination of `Request.skipNavigation` and `context.sendRequest()`.

-Take a look at how to achieve this by checking out the [Skipping navigation for certain requests](https://crawlee.dev/docs/examples/skip-navigation) example!
+Take a look at how to achieve this by checking out the [Skipping navigation for certain requests](https://crawlee.dev/js/docs/examples/skip-navigation) example!

### Logging

@@ -1441,14 +1441,14 @@ await Actor.main(async () => {

#### Events

-Apify SDK (v2) exports `Apify.events`, which is an `EventEmitter` instance. With Crawlee, the events are managed by [`EventManager`](https://crawlee.dev/api/core/class/EventManager) class instead. We can either access it via `Actor.eventManager` getter, or use `Actor.on` and `Actor.off` shortcuts instead.
+Apify SDK (v2) exports `Apify.events`, which is an `EventEmitter` instance. With Crawlee, the events are managed by [`EventManager`](https://crawlee.dev/js/api/core/class/EventManager) class instead. We can either access it via `Actor.eventManager` getter, or use `Actor.on` and `Actor.off` shortcuts instead.

```diff
-Apify.events.on(...);
+Actor.on(...);
```

-> We can also get the [`EventManager`](https://crawlee.dev/api/core/class/EventManager) instance via `Configuration.getEventManager()`.
+> We can also get the [`EventManager`](https://crawlee.dev/js/api/core/class/EventManager) instance via `Configuration.getEventManager()`.

In addition to the existing events, we now have an `exit` event fired when calling `Actor.exit()` (which is called at the end of `Actor.main()`). This event allows you to gracefully shut down any resources when `Actor.exit` is called.

2 changes: 1 addition & 1 deletion MIGRATIONS.md
@@ -1,5 +1,5 @@
# Migration from 2.x.x to 3.0.0
-Check the v3 [upgrading guide](https://crawlee.dev/docs/upgrading/upgrading-to-v3).
+Check the v3 [upgrading guide](https://crawlee.dev/js/docs/upgrading/upgrading-to-v3).

# Migration from 1.x.x to 2.0.0
There should be no changes needed apart from upgrading your Node.js version to >= 15.10. If you encounter issues with `cheerio`, [read their CHANGELOG](https://github.com/cheeriojs/cheerio/releases). We bumped it from `rc.3` to `rc.10`.
4 changes: 2 additions & 2 deletions README.md
@@ -28,7 +28,7 @@ Crawlee is available as the [`crawlee`](https://www.npmjs.com/package/crawlee) N

## Installation

-We recommend visiting the [Introduction tutorial](https://crawlee.dev/docs/introduction) in Crawlee documentation for more information.
+We recommend visiting the [Introduction tutorial](https://crawlee.dev/js/docs/introduction) in Crawlee documentation for more information.

> Crawlee requires **Node.js 16 or higher**.

@@ -78,7 +78,7 @@ const crawler = new PlaywrightCrawler({
await crawler.run(['https://crawlee.dev']);
```

-By default, Crawlee stores data to `./storage` in the current working directory. You can override this directory via Crawlee configuration. For details, see [Configuration guide](https://crawlee.dev/docs/guides/configuration), [Request storage](https://crawlee.dev/docs/guides/request-storage) and [Result storage](https://crawlee.dev/docs/guides/result-storage).
+By default, Crawlee stores data to `./storage` in the current working directory. You can override this directory via Crawlee configuration. For details, see [Configuration guide](https://crawlee.dev/js/docs/guides/configuration), [Request storage](https://crawlee.dev/js/docs/guides/request-storage) and [Result storage](https://crawlee.dev/js/docs/guides/result-storage).

### Installing pre-release versions

2 changes: 1 addition & 1 deletion docs/examples/cheerio_crawler.ts
@@ -25,7 +25,7 @@ const crawler = new CheerioCrawler({

// This function will be called for each URL to crawl.
// It accepts a single parameter, which is an object with options as:
-// https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#requestHandler
+// https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions#requestHandler
// We use for demonstration only 2 of them:
// - request: an instance of the Request class with information such as the URL that is being crawled and HTTP method
// - $: the cheerio object containing parsed HTML
2 changes: 1 addition & 1 deletion docs/examples/http_crawler.ts
@@ -25,7 +25,7 @@ const crawler = new HttpCrawler({

// This function will be called for each URL to crawl.
// It accepts a single parameter, which is an object with options as:
-// https://crawlee.dev/api/http-crawler/interface/HttpCrawlerOptions#requestHandler
+// https://crawlee.dev/js/api/http-crawler/interface/HttpCrawlerOptions#requestHandler
// We use for demonstration only 2 of them:
// - request: an instance of the Request class with information such as the URL that is being crawled and HTTP method
// - body: the HTML code of the current page
2 changes: 1 addition & 1 deletion docs/examples/jsdom_crawler.ts
@@ -25,7 +25,7 @@ const crawler = new JSDOMCrawler({

// This function will be called for each URL to crawl.
// It accepts a single parameter, which is an object with options as:
-// https://crawlee.dev/api/jsdom-crawler/interface/JSDOMCrawlerOptions#requestHandler
+// https://crawlee.dev/js/api/jsdom-crawler/interface/JSDOMCrawlerOptions#requestHandler
// We use for demonstration only 2 of them:
// - request: an instance of the Request class with information such as the URL that is being crawled and HTTP method
// - window: the JSDOM window object
4 changes: 2 additions & 2 deletions docs/experiments/request_locking.mdx
@@ -82,8 +82,8 @@ import { RequestQueueV2 } from 'crawlee';
const queue = await RequestQueueV2.open('my-locking-queue');
await queue.addRequests([
{ url: 'https://crawlee.dev' },
-{ url: 'https://crawlee.dev/docs' },
-{ url: 'https://crawlee.dev/api' },
+{ url: 'https://crawlee.dev/js/docs' },
+{ url: 'https://crawlee.dev/js/api' },
]);
```

2 changes: 1 addition & 1 deletion docs/guides/docker_browser_js.txt
@@ -1,5 +1,5 @@
# Specify the base Docker image. You can read more about
-# the available images at https://crawlee.dev/docs/guides/docker-images
+# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node-playwright-chrome:20

2 changes: 1 addition & 1 deletion docs/guides/docker_browser_ts.txt
@@ -1,5 +1,5 @@
# Specify the base Docker image. You can read more about
-# the available images at https://crawlee.dev/docs/guides/docker-images
+# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node-playwright-chrome:20 AS builder

2 changes: 1 addition & 1 deletion docs/guides/docker_node_js.txt
@@ -1,5 +1,5 @@
# Specify the base Docker image. You can read more about
-# the available images at https://crawlee.dev/docs/guides/docker-images
+# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:20

2 changes: 1 addition & 1 deletion docs/guides/docker_node_ts.txt
@@ -1,5 +1,5 @@
# Specify the base Docker image. You can read more about
-# the available images at https://crawlee.dev/docs/guides/docker-images
+# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:20 AS builder

2 changes: 1 addition & 1 deletion docs/guides/request_storage.mdx
@@ -65,7 +65,7 @@ The following code demonstrates the usage of the request queue:
</TabItem>
</Tabs>

-To see more detailed example of how to use the request queue with a crawler, see the [Puppeteer Crawler](/docs/examples/puppeteer-crawler) example.
+To see more detailed example of how to use the request queue with a crawler, see the [Puppeteer Crawler](/js/docs/examples/puppeteer-crawler) example.

## Request list

Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ import ApiLink from '@site/src/components/ApiLink';

import WebServerSource from '!!raw-loader!./web-server.mjs';

-Most of the time, Crawlee jobs are run as batch jobs. You have a list of URLs you want to scrape every week or you might want to scrape a whole website once per day. After the scrape, you send the data to your warehouse for analytics. Batch jobs are efficient because they can use [Crawlee's built-in autoscaling](https://crawlee.dev/docs/guides/scaling-crawlers) to fully utilize the resources you have available. But sometimes you have a use-case where you need to return scrape data as soon as possible. There might be a user waiting on the other end so every millisecond counts. This is where running Crawlee in a web server comes in.
+Most of the time, Crawlee jobs are run as batch jobs. You have a list of URLs you want to scrape every week or you might want to scrape a whole website once per day. After the scrape, you send the data to your warehouse for analytics. Batch jobs are efficient because they can use [Crawlee's built-in autoscaling](https://crawlee.dev/js/docs/guides/scaling-crawlers) to fully utilize the resources you have available. But sometimes you have a use-case where you need to return scrape data as soon as possible. There might be a user waiting on the other end so every millisecond counts. This is where running Crawlee in a web server comes in.

We will build a simple HTTP server that receives a page URL and returns the page title in the response. We will base this guide on the approach used in [Apify's Super Scraper API repository](https://github.com/apify/super-scraper) which maps incoming HTTP requests to Crawlee <ApiLink to="core/class/Request">Request</ApiLink>.
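Before wiring in Crawlee, the server half of that design can be sketched with Node's built-in `http` module; the `getPageTitle` stub is an assumption that stands in for the crawler round-trip the guide describes:

```typescript
import { createServer } from 'node:http';

// Stub for the Crawlee round-trip: the full guide maps the incoming HTTP
// request to a Crawlee `Request`, runs the crawler, and awaits its result.
async function getPageTitle(url: string): Promise<string> {
    return `Title of ${url}`;
}

const server = createServer(async (req, res) => {
    // Expect requests like GET /?url=https://crawlee.dev
    const target = new URL(req.url ?? '/', 'http://localhost').searchParams.get('url');
    if (!target) {
        res.writeHead(400).end('Missing ?url= query parameter');
        return;
    }
    res.writeHead(200, { 'content-type': 'text/plain' });
    res.end(await getPageTitle(target));
});

server.listen(3000, () => console.log('Waiting for scrape requests on :3000'));
```

A client then gets its answer synchronously: `curl 'http://localhost:3000/?url=https://crawlee.dev'` returns the title in the response body.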

8 changes: 4 additions & 4 deletions docs/introduction/01-setting-up.mdx
@@ -47,10 +47,10 @@ You will see log messages in the terminal as Crawlee boots up and starts scrapin
```log
INFO PlaywrightCrawler: Starting the crawl
INFO PlaywrightCrawler: Title of https://crawlee.dev/ is 'Crawlee · Build reliable crawlers. Fast. | Crawlee'
-INFO PlaywrightCrawler: Title of https://crawlee.dev/docs/examples is 'Examples | Crawlee'
-INFO PlaywrightCrawler: Title of https://crawlee.dev/api/core is '@crawlee/core | API | Crawlee'
-INFO PlaywrightCrawler: Title of https://crawlee.dev/api/core/changelog is 'Changelog | API | Crawlee'
-INFO PlaywrightCrawler: Title of https://crawlee.dev/docs/quick-start is 'Quick Start | Crawlee'
+INFO PlaywrightCrawler: Title of https://crawlee.dev/js/docs/examples is 'Examples | Crawlee'
+INFO PlaywrightCrawler: Title of https://crawlee.dev/js/api/core is '@crawlee/core | API | Crawlee'
+INFO PlaywrightCrawler: Title of https://crawlee.dev/js/api/core/changelog is 'Changelog | API | Crawlee'
+INFO PlaywrightCrawler: Title of https://crawlee.dev/js/docs/quick-start is 'Quick Start | Crawlee'
```

You can always terminate the crawl with a keypress in the terminal:
2 changes: 1 addition & 1 deletion docs/introduction/03-adding-urls.mdx
@@ -69,7 +69,7 @@ This means that no new requests will be started after the 20th request is finish
There are numerous approaches to finding links to follow when crawling the web. For our purposes, we will be looking for `<a>` elements that contain the `href` attribute because that's what you need in most cases. For example:

```html
-<a href="https://crawlee.dev/docs/introduction">This is a link to Crawlee introduction</a>
+<a href="https://crawlee.dev/js/docs/introduction">This is a link to Crawlee introduction</a>
```

Since this is the most common case, it is also the `enqueueLinks` default.
6 changes: 3 additions & 3 deletions docs/quick-start/quick_start_cheerio.txt
@@ -1,5 +1,5 @@
INFO CheerioCrawler: Starting the crawl
INFO CheerioCrawler: Title of https://crawlee.dev/ is 'Crawlee · Build reliable crawlers. Fast. | Crawlee'
-INFO CheerioCrawler: Title of https://crawlee.dev/docs/examples is 'Examples | Crawlee'
-INFO CheerioCrawler: Title of https://crawlee.dev/docs/quick-start is 'Quick Start | Crawlee'
-INFO CheerioCrawler: Title of https://crawlee.dev/docs/guides is 'Guides | Crawlee'
+INFO CheerioCrawler: Title of https://crawlee.dev/js/docs/examples is 'Examples | Crawlee'
+INFO CheerioCrawler: Title of https://crawlee.dev/js/docs/quick-start is 'Quick Start | Crawlee'
+INFO CheerioCrawler: Title of https://crawlee.dev/js/docs/guides is 'Guides | Crawlee'
24 changes: 12 additions & 12 deletions docs/upgrading/upgrading_v3.md
@@ -20,18 +20,18 @@ Up until version 3 of `apify`, the package contained both scraping related tools

The [`crawlee`](https://www.npmjs.com/package/crawlee) package consists of several smaller packages, released separately under `@crawlee` namespace:

-- [`@crawlee/core`](https://crawlee.dev/api/core): the base for all the crawler implementations, also contains things like `Request`, `RequestQueue`, `RequestList` or `Dataset` classes
-- [`@crawlee/cheerio`](https://crawlee.dev/api/cheerio-crawler): exports `CheerioCrawler`
-- [`@crawlee/playwright`](https://crawlee.dev/api/playwright-crawler): exports `PlaywrightCrawler`
-- [`@crawlee/puppeteer`](https://crawlee.dev/api/puppeteer-crawler): exports `PuppeteerCrawler`
-- [`@crawlee/jsdom`](https://crawlee.dev/api/jsdom-crawler): exports `JSDOMCrawler`
-- [`@crawlee/basic`](https://crawlee.dev/api/basic-crawler): exports `BasicCrawler`
-- [`@crawlee/http`](https://crawlee.dev/api/http-crawler): exports `HttpCrawler` (which is used for creating [`@crawlee/jsdom`](https://crawlee.dev/api/jsdom-crawler) and [`@crawlee/cheerio`](https://crawlee.dev/api/cheerio-crawler))
-- [`@crawlee/browser`](https://crawlee.dev/api/browser-crawler): exports `BrowserCrawler` (which is used for creating [`@crawlee/playwright`](https://crawlee.dev/api/playwright-crawler) and [`@crawlee/puppeteer`](https://crawlee.dev/api/puppeteer-crawler))
-- [`@crawlee/memory-storage`](https://crawlee.dev/api/memory-storage): [`@apify/storage-local`](https://npmjs.com/package/@apify/storage-local) alternative
-- [`@crawlee/browser-pool`](https://crawlee.dev/api/browser-pool): previously [`browser-pool`](https://npmjs.com/package/browser-pool) package
-- [`@crawlee/utils`](https://crawlee.dev/api/utils): utility methods
-- [`@crawlee/types`](https://crawlee.dev/api/types): holds TS interfaces mainly about the [`StorageClient`](https://crawlee.dev/api/core/interface/StorageClient)
+- [`@crawlee/core`](https://crawlee.dev/js/api/core): the base for all the crawler implementations, also contains things like `Request`, `RequestQueue`, `RequestList` or `Dataset` classes
+- [`@crawlee/cheerio`](https://crawlee.dev/js/api/cheerio-crawler): exports `CheerioCrawler`
+- [`@crawlee/playwright`](https://crawlee.dev/js/api/playwright-crawler): exports `PlaywrightCrawler`
+- [`@crawlee/puppeteer`](https://crawlee.dev/js/api/puppeteer-crawler): exports `PuppeteerCrawler`
+- [`@crawlee/jsdom`](https://crawlee.dev/js/api/jsdom-crawler): exports `JSDOMCrawler`
+- [`@crawlee/basic`](https://crawlee.dev/js/api/basic-crawler): exports `BasicCrawler`
+- [`@crawlee/http`](https://crawlee.dev/js/api/http-crawler): exports `HttpCrawler` (which is used for creating [`@crawlee/jsdom`](https://crawlee.dev/js/api/jsdom-crawler) and [`@crawlee/cheerio`](https://crawlee.dev/js/api/cheerio-crawler))
+- [`@crawlee/browser`](https://crawlee.dev/js/api/browser-crawler): exports `BrowserCrawler` (which is used for creating [`@crawlee/playwright`](https://crawlee.dev/js/api/playwright-crawler) and [`@crawlee/puppeteer`](https://crawlee.dev/js/api/puppeteer-crawler))
+- [`@crawlee/memory-storage`](https://crawlee.dev/js/api/memory-storage): [`@apify/storage-local`](https://npmjs.com/package/@apify/storage-local) alternative
+- [`@crawlee/browser-pool`](https://crawlee.dev/js/api/browser-pool): previously [`browser-pool`](https://npmjs.com/package/browser-pool) package
+- [`@crawlee/utils`](https://crawlee.dev/js/api/utils): utility methods
+- [`@crawlee/types`](https://crawlee.dev/js/api/types): holds TS interfaces mainly about the [`StorageClient`](https://crawlee.dev/js/api/core/interface/StorageClient)

### Installing Crawlee
