I'm working on a web scraping project that requires both browser automation and direct HTTP requests. Could the maintainers suggest best practices for mixing Playwright-based page handling with regular HTTP requests in Crawlee?
How to Switch from Playwright Login to HTTP Crawler After Cookie Acquisition?
Hello @kwdiwt and thank you for your interest in Crawlee! If you don't need to use a BrowserPool, I suggest that you just perform the login using playwright without any Crawlee wrappers, retrieve the cookies and use them to construct your HttpCrawler (or any of its subclasses - CheerioCrawler etc.):

```javascript
const crawler = new HttpCrawler({
    // ...
    sessionPoolOptions: {
        sessionOptions: {
            cookieJar: {
                yourCookie: 'value',
            }, // this can be a toughcookie.CookieJar instance as well
        },
    },
});
await crawler.run();
```

If you need to perform the login for each new session (perhaps to avoid getting blocked), you can use the createSessionFunction option (https://crawlee.dev/api/core/inter…).
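To bridge the two steps, you need to turn the cookie array that Playwright's `context.cookies()` returns (objects shaped like `{ name, value, domain, path, ... }`) into the plain name-to-value object shown above. A minimal sketch of that conversion, with illustrative sample data standing in for the real browser cookies:

```javascript
// Convert the cookie array returned by Playwright's `context.cookies()`
// into a plain { name: value } object, suitable for the `cookieJar`
// shorthand shown in the HttpCrawler snippet above.
function cookiesToJar(cookies) {
    const jar = {};
    for (const { name, value } of cookies) {
        jar[name] = value;
    }
    return jar;
}

// Example input: cookies as Playwright would return them after a login
// flow (the names and values here are made up for illustration).
const loginCookies = [
    { name: 'sessionid', value: 'abc123', domain: 'example.com', path: '/' },
    { name: 'csrftoken', value: 'xyz789', domain: 'example.com', path: '/' },
];

const cookieJar = cookiesToJar(loginCookies);
console.log(cookieJar); // { sessionid: 'abc123', csrftoken: 'xyz789' }
```

Note that this flattening drops domain and path scoping; if you scrape across multiple domains, pass a real toughcookie `CookieJar` instance instead, as the comment in the snippet above mentions.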
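For the per-session login route, here is a hedged configuration sketch. It assumes `createSessionFunction` receives the session pool and returns a `Session`, and that `session.setCookies(cookies, url)` accepts browser-style cookies; verify both against the linked API page before relying on this, and treat the URLs as placeholders:

```javascript
import { chromium } from 'playwright';
import { HttpCrawler, Session } from 'crawlee';

const crawler = new HttpCrawler({
    // ...
    sessionPoolOptions: {
        createSessionFunction: async (sessionPool) => {
            // Perform the login in a throwaway Playwright browser.
            const browser = await chromium.launch();
            const context = await browser.newContext();
            const page = await context.newPage();
            await page.goto('https://example.com/login'); // placeholder URL
            // ... fill in and submit the login form here ...
            const cookies = await context.cookies();
            await browser.close();

            // Hand the fresh cookies to a new Crawlee session, so each
            // session starts out logged in.
            const session = new Session({ sessionPool });
            session.setCookies(cookies, 'https://example.com'); // placeholder URL
            return session;
        },
    },
});
```

This keeps the browser cost confined to session creation while the actual crawling stays on cheap HTTP requests.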