I'm working on a web scraping project that requires both browser automation and direct HTTP requests. Could the maintainers suggest best practices for mixing Playwright-based page handling with regular HTTP requests in Crawlee?
How to Switch from Playwright Login to HTTP Crawler After Cookie Acquisition?
Hello @kwdiwt and thank you for your interest in Crawlee! If you don't need to use a BrowserPool, I suggest that you just perform the login using playwright without any Crawlee wrappers, retrieve the cookies and use them to construct your HttpCrawler (or any of its subclasses - CheerioCrawler etc.):

```javascript
const crawler = new HttpCrawler({
    // ...
    sessionPoolOptions: {
        sessionOptions: {
            cookieJar: {
                yourCookie: 'value',
            }, // this can be a toughcookie.CookieJar instance as well
        },
    },
});
await crawler.run();
```

If you need to perform the login for each new session (perhaps to avoid getting blocked), you can use the createSessionFunction option (https://crawlee.dev/api/core/inter…).
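To bridge the two steps, you need to turn the cookie array that Playwright's `context.cookies()` returns (objects shaped like `{ name, value, domain, path, ... }`) into the plain name-to-value object shown above. A minimal sketch of that conversion, with illustrative sample data standing in for the real browser cookies:

```javascript
// Convert the cookie array returned by Playwright's `context.cookies()`
// into a plain { name: value } object, suitable for the `cookieJar`
// shorthand shown in the HttpCrawler snippet above.
function cookiesToJar(cookies) {
    const jar = {};
    for (const { name, value } of cookies) {
        jar[name] = value;
    }
    return jar;
}

// Example input: cookies as Playwright would return them after a login
// flow (the names and values here are made up for illustration).
const loginCookies = [
    { name: 'sessionid', value: 'abc123', domain: 'example.com', path: '/' },
    { name: 'csrftoken', value: 'xyz789', domain: 'example.com', path: '/' },
];

const cookieJar = cookiesToJar(loginCookies);
console.log(cookieJar); // { sessionid: 'abc123', csrftoken: 'xyz789' }
```

Note that this flattening drops domain and path scoping; if you scrape across multiple domains, pass a real toughcookie `CookieJar` instance instead, as the comment in the snippet above mentions.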
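For the per-session login route, here is a hedged configuration sketch. It assumes `createSessionFunction` receives the session pool and returns a `Session`, and that `session.setCookies(cookies, url)` accepts browser-style cookies; verify both against the linked API page before relying on this, and treat the URLs as placeholders:

```javascript
import { chromium } from 'playwright';
import { HttpCrawler, Session } from 'crawlee';

const crawler = new HttpCrawler({
    // ...
    sessionPoolOptions: {
        createSessionFunction: async (sessionPool) => {
            // Perform the login in a throwaway Playwright browser.
            const browser = await chromium.launch();
            const context = await browser.newContext();
            const page = await context.newPage();
            await page.goto('https://example.com/login'); // placeholder URL
            // ... fill in and submit the login form here ...
            const cookies = await context.cookies();
            await browser.close();

            // Hand the fresh cookies to a new Crawlee session, so each
            // session starts out logged in.
            const session = new Session({ sessionPool });
            session.setCookies(cookies, 'https://example.com'); // placeholder URL
            return session;
        },
    },
});
```

This keeps the browser cost confined to session creation while the actual crawling stays on cheap HTTP requests.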