
Conversation

@vdusek
Collaborator

@vdusek vdusek commented Dec 2, 2024

No description provided.

@vdusek vdusek added the documentation, t-tooling, and adhoc labels Dec 2, 2024
@vdusek vdusek added this to the 104th sprint - Tooling team milestone Dec 2, 2024
@vdusek vdusek requested a review from janbuchar December 2, 2024 18:31
@vdusek vdusek self-assigned this Dec 2, 2024
Contributor

@honzajavorek honzajavorek left a comment

I added two comments which I think improve the spelling or wording.

The rest is my subjective commentary on the Crawlee vs. Scrapy comparison, which you can take as feedback or ignore entirely. I wanted to offer an outsider's perspective, but at the same time I think the framework's creators should have the freedom to express their opinionated view and their ambition for how the project should differentiate itself.

Just chiming in, neither approving nor disapproving.

@vdusek vdusek mentioned this pull request Dec 3, 2024
@vdusek
Collaborator Author

vdusek commented Dec 3, 2024

@honzajavorek, thanks for your feedback.

@honzajavorek
Contributor

honzajavorek commented Dec 3, 2024

One more thing I didn't notice previously, sorry! Most of the points start with "Crawlee something..." or "Unlike Scrapy, Crawlee...". Since the heading already sets the scene, I think we can make this shorter by just listing the benefits:

  • Newer project built with modern Python and complete type hint coverage for a better developer experience.
  • Its crawlers are regular Python scripts. You don't need a separate command to launch them, and you can integrate them directly into other applications.
  • Supports state persistence during interruptions, saving time and costs by avoiding the need to restart scraping pipelines from scratch after an issue.
  • Allows saving multiple types of results in a single scraping run. Offers several storage options (see datasets and key-value stores).

Something along these lines. Feel free to drop my suggestion or just take loose inspiration from it.
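For illustration of the second point in the list above (crawlers as regular Python scripts), here is a minimal sketch of the kind of snippet the README could show. The import path and parameter names are assumptions based on the public Crawlee for Python API and may differ between versions, so treat this as illustrative rather than canonical:

```python
import asyncio

# Assumption: newer Crawlee versions expose crawlers under `crawlee.crawlers`;
# older releases used `from crawlee.beautifulsoup_crawler import ...`.
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # The crawler is just an object in an ordinary Python script -
    # no separate CLI command is needed to launch it.
    crawler = BeautifulSoupCrawler(max_requests_per_crawl=10)

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')
        # Results are pushed to the default dataset; key-value stores are available as well.
        await context.push_data({
            'url': context.request.url,
            'title': context.soup.title.string if context.soup.title else None,
        })
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```

Because it is a plain `asyncio` entry point, the same `main()` coroutine could also be awaited from inside another application rather than run as a standalone script.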

@vdusek vdusek merged commit 27db2e4 into master Dec 3, 2024
22 of 23 checks passed
@vdusek vdusek deleted the update-readme branch December 3, 2024 14:07
Mantisus pushed a commit to Mantisus/crawlee-python that referenced this pull request Dec 10, 2024
