
Dataset should support binary Buffer payload #3203

@monolithed

Description


Which package is the feature request for?

@crawlee/core

Feature

It would be great to have a plugin or API for accessing binary data (such as a Buffer) directly, or at least an option to store it in a Dataset without serialization.

I see that the current interface lets me restore the data with Buffer.from, but I’m not sure how efficient this approach is or how large a payload it can handle.
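To illustrate the Buffer.from round trip mentioned above, here is a minimal TypeScript (Node.js) sketch. It assumes Dataset items end up serialized with JSON.stringify; the helper names `roundTripRaw` and `roundTripBase64` and the `screenshot` field are hypothetical and not part of Crawlee. It compares pushing a Buffer as-is against encoding it to base64 first.

```typescript
import { Buffer } from "node:buffer";

interface RoundTrip {
  restored: Buffer; // bytes recovered after JSON.parse
  jsonBytes: number; // size of the serialized item in bytes
}

// Option 1 (hypothetical helper): push the Buffer as-is. JSON.stringify calls
// Buffer#toJSON(), which yields { type: "Buffer", data: [137, 80, 78, ...] }:
// one decimal number plus a comma per byte, roughly 2-4x the raw size.
function roundTripRaw(payload: Buffer): RoundTrip {
  const json = JSON.stringify({ screenshot: payload });
  const item = JSON.parse(json) as { screenshot: { type: string; data: number[] } };
  return { restored: Buffer.from(item.screenshot.data), jsonBytes: Buffer.byteLength(json) };
}

// Option 2 (hypothetical helper): encode to base64 before pushing. The string
// is about 4/3 of the raw size and is decoded with Buffer.from(str, "base64").
function roundTripBase64(payload: Buffer): RoundTrip {
  const json = JSON.stringify({ screenshot: payload.toString("base64") });
  const item = JSON.parse(json) as { screenshot: string };
  return { restored: Buffer.from(item.screenshot, "base64"), jsonBytes: Buffer.byteLength(json) };
}
```

Both options restore the bytes exactly. Base64 keeps the overhead near 33%, but either way the whole encoded payload is held in memory as a single string during serialization.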

Motivation

I’d like to share my use case.

I’ve extracted the Crawlee logic into a separate package whose output is a screenshot.
Right now, I have to upload that screenshot to S3 directly inside the requestHandler, which feels like a terrible anti-pattern.

Ideal solution or implementation, and any additional constraints

Ideally, the run method would return a promise that resolves with the results from the requestHandler.
I understand this isn’t a simple change and could lead to race conditions.
The most straightforward solution would be to either allow storing binary data in a Dataset or provide an interface for working with S3.
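To make the run-returns-results idea more concrete, here is a hypothetical TypeScript sketch. `runAndCollect` and `Handler` are invented for illustration and are not part of Crawlee's API. The sketch keys results by URL, so the order in which concurrent handlers finish, one possible source of the race conditions mentioned above, does not affect the outcome.

```typescript
// Hypothetical handler shape: receives a URL and resolves with its result
// (for example, screenshot bytes). Not Crawlee's requestHandler signature.
type Handler<T> = (url: string) => Promise<T>;

// Hypothetical runner: processes URLs with limited concurrency and resolves
// with every handler result, keyed by URL.
async function runAndCollect<T>(
  urls: string[],
  handler: Handler<T>,
  concurrency = 2,
): Promise<Map<string, T>> {
  const results = new Map<string, T>();
  let next = 0;

  // Each worker pulls the next URL index. JavaScript runs these callbacks on
  // one thread, so `next++` and results.set() never interleave mid-statement.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const url = urls[next++];
      results.set(url, await handler(url));
    }
  }

  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With this shape, the caller decides what to do with each Buffer (upload to S3 or elsewhere) after the promise resolves, which keeps storage concerns out of the handler.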

Alternative solutions or implementations

No response

Other context

No response

Metadata

Assignees: none

Labels: feature (issues that represent new features or improvements to existing features), t-tooling (issues owned by the tooling team)

Type: none

Projects: none

Milestone: none

Relationships: none yet

Development: no branches or pull requests
