Input file handling #960

@Pijukatel

The point of this issue is to define CLI behavior with respect to the input files so that it is compatible with the rest of the Apify tooling.

1. The CLI respects the env vars to get the input key

Ordered from highest priority to lowest:
ACTOR_INPUT_KEY
APIFY_INPUT_KEY
CRAWLEE_INPUT_KEY
default="INPUT"

The CLI decides on the input key based on the values of these env vars, and then starts the Actor with all three env vars set to that value.

Example 1:

If

ACTOR_INPUT_KEY is missing
APIFY_INPUT_KEY="CUSTOM_INPUT"
CRAWLEE_INPUT_KEY is missing

then the CLI will start the Actor with the following env vars set:

ACTOR_INPUT_KEY="CUSTOM_INPUT"
APIFY_INPUT_KEY="CUSTOM_INPUT"
CRAWLEE_INPUT_KEY="CUSTOM_INPUT"

Example 2:
If

ACTOR_INPUT_KEY is missing
APIFY_INPUT_KEY is missing
CRAWLEE_INPUT_KEY is missing

then the CLI will start the Actor with the following env vars set:

ACTOR_INPUT_KEY="INPUT"
APIFY_INPUT_KEY="INPUT"
CRAWLEE_INPUT_KEY="INPUT"

Example 3:
If

ACTOR_INPUT_KEY="A"
APIFY_INPUT_KEY="B"
CRAWLEE_INPUT_KEY="C"

then the CLI will start the Actor with the following env vars set:

ACTOR_INPUT_KEY="A"
APIFY_INPUT_KEY="A"
CRAWLEE_INPUT_KEY="A"

(This avoids edge cases where Crawlee-only code might not even be aware of the SDK env vars that have higher priority.)
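The resolution rule and the three examples above can be sketched roughly as follows. This is only an illustration, not the CLI's actual code: the function names `resolve_input_key` and `build_actor_env` are hypothetical, and treating an empty string as "missing" is an assumption the issue does not state.

```python
# Priority order from the list above, highest first.
INPUT_KEY_ENV_VARS = ("ACTOR_INPUT_KEY", "APIFY_INPUT_KEY", "CRAWLEE_INPUT_KEY")
DEFAULT_INPUT_KEY = "INPUT"


def resolve_input_key(env):
    """Return the value of the highest-priority env var that is set, else the default."""
    for name in INPUT_KEY_ENV_VARS:
        value = env.get(name)
        if value:  # assumption: an empty string counts as missing
            return value
    return DEFAULT_INPUT_KEY


def build_actor_env(env):
    """Copy env and set all three input-key vars to the resolved value."""
    key = resolve_input_key(env)
    actor_env = dict(env)
    for name in INPUT_KEY_ENV_VARS:
        actor_env[name] = key
    return actor_env
```

With the values from Example 3, `build_actor_env({"ACTOR_INPUT_KEY": "A", "APIFY_INPUT_KEY": "B", "CRAWLEE_INPUT_KEY": "C"})` sets all three variables to `"A"`.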

2. When called with the -i argument and creating a temp input file, the CLI should use the name set in the env variables mentioned above:

Example when the env vars are not defined or have the value "INPUT":

  • -i {"a":"b"} should create a temp file in the kvs called INPUT.json
  • -i some_path/custom_input.custom_suffix should create a temp copy of the file in the kvs called INPUT.custom_suffix
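The naming rule in the two bullets could look roughly like this. The function `temp_input_file_name` is hypothetical, and the way it tells inline JSON from a file path (try to parse the argument as JSON first) is an assumption made for the sketch; the issue does not say how the CLI distinguishes the two.

```python
import json
from pathlib import Path


def temp_input_file_name(input_arg, input_key="INPUT"):
    """Return the name of the temp input file for a given -i argument."""
    try:
        # Assumption: anything that parses as JSON is treated as inline input.
        json.loads(input_arg)
    except ValueError:
        # Otherwise treat the argument as a path and keep its suffix.
        return f"{input_key}{Path(input_arg).suffix}"
    return f"{input_key}.json"
```

Here `input_key` stands for the key resolved from the env vars, so a custom key such as "CUSTOM_INPUT" would give CUSTOM_INPUT.json for inline JSON.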

These temp files are deleted after the CLI finishes.
If a file of the same name already exists, the CLI temporarily replaces it and restores it afterwards.
Restoration does not happen if the file was modified by the Actor.
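One possible shape for the replace-and-restore behavior is a context manager wrapped around the Actor run. Everything here is an assumption for illustration: the name `temporary_input_file`, the `.bak` backup name, and detecting modification by comparing bytes. The issue also does not say what happens to the backup when the Actor modified the file; this sketch simply leaves it on disk.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_input_file(target, content):
    """Put content at target for the duration of the block, then clean up."""
    target = Path(target)
    backup = target.with_name(target.name + ".bak")  # hypothetical backup name
    had_original = target.exists()
    if had_original:
        target.replace(backup)  # move the preexisting file out of the way
    target.write_bytes(content)
    try:
        yield target
    finally:
        # Assumption: "modified" means the bytes differ or the file is gone.
        modified = not target.exists() or target.read_bytes() != content
        if not modified:
            target.unlink()  # delete the temp file
            if had_original:
                backup.replace(target)  # restore the preexisting file
        # If modified, keep the Actor's version and leave the backup untouched.
```

The Actor process would be started inside the `with temporary_input_file(...)` block, so cleanup runs even if the run raises.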

3. The CLI does not care about duplicate input files; the SDK should handle them

For example, there is already an INPUT file in the default kvs and the CLI creates INPUT.json there. This is not a CLI issue; the SDK should detect it and raise an error.
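On the SDK side, the duplicate check might look something like the sketch below. It is not the SDK's API: `find_input_file` and `DuplicateInputError` are hypothetical names, and matching on "file name equals the key, or the key plus one extension" is an assumption about what counts as a duplicate.

```python
from pathlib import Path


class DuplicateInputError(Exception):
    """Hypothetical error raised when several files match the input key."""


def find_input_file(kvs_dir, input_key="INPUT"):
    """Return the single input file for the key, or None if there is none."""
    candidates = sorted(
        p for p in Path(kvs_dir).iterdir()
        # Matches INPUT as well as INPUT.json, INPUT.txt, and so on.
        if p.is_file() and (p.name == input_key or p.stem == input_key)
    )
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise DuplicateInputError(
            f"Multiple input files found for key {input_key!r}: {names}"
        )
    return candidates[0] if candidates else None
```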

Some context:
https://apify.slack.com/archives/C02JQSN79V4/p1761920132606859
https://apify.slack.com/archives/C010Q0FBYG3/p1762945429512269

Labels: t-tooling (issues with this label are in the ownership of the tooling team)