Is there a download limit imposed by the video source website? #24

@linhaojia13

I ran this command:

video2dataset --url_list="results_2M_train.csv" \
        --input_format="csv" \
        --output_format="webdataset" \
        --output_folder="test" \
        --url_col="contentUrl" \
        --caption_col="name" \
        --save_additional_columns='[videoid,page_idx,page_dir,duration]' \
        --enable_wandb=False \
        --config=default

At first, the download process went smoothly, and I successfully downloaded 96 .tar files, totaling about 200GB. Then, error messages started appearing.

HTTPSConnectionPool(host='ak.picdn.net', port=443): Read timed out.

I switched to a different computer and attempted to download again, but encountered the same errors after downloading around 200GB.
Could this be due to a download limit imposed by the video source website?
How should I resolve this issue?
@m-bain
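
To narrow down whether the timeouts come from throttling on the host side, one option is to retry a single failing URL outside of video2dataset with growing pauses between attempts. Below is a minimal sketch using only the Python standard library. `fetch_once` and `fetch_with_backoff` are hypothetical helper names, not part of video2dataset, and the attempt count, timeout, and delays are arbitrary assumptions.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_once(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL; raises on HTTP errors, connection errors, and timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], bytes] = fetch_once,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(url) up to max_attempts times, doubling the pause after each failure.

    Returns the payload on success, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print(f"attempt {attempt} failed: {exc}")
            if attempt < max_attempts:
                # Pauses grow as base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

If a URL that timed out inside the bulk run succeeds here after a pause, throttling becomes a more plausible explanation. If it keeps failing from both machines, the cause may lie elsewhere, and this check alone cannot tell the cases apart.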