
samueloph (Collaborator)

There are two ways of doing this now:

  1. wget way: -i, --input-file
  2. curl way: providing a URL argument starting with "@"; wcurl will
    see "@filename" and download the URLs listed in "filename".

Lines starting with "#" inside input files are ignored.
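
For illustration, assuming an input file named urls.txt (the file name and URLs below are made up for this example), the two forms would be used like this:

# urls.txt -- lines starting with "#" are ignored
https://example.com/a.tar.gz
https://example.com/b.tar.gz

# equivalent invocations
./wcurl -i urls.txt
./wcurl --input-file=urls.txt
./wcurl @urls.txt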

This is a continuation of #58

I don't want to rush a change like this; I'm publishing the PR so that it gets more
visibility and so that I can verify the .md changes. This PR is still missing tests.

Overall, we shouldn't try to implement every wget feature, but this one might be
important enough to warrant the extra code.

@sergiodj (Collaborator) left a comment

Thanks @samueloph.

I'm leaving a few inline comments about possible enhancements.

I won't oppose if you think this use case is important enough to be implemented in wcurl, but my two cents is that I can't remember one single time when I needed to provide a list of URLs for wget/curl to download and I decided to use the tool instead of invoking things using for and xargs on shell. But maybe that's just me.

;;

--input-file=*)
add_urls "@$(printf "%s\n" "${1}" | sed 's/^--input-file=//')"
sergiodj (Collaborator):

Since the program now takes an input file, I believe it's necessary to check if it exists and is readable.

samueloph (Collaborator, Author):

I was hoping we could let the shell handle this:

./wcurl -i non_existent
./wcurl: 135: cannot open non_existent: No such file

./wcurl -i not_allowed
./wcurl: 135: cannot open not_allowed: Permission denied

Is it better to do it ourselves instead?

sergiodj (Collaborator):

It depends on how "easy" we want wcurl to be. If our intention is to make it "user friendly", then I think it's worth checking whether the file exists and throwing a nice error (possibly with a suggestion to run --help). Otherwise, I'm fine with relying on the shell to do the error throwing for us :-).
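
For concreteness, a minimal sketch of such a check, assuming it runs right before the file is read; the input_file variable, the message wording, and the exit 1 are placeholders, not something this PR currently does:

input_file="${1#@}"    # strip the leading "@" of the curl-style form
if [ ! -r "${input_file}" ]; then
    # Fail early with a friendly message instead of the shell's redirection error
    printf "wcurl: cannot read input file '%s' (see --help)\n" "${input_file}" >&2
    exit 1
fi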

# If the argument starts with "@", then it's an input file name.
# When parsing an input file, ignore lines starting with "#".
# This function also percent-encodes the whitespaces in URLs.
add_urls()
sergiodj (Collaborator):

I wrote a version of add_urls that uses printf instead of case, but I think this one is alright :-).

samueloph (Collaborator, Author):

Nice, is it simpler to read and understand?

sergiodj (Collaborator):

Not necessarily. This is a back-of-the-napkin implementation to give you an idea:

set -x    # trace commands as they run (debug aid)

# Accumulated list of URLs (plays the role of wcurl's URL list)
U=""

# b: percent-encode spaces in a URL and append it to the list
b()
{
    n=$(printf "%s\n" "${1}" | sed 's/ /%20/g')
    U="${U} ${n}"
}

# a: if the argument starts with "@", read URLs from the named file,
# skipping lines that start with "#"; otherwise treat the argument as a URL
a()
{
    if [ "$(printf '%c' "$1")" = "@" ]; then
        while read -r line; do
            if [ "$(printf '%c' "$line")" != '#' ]; then
                b "$line"
            fi
        done < "${1#@}"
    else
        b "${1}"
    fi
}

# Example invocations; "2as" is a file containing URLs
a "o"
a "aaa"
a "qweq"
a "@2as"

echo "$U"

@samueloph (Collaborator, Author)

> I'm leaving a few inline comments about possible enhancements.

All good suggestions, thank you.

> I won't oppose if you think this use case is important enough to be implemented in wcurl, but my two cents is that I can't remember one single time when I needed to provide a list of URLs for wget/curl to download and I decided to use the tool instead of invoking things using for and xargs on shell. But maybe that's just me.

Yeah, for context: my flight got delayed due to the cyberattack and I wanted to see what an implementation would look like. It ended up simpler than I expected, but I'm not sure yet whether we should do it. This implementation is not equivalent to wget's -i, as it doesn't support the form -i - to read from stdin, and I'm wondering whether the fact that we do parallel downloads by default will be an issue for some users (due to throttling).
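
(As a rough sketch of what the missing -i - form could look like: only add_urls below is from this PR; the read_input helper and input_file variable are made up for illustration.)

read_input()
{
    while read -r line; do
        case "${line}" in
            '#'*|'') ;;                  # skip comments and blank lines
            *) add_urls "${line}" ;;
        esac
    done
}

if [ "${input_file}" = "-" ]; then
    read_input                           # no redirection: read URLs from stdin
else
    read_input < "${input_file}"
fi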

Thank you for the review!

 There are two ways of doing this now:
 1) wget way: -i, --input-file
 2) curl way: providing a URL argument starting with "@", wcurl will
    see "@filename" and download URLs from "filename".

 Lines starting with "#" inside input files are ignored.

 This is a continuation of #58

Co-authored-by: Sergio Durigan Junior <[email protected]>
@sergiodj (Collaborator)

> I won't oppose if you think this use case is important enough to be implemented in wcurl, but my two cents is that I can't remember one single time when I needed to provide a list of URLs for wget/curl to download and I decided to use the tool instead of invoking things using for and xargs on shell. But maybe that's just me.

> Yeah, for context: my flight got delayed due to the cyberattack and I wanted to see what an implementation would look like. It ended up simpler than I expected, but I'm not sure yet whether we should do it. This implementation is not equivalent to wget's -i, as it doesn't support the form -i - to read from stdin, and I'm wondering whether the fact that we do parallel downloads by default will be an issue for some users (due to throttling).

The issue with throttling has always existed, because it's always been possible to invoke wcurl with multiple URLs via the command line. So I wouldn't worry too much about that. It is well documented that wcurl will parallelize downloads; if there is a need, we can make this optional.

But I find it funny to see that there's a bit of a disconnect between what wcurl's goals are vs. what we find ourselves discussing sometimes :-). For example, it's OK not to treat certain errors (e.g., a nonexistent file provided via -i) and let the shell bail out for us, but we're also discussing more complex ways to deal with multi-URL downloads and parallelization. Just something that crossed my mind :-).
