Introduce new options for downloading a list of URLs from file #66
Conversation
Thanks @samueloph.
I'm leaving a few inline comments about possible enhancements.
I won't oppose if you think this use case is important enough to be implemented in wcurl, but my two cents is that I can't remember a single time when I needed to provide a list of URLs for wget/curl to download and decided to use the tool itself instead of invoking things with for and xargs in the shell. But maybe that's just me.
        ;;
    --input-file=*)
        add_urls "@$(printf "%s\n" "${1}" | sed 's/^--input-file=//')"
Since the program now takes an input file, I believe it's necessary to check if it exists and is readable.
I was hoping we could let the shell handle this:
./wcurl -i non_existent
./wcurl: 135: cannot open nonexistent: No such file
./wcurl -i not_allowed
./wcurl: 135: cannot open not_allowed: Permission denied
Is it better to do it ourselves instead?
It depends on how "easy" we want wcurl to be. If our intention is to make it "user friendly", then I think it's worth checking whether the file exists and throwing a nice error (possibly with a suggestion to run --help). Otherwise, I'm fine with relying on the shell to do the error reporting for us :-).
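For what it's worth, a minimal sketch of what an explicit check could look like; the input_file variable and the error helper are placeholder names, not anything taken from the PR:

# Hypothetical sketch: fail early with a friendly message if the input file is
# missing or unreadable ("input_file" and "error" are placeholder names).
if [ ! -r "${input_file}" ]; then
    error "Cannot read input file '${input_file}'. Try 'wcurl --help' for usage."
fi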
# If the argument starts with "@", then it's an input file name.
# When parsing an input file, ignore lines starting with "#".
# This function also percent-encodes the whitespaces in URLs.
add_urls()
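For context, here is a rough guess at what a case-based add_urls matching those comments might look like; this is not the PR's actual code, and the URLS variable name is an assumption:

add_urls()
{
    case "${1}" in
        @*)
            # "@filename": read URLs from the file, skipping "#" comment lines.
            while read -r line; do
                case "${line}" in
                    '#'*) ;;
                    *) URLS="${URLS} $(printf '%s\n' "${line}" | sed 's/ /%20/g')" ;;
                esac
            done < "${1#@}"
            ;;
        *)
            # Plain URL: percent-encode spaces and append it to the list.
            URLS="${URLS} $(printf '%s\n' "${1}" | sed 's/ /%20/g')"
            ;;
    esac
}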
I wrote a version of add_urls that uses printf instead of case, but I think this one is alright :-).
Nice, is it simpler to read and understand?
Not necessarily. This is a back-of-the-napkin implementation to give you an idea:
set -x
U=""

# b: percent-encode spaces in the URL and append it to the accumulated list U.
b()
{
    n=$(printf "%s\n" "${1}" | sed 's/ /%20/g')
    U="${U} ${n}"
}

# a: if the argument starts with "@", read the rest as a file name and add
# every non-comment line from it; otherwise add the argument as a single URL.
# (printf '%c' prints only the first character of its argument.)
a()
{
    if [ "$(printf '%c' "$1")" = "@" ]; then
        while read -r line; do
            if [ "$(printf '%c' "$line")" != '#' ]; then
                b "$line"
            fi
        done < "${1#@}"
    else
        b "${1}"
    fi
}

a "o"
a "aaa"
a "qweq"
a "@2as"
echo "$U"
Force-pushed from b292543 to b959f4e.
All good suggestions, thank you.
Yeah, for context: my flight got delayed due to the cyberattack and I wanted to see what an implementation would look like. It ended up simpler than I expected, but I'm not sure yet whether we should do it. This implementation is not equivalent to wget's.
Thank you for the review!
Force-pushed from b959f4e to f36ff08.
There are two ways of doing this now:
1) wget way: -i, --input-file
2) curl way: providing a URL argument starting with "@", wcurl will see "@filename" and download URLs from "filename".
Lines starting with "#" inside input files are ignored.
This is a continuation of #58
Co-authored-by: Sergio Durigan Junior <[email protected]>
Force-pushed from f36ff08 to 48cef54.
The issue with throttling has always existed, because it's always been possible to invoke wcurl with multiple URLs via the command line, so I wouldn't worry too much about that. It is well documented that wcurl will parallelize downloads; if there is a need, we can make this optional. But I find it funny that there's a bit of a disconnect between what wcurl's goals are vs. what we find ourselves discussing sometimes :-). For example, it's OK not to handle certain errors (e.g., a non-existent file provided via --input-file) and let the shell report them instead.
There are two ways of doing this now:
1) wget way: -i, --input-file
2) curl way: providing a URL argument starting with "@", wcurl will see "@filename" and download URLs from "filename".
Lines starting with "#" inside input files are ignored.
This is a continuation of #58
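Assuming the options land as described above, usage would look something like this (urls.txt is just an example file name):

# wget way: read URLs from a file
wcurl -i urls.txt
wcurl --input-file=urls.txt

# curl way: an argument starting with "@" names an input file
wcurl @urls.txt

Here urls.txt would hold one URL per line, with lines starting with "#" ignored.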
I don't want to rush a change like this; I'm publishing the PR so that it gets more visibility and so I can verify the .md changes. This PR is still missing tests.
Overall, we shouldn't try to implement every wget feature, but this one might be important enough to warrant the extra code.