Commit 9678ffa

UserAgent in parameters
1 parent e6707a9 commit 9678ffa

4 files changed: 19 additions, 3 deletions
README.md

Lines changed: 11 additions & 0 deletions
@@ -47,6 +47,7 @@ It will download the last version of every file present on Wayback Machine to `.
     -p, --maximum-snapshot NUMBER    Maximum snapshot pages to consider (Default is 100)
                                      Count an average of 150,000 snapshots per page
     -l, --list                       Only list file urls in a JSON format with the archived timestamps, won't download anything
+    -u, --user-agent STRING          UserAgent for connection (Default is WayBack Machine Downloader)
 
 ## Specify directory to save files to

@@ -175,6 +176,16 @@ Example:
 
     wayback_machine_downloader http://example.com --concurrency 20
 
+## Specify UserAgent for connection
+
+    -u, --user-agent STRING
+
+UserAgent for connection (Default is WayBack Machine Downloader)
+
+Example:
+
+    wayback_machine_downloader http://example.com --user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0"
+
 ## Using the Docker image
 
 As an alternative installation way, we have a Docker image! Retrieve the wayback-machine-downloader Docker image this way:

bin/wayback_machine_downloader

Lines changed: 4 additions & 0 deletions
@@ -58,6 +58,10 @@ option_parser = OptionParser.new do |opts|
     options[:list] = true
   end
 
+  opts.on("-u", "--user-agent STRING", String, "UserAgent for connection (Default is WayBack Machine Downloader)") do |t|
+    options[:user_agent] = t
+  end
+
   opts.on("-v", "--version", "Display version") do |t|
     options[:version] = t
   end
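
For readers who want to try the new flag in isolation, the following is a minimal sketch of how an `OptionParser` block like the one above can feed a default-aware lookup. The helper names `parse_cli` and `effective_user_agent` are hypothetical and not part of the commit; only the `-u`/`--user-agent` option and the default string are taken from the diff.

```ruby
require "optparse"

# Default taken from the commit's option description.
DEFAULT_USER_AGENT = "WayBack Machine Downloader"

# Hypothetical helper: parse an argv-style array and return the options hash.
# `permute` leaves the caller's array untouched and accepts options before or
# after positional arguments such as the base URL.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("-u", "--user-agent STRING", String, "UserAgent for connection") do |t|
      options[:user_agent] = t
    end
  end
  parser.permute(argv)
  options
end

# Hypothetical helper: fall back to the default when the flag was not given.
def effective_user_agent(options)
  options[:user_agent] || DEFAULT_USER_AGENT
end
```

Calling `effective_user_agent(parse_cli(["http://example.com"]))` falls back to the default string, which mirrors the ternary this commit adds to `initialize`.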

lib/wayback_machine_downloader.rb

Lines changed: 3 additions & 2 deletions
@@ -18,7 +18,7 @@ class WaybackMachineDownloader
 
   attr_accessor :base_url, :exact_url, :directory, :all_timestamps,
     :from_timestamp, :to_timestamp, :only_filter, :exclude_filter,
-    :all, :maximum_pages, :threads_count
+    :all, :maximum_pages, :threads_count, :user_agent
 
   def initialize params
     @base_url = params[:base_url]
@@ -32,6 +32,7 @@ def initialize params
     @all = params[:all]
     @maximum_pages = params[:maximum_pages] ? params[:maximum_pages].to_i : 100
     @threads_count = params[:threads_count].to_i
+    @user_agent = params[:user_agent] ? params[:user_agent] : "WayBack Machine Downloader"
   end
 
   def backup_name
@@ -268,7 +269,7 @@ def download_file file_remote_info
       structure_dir_path dir_path
       open(file_path, "wb") do |file|
         begin
-          URI.open("https://web.archive.org/web/#{file_timestamp}id_/#{file_url}", "Accept-Encoding" => "plain") do |uri|
+          open("http://web.archive.org/web/#{file_timestamp}id_/#{file_url}", "Accept-Encoding" => "plain", "User-Agent" => @user_agent) do |uri|
             file.write(uri.read)
           end
         rescue OpenURI::HTTPError => e
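
Note that the added line replaces `URI.open` with a bare `open` and switches the scheme from `https` to `http`. On Ruby 3.0 and later, `Kernel#open` no longer fetches URLs through open-uri, so the bare call would treat the URL as a local path. The sketch below shows one way to pass the same headers while keeping `URI.open`; it is an illustration under assumptions, not the project's code. The helper names `snapshot_url`, `request_headers`, and `fetch_snapshot` are hypothetical, and keeping `https` is an assumption carried over from the removed line.

```ruby
require "open-uri"

# Hypothetical helper: build the raw-snapshot URL used in the hunk above.
# The https scheme is an assumption carried over from the removed line.
def snapshot_url(file_timestamp, file_url)
  "https://web.archive.org/web/#{file_timestamp}id_/#{file_url}"
end

# Hypothetical helper: open-uri treats string keys in the options hash as
# HTTP request headers, so the User-Agent travels alongside Accept-Encoding.
def request_headers(user_agent = nil)
  {
    "Accept-Encoding" => "plain",
    "User-Agent" => user_agent || "WayBack Machine Downloader"
  }
end

# Hypothetical helper: performs the network request and returns the body.
# Raises OpenURI::HTTPError on HTTP error responses, as in the original code.
def fetch_snapshot(file_timestamp, file_url, user_agent = nil)
  URI.open(snapshot_url(file_timestamp, file_url), request_headers(user_agent)) do |io|
    io.read
  end
end
```

Keeping the header construction in its own method makes the default easy to check without touching the network.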

lib/wayback_machine_downloader/archive_api.rb

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ def get_raw_list_from_api url, page_index
     request_url += url
     request_url += parameters_for_api page_index
 
-    URI.open(request_url).read
+    open(request_url, "User-Agent" => @user_agent).read
   end
 
   def parameters_for_api page_index