Skip to content

applying backoff for rate-limited Maven downloads #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
26 changes: 23 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,20 @@

import os
import shutil
from time import sleep

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like they have the imports organized a bit differently. Perhaps keep "import x" at the top and move the "from x import y" in the next section...all alphabetically, of course.


from setuptools import Command
from setuptools import setup
from setuptools.command.install import install

if sys.version_info[0] >= 3:
# Python 3
from urllib.request import urlopen
from urllib.error import HTTPError
else:
# Python 2
from urllib2 import urlopen
from urllib2 import HTTPError

#
# This script modifies the basic setuptools by adding some functionality to the standard
Expand All @@ -50,7 +54,7 @@

PACKAGE_NAME = 'amazon_kclpy'
JAR_DIRECTORY = os.path.join(PACKAGE_NAME, 'jars')
PACKAGE_VERSION = '2.0.1'
PACKAGE_VERSION = '2.0.5'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the current version is 2.0.1, should this be changed to 2.0.2?

PYTHON_REQUIREMENTS = [
'boto',
# argparse is part of python2.7 but must be declared for python2.6
Expand Down Expand Up @@ -121,6 +125,7 @@
('commons-beanutils', 'commons-beanutils', '1.9.3'),
('commons-collections', 'commons-collections', '3.2.2')
]
MAX_URL_DOWNLOAD_ATTEMPTS = 5


class MavenJarDownloader:
Expand Down Expand Up @@ -179,9 +184,9 @@ def download_file(self, url, dest):
"""
Downloads a file at the url to the destination.
"""
print('Attempting to retrieve remote jar {url}'.format(url=url))
try:
response = urlopen(url)
response = self.make_request_with_backoff(url)

with open(dest, 'wb') as dest_file:
shutil.copyfileobj(response, dest_file)
print('Saving {url} -> {dest}'.format(url=url, dest=dest))
Expand All @@ -198,6 +203,21 @@ def download_files(self):
url = self.package_url(package[0], package[1], package[2])
self.download_file(url, dest)

def make_request_with_backoff(self, url):
for attempt_number in range(MAX_URL_DOWNLOAD_ATTEMPTS):
print('Attempting to retrieve remote jar {url}'.format(url=url))
try:
return urlopen(url)
except HTTPError as e:
if e.code == 429:
Copy link

@whummer whummer Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtfalkenstein Can we change this to perform retries also for other error codes?

It seems that Maven is not consistently using 429, but also other error codes. Recently, we've seen 502 (or 504) errors returned from the gateway, but these errors appear to be related to client request throttling.

Including other error codes here would make the installation much more resilient. Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. You know, we've been running this install via an automated Jenkins build a lot and haven't encountered anything other than a 429.

I'm inclined to do as you request on this, but I do have a concern on this... Any time we repeatedly pound a server with requests that we're getting error responses for, we risk being blacklisted by the server admins.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Good to know that you haven't had any issues recently with the 429 fix. Merging this PR as-is would already be a huge improvement over the status quo. :) Thanks

sleep_time = 2 ** attempt_number
print('"429 Too Many Requests" response received. Sleeping {} seconds and trying again.'.format(sleep_time))
sleep(sleep_time)
else:
raise

raise Exception('"429 Too Many Requests" responses received.')


class DownloadJarsCommand(Command):
description = "Download the jar files needed to run the sample application"
Expand Down