Running the tests against multiple Python versions seems to make the US Census geocoder more likely to return errors, as discovered in #64, possibly because a large number of requests arrive in a short period. We should find a way to test multiple Python versions without increasing the rate of failures.