| 1 | -## proxyhub
| 1 | +*Porting to Python 3.10+ is painful and progress is slow.*
| 2 | +*We need more volunteers to join. PRs welcome! :joy:* |
| 3 | + |
| 4 | +ProxyHub |
| 5 | +=========== |
| 6 | + |
| 7 | +ProxyHub is an open-source tool that asynchronously finds public proxies from multiple sources and checks them concurrently.
| 8 | + |
| 9 | +Features |
| 10 | +-------- |
| 11 | + |
| 12 | +- Finds more than 7000 working proxies from \~50 sources. |
| 13 | +- Supported protocols: HTTP(S), SOCKS4/5, and the CONNECT method to ports 80 and 25 (SMTP).
| 14 | +- Proxies may be filtered by type, anonymity level, response time, country and status in DNSBL. |
| 15 | +- Works as a proxy server that distributes incoming requests to external proxies, with automatic proxy rotation.
| 16 | +- All proxies are checked for Cookie and Referer support (and for POST requests if required).
| 17 | +- Automatically removes duplicate proxies. |
| 18 | +- Is asynchronous. |
| 19 | + |
| 20 | +Requirements |
| 21 | +------------ |
| 22 | + |
| 23 | +- Python 3.8+ |
| 24 | +- [aiohttp](https://pypi.python.org/pypi/aiohttp) |
| 25 | +- [aiodns](https://pypi.python.org/pypi/aiodns) |
| 26 | +- [maxminddb](https://pypi.python.org/pypi/maxminddb) |
| 27 | + |
| 28 | +Installation |
| 29 | +------------ |
| 30 | + |
| 31 | +### Install locally |
| 32 | + |
| 33 | +To install the latest stable release from PyPI:
| 34 | + |
| 35 | +``` {.sourceCode .bash} |
| 36 | +$ pip install proxyhub |
| 37 | +``` |
| 38 | + |
| 39 | +To install the latest development version from GitHub: |
| 40 | + |
| 41 | +``` {.sourceCode .bash} |
| 42 | +$ pip install -U git+https://github.com/ForceFledgling/proxyhub.git |
| 43 | +``` |
| 44 | + |
| 45 | +### Use pre-built Docker image |
| 46 | + |
| 47 | +``` {.sourceCode .bash} |
| 48 | +$ docker pull forcefledgling/proxyhub
| 49 | +``` |
| 50 | + |
| 51 | +### Build bundled one-file executable with pyinstaller |
| 52 | + |
| 53 | +#### Requirements |
| 54 | +Supported operating systems: Windows, Linux, macOS
| 55 | +
| 56 | +*On UNIX-like systems (Linux / macOS / BSD)*
| 57 | +
| 58 | +Install these tools:
| 59 | + - upx |
| 60 | + - objdump (this tool is usually in the binutils package) |
| 61 | +``` {.sourceCode .bash} |
| 62 | +$ sudo apt install -y upx-ucl binutils # On Ubuntu / Debian |
| 63 | +``` |
| 64 | + |
| 65 | +#### Build |
| 66 | + |
| 67 | +``` |
| 68 | +pip install pyinstaller \ |
| 69 | +&& pip install . \ |
| 70 | +&& mkdir -p build \ |
| 71 | +&& cd build \ |
| 72 | +&& pyinstaller --onefile --name proxyhub --add-data "../proxyhub/data:data" --workpath ./tmp --distpath . --clean ../py2exe_entrypoint.py \ |
| 73 | +&& rm -rf tmp *.spec |
| 74 | +``` |
| 75 | + |
| 76 | +The executable is now in the `build` directory.
| 77 | + |
| 78 | +Usage |
| 79 | +----- |
| 80 | + |
| 81 | +### CLI Examples |
| 82 | + |
| 83 | +#### Find |
| 84 | + |
| 85 | +Find and show 10 HTTP(S) proxies from the United States with a high level of anonymity:
| 86 | + |
| 87 | +``` {.sourceCode .bash} |
| 88 | +$ proxyhub find --types HTTP HTTPS --lvl High --countries US --strict -l 10 |
| 89 | +``` |
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | +#### Grab |
| 94 | + |
| 95 | +Find and save to a file 10 US proxies (without a check): |
| 96 | + |
| 97 | +``` {.sourceCode .bash} |
| 98 | +$ proxyhub grab --countries US --limit 10 --outfile ./proxies.txt |
| 99 | +``` |
| 100 | + |
| 101 | + |
| 102 | + |
| 103 | +#### Serve |
| 104 | + |
| 105 | +Run a local proxy server that distributes incoming requests to a pool of found HTTP(S) proxies with a high level of anonymity:
| 106 | + |
| 107 | +``` {.sourceCode .bash} |
| 108 | +$ proxyhub serve --host 127.0.0.1 --port 8888 --types HTTP HTTPS --lvl High --min-queue 5 |
| 109 | +``` |
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | +Run `proxyhub --help` for more information on the options available. |
| 114 | +Run `proxyhub <command> --help` for more information on a command. |
| 115 | + |
| 116 | +### Basic code example |
| 117 | + |
| 118 | +Find and show 10 working HTTP(S) proxies: |
| 119 | + |
| 120 | +``` {.sourceCode .python} |
| 121 | +import asyncio
| 122 | +from proxyhub import Broker
| 123 | +
| 124 | +async def show(proxies):
| 125 | +    while True:
| 126 | +        proxy = await proxies.get()
| 127 | +        if proxy is None: break
| 128 | +        print('Found proxy: %s' % proxy)
| 129 | +
| 130 | +proxies = asyncio.Queue()
| 131 | +broker = Broker(proxies)
| 132 | +tasks = asyncio.gather(
| 133 | +    broker.find(types=['HTTP', 'HTTPS'], limit=10),
| 134 | +    show(proxies))
| 135 | +
| 136 | +loop = asyncio.get_event_loop()
| 137 | +loop.run_until_complete(tasks)
| 138 | +``` |
| 139 | + |
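The example above depends on a simple contract: the producer puts results on an `asyncio.Queue` and finishes with `None`, and the consumer stops when it sees that sentinel. Here is a minimal, library-free sketch of that pattern. `fake_find` is a hypothetical stand-in for `broker.find`, and the addresses are placeholder documentation IPs, not real proxies.

```python
import asyncio


async def fake_find(queue, items):
    """Hypothetical stand-in for broker.find: push results, then a None sentinel."""
    for item in items:
        await queue.put(item)
    await queue.put(None)  # tells the consumer that no more results will arrive


async def collect(queue):
    """Drain the queue until the None sentinel is received."""
    found = []
    while True:
        proxy = await queue.get()
        if proxy is None:
            break
        found.append(proxy)
    return found


async def main(items):
    queue = asyncio.Queue()
    # Run producer and consumer concurrently; keep only the consumer's result.
    _, found = await asyncio.gather(fake_find(queue, items), collect(queue))
    return found


result = asyncio.run(main(["192.0.2.1:8080", "192.0.2.2:3128"]))
print(result)
```

Collecting into a list instead of printing makes the consumer easy to reuse. Whether `Broker` can be driven from `asyncio.run` in the same way is not confirmed by this README.
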
| 140 | +[More examples](https://proxyhub.readthedocs.io/en/latest/examples.html). |
| 141 | + |
| 142 | +### Proxy information per request
| 143 | +#### HTTP
| 144 | +Check the `X-Proxy-Info` header in the response.
| 145 | +``` |
| 146 | +$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://httpbin.org/get |
| 147 | +* Trying 127.0.0.1... |
| 148 | +* TCP_NODELAY set |
| 149 | +* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0) |
| 150 | +> GET http://httpbin.org/get HTTP/1.1 |
| 151 | +> Host: httpbin.org |
| 152 | +> User-Agent: curl/7.58.0 |
| 153 | +> Accept: */* |
| 154 | +> Proxy-Connection: Keep-Alive |
| 155 | +> |
| 156 | +< HTTP/1.1 200 OK |
| 157 | +< X-Proxy-Info: 174.138.42.112:8080 |
| 158 | +< Date: Mon, 04 May 2020 03:39:40 GMT |
| 159 | +< Content-Type: application/json |
| 160 | +< Content-Length: 304 |
| 161 | +< Server: gunicorn/19.9.0 |
| 162 | +< Access-Control-Allow-Origin: * |
| 163 | +< Access-Control-Allow-Credentials: true |
| 164 | +< X-Cache: MISS from ADM-MANAGER |
| 165 | +< X-Cache-Lookup: MISS from ADM-MANAGER:880 |
| 166 | +< Connection: keep-alive |
| 167 | +< |
| 168 | +{ |
| 169 | + "args": {}, |
| 170 | + "headers": { |
| 171 | + "Accept": "*/*", |
| 172 | + "Cache-Control": "max-age=259200", |
| 173 | + "Host": "httpbin.org", |
| 174 | + "User-Agent": "curl/7.58.0", |
| 175 | + "X-Amzn-Trace-Id": "Root=1-5eaf8e7c-6a1162a1387a1743a49063f4" |
| 176 | + }, |
| 177 | + "origin": "...", |
| 178 | + "url": "http://httpbin.org/get" |
| 179 | +} |
| 180 | +* Connection #0 to host 127.0.0.1 left intact |
| 181 | +``` |
| 182 | + |
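The same check can be done from Python with the standard library. The sketch below assumes a local server started with `proxyhub serve --host 127.0.0.1 --port 8888` and an `X-Proxy-Info` value of the form `host:port`, as in the curl output above. The helper names are illustrative and not part of proxyhub.

```python
import urllib.request


def parse_proxy_info(value):
    """Split an 'X-Proxy-Info' value such as '174.138.42.112:8080' into (host, port)."""
    if not value:
        return None
    host, sep, port = value.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return host, int(port)


def fetch_via_local_proxy(url, local_proxy="http://127.0.0.1:8888", timeout=10):
    """Fetch a plain-HTTP URL through the local server; return (body, upstream proxy)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": local_proxy})
    )
    with opener.open(url, timeout=timeout) as resp:
        return resp.read(), parse_proxy_info(resp.headers.get("X-Proxy-Info"))


if __name__ == "__main__":
    # Header parsing only; a real fetch needs the local server to be running.
    print(parse_proxy_info("174.138.42.112:8080"))
```

With the server running, `fetch_via_local_proxy("http://httpbin.org/get")` would return the response body together with the upstream proxy that served it.
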
| 183 | +#### HTTPS |
| 184 | +We cannot modify HTTPS traffic to inject a custom header once it is encrypted. An `X-Proxy-Info` header is sent to the client after `HTTP/1.1 200 Connection established`, but it is unclear how clients can read it.
| 185 | +``` |
| 186 | +$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v https://httpbin.org/get
| 187 | +* Trying 127.0.0.1... |
| 188 | +* TCP_NODELAY set |
| 189 | +* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0) |
| 190 | +* allocate connect buffer! |
| 191 | +* Establish HTTP proxy tunnel to httpbin.org:443 |
| 192 | +> CONNECT httpbin.org:443 HTTP/1.1 |
| 193 | +> Host: httpbin.org:443 |
| 194 | +> User-Agent: curl/7.58.0 |
| 195 | +> Proxy-Connection: Keep-Alive |
| 196 | +> |
| 197 | +< HTTP/1.1 200 Connection established |
| 198 | +< X-Proxy-Info: 207.148.22.139:8080 |
| 199 | +< |
| 200 | +* Proxy replied 200 to CONNECT request |
| 201 | +* CONNECT phase completed! |
| 202 | +* ALPN, offering h2 |
| 203 | +* ALPN, offering http/1.1 |
| 204 | +* successfully set certificate verify locations: |
| 205 | +... |
| 206 | +* SSL certificate verify ok. |
| 207 | +* Using HTTP2, server supports multi-use |
| 208 | +* Connection state changed (HTTP/2 confirmed) |
| 209 | +* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 |
| 210 | +* Using Stream ID: 1 (easy handle 0x5560b2e93580) |
| 211 | +> GET /get HTTP/2 |
| 212 | +> Host: httpbin.org |
| 213 | +> User-Agent: curl/7.58.0 |
| 214 | +> Accept: */* |
| 215 | +> |
| 216 | +* Connection state changed (MAX_CONCURRENT_STREAMS updated)! |
| 217 | +< HTTP/2 200 |
| 218 | +< date: Mon, 04 May 2020 03:39:35 GMT |
| 219 | +< content-type: application/json |
| 220 | +< content-length: 256 |
| 221 | +< server: gunicorn/19.9.0 |
| 222 | +< access-control-allow-origin: * |
| 223 | +< access-control-allow-credentials: true |
| 224 | +< |
| 225 | +{ |
| 226 | + "args": {}, |
| 227 | + "headers": { |
| 228 | + "Accept": "*/*", |
| 229 | + "Host": "httpbin.org", |
| 230 | + "User-Agent": "curl/7.58.0", |
| 231 | + "X-Amzn-Trace-Id": "Root=1-5eaf8e77-efcb353b0983ad6a90f8bdcd" |
| 232 | + }, |
| 233 | + "origin": "...", |
| 234 | + "url": "https://httpbin.org/get" |
| 235 | +} |
| 236 | +* Connection #0 to host 127.0.0.1 left intact |
| 237 | +``` |
| 238 | + |
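One way a client could read the header is to send the `CONNECT` request itself and inspect the proxy's reply before starting TLS. The sketch below assumes the reply looks like the curl trace above (status line, `X-Proxy-Info`, blank line). Function names are illustrative, and high-level HTTP clients generally do not expose these headers.

```python
import socket


def parse_connect_response(raw):
    """Parse the head of a raw CONNECT reply into (status_code, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError("malformed status line: %r" % lines[0])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(parts[1]), headers


def connect_and_read_proxy_info(target, proxy_host="127.0.0.1", proxy_port=8888, timeout=10):
    """Open a tunnel to target ('host:port') and return (status, X-Proxy-Info or None)."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        request = "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (target, target)
        sock.sendall(request.encode("ascii"))
        raw = b""
        while b"\r\n\r\n" not in raw:  # read until the end of the reply headers
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    status, headers = parse_connect_response(raw)
    return status, headers.get("x-proxy-info")


if __name__ == "__main__":
    sample = (b"HTTP/1.1 200 Connection established\r\n"
              b"X-Proxy-Info: 207.148.22.139:8080\r\n\r\n")
    print(parse_connect_response(sample))
```

This sketch closes the socket after reading the header. A real client would keep the connection open and wrap it with `ssl` to continue the HTTPS request through the tunnel.
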
| 239 | +### HTTP API |
| 240 | +#### Get info about the proxy used to retrieve a specific URL
| 241 | +For HTTP, query the history endpoint with the full URL.
| 242 | +``` |
| 243 | +$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:http://httpbin.org/get |
| 244 | +* Trying 127.0.0.1... |
| 245 | +* TCP_NODELAY set |
| 246 | +* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0) |
| 247 | +> GET http://proxycontrol/api/history/url:http://httpbin.org/get HTTP/1.1 |
| 248 | +> Host: proxycontrol |
| 249 | +> User-Agent: curl/7.58.0 |
| 250 | +> Accept: */* |
| 251 | +> Proxy-Connection: Keep-Alive |
| 252 | +> |
| 253 | +< HTTP/1.1 200 OK |
| 254 | +< Content-Type: application/json |
| 255 | +< Content-Length: 34 |
| 256 | +< Access-Control-Allow-Origin: * |
| 257 | +< Access-Control-Allow-Credentials: true |
| 258 | +< |
| 259 | +{"proxy": "..."} |
| 260 | +``` |
| 261 | + |
| 262 | +For HTTPS, the encrypted payload (the request) is not visible to the proxy, so only the hostname can be used.
| 263 | +``` |
| 264 | +$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:httpbin.org:443 |
| 265 | +* Trying 127.0.0.1... |
| 266 | +* TCP_NODELAY set |
| 267 | +* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0) |
| 268 | +> GET http://proxycontrol/api/history/url:httpbin.org:443 HTTP/1.1 |
| 269 | +> Host: proxycontrol |
| 270 | +> User-Agent: curl/7.58.0 |
| 271 | +> Accept: */* |
| 272 | +> Proxy-Connection: Keep-Alive |
| 273 | +> |
| 274 | +< HTTP/1.1 200 OK |
| 275 | +< Content-Type: application/json |
| 276 | +< Content-Length: 34 |
| 277 | +< Access-Control-Allow-Origin: * |
| 278 | +< Access-Control-Allow-Credentials: true |
| 279 | +< |
| 280 | +{"proxy": "..."} |
| 281 | +* Connection #0 to host 127.0.0.1 left intact |
| 282 | +``` |
| 283 | + |
| 284 | +#### Remove a specific proxy from the queue
| 285 | +``` |
| 286 | +$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/remove/PROXY_IP:PROXY_PORT |
| 287 | +* Trying 127.0.0.1... |
| 288 | +* TCP_NODELAY set |
| 289 | +* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0) |
| 290 | +> GET http://proxycontrol/api/remove/... HTTP/1.1 |
| 291 | +> Host: proxycontrol |
| 292 | +> User-Agent: curl/7.58.0 |
| 293 | +> Accept: */* |
| 294 | +> Proxy-Connection: Keep-Alive |
| 295 | +> |
| 296 | +< HTTP/1.1 204 No Content |
| 297 | +< |
| 298 | +* Connection #0 to host 127.0.0.1 left intact |
| 299 | +``` |
| 300 | + |
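The control endpoints shown above can also be called from Python. This is a minimal sketch built only from the URL shapes in the curl examples (`/api/history/url:<target>` and `/api/remove/<ip>:<port>` on the virtual host `proxycontrol`). The helper names are illustrative and not part of proxyhub.

```python
import json
import urllib.request

CONTROL_HOST = "http://proxycontrol"  # virtual host taken from the curl examples


def history_url(target):
    """URL that asks which proxy served `target` (a full HTTP URL, or host:port for HTTPS)."""
    return "%s/api/history/url:%s" % (CONTROL_HOST, target)


def remove_url(proxy_host, proxy_port):
    """URL that asks the server to drop one proxy from its queue."""
    return "%s/api/remove/%s:%s" % (CONTROL_HOST, proxy_host, proxy_port)


def call_control_api(url, local_proxy="http://127.0.0.1:8888", timeout=10):
    """Send the request through the local server; return (status, parsed JSON or None)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": local_proxy})
    )
    with opener.open(url, timeout=timeout) as resp:
        body = resp.read()
        # The remove endpoint answers 204 with an empty body.
        return resp.status, (json.loads(body) if body else None)


if __name__ == "__main__":
    print(history_url("http://httpbin.org/get"))
    print(history_url("httpbin.org:443"))
    print(remove_url("192.0.2.10", 8080))
```

The requests must go through the local proxy server, because `proxycontrol` is not a resolvable hostname outside of it.
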
| 301 | +Documentation |
| 302 | +------------- |
| 303 | + |
| 304 | +<https://proxyhub.readthedocs.io/> |
| 305 | + |
| 306 | +TODO |
| 307 | +---- |
| 308 | + |
| 309 | +- Check the ping, response time and speed of data transfer |
| 310 | +- Check site access (Google, Twitter, etc.) and even your own custom URLs
| 311 | +- Information about uptime |
| 312 | +- Checksum of data returned |
| 313 | +- Support for proxy authentication |
| 314 | +- Finding outgoing IP for cascading proxy |
| 315 | +- The ability to specify the address of the proxy without a port (try to connect on default ports)
| 316 | + |
| 317 | +Contributing |
| 318 | +------------ |
| 319 | + |
| 320 | +- Fork it: <https://github.com/ForceFledgling/proxyhub/fork> |
| 321 | +- Create your feature branch: `git checkout -b my-new-feature` |
| 322 | +- We use [Poetry](https://python-poetry.org/) to manage dependencies. If needed, install dependencies: `poetry install`
| 323 | +- Commit your changes: `git commit -am 'Add some feature'` |
| 324 | +- Push to the branch: `git push origin my-new-feature` |
| 325 | +- Submit a pull request! |
| 326 | +- [Contributor workflow](https://github.com/ForceFledgling/proxyhub/issues/) |
| 327 | + |
| 328 | +License |
| 329 | +------- |
| 330 | + |
| 331 | +Licensed under the Apache License, Version 2.0 |
| 332 | + |
| 333 | +*This product includes GeoLite2 data created by MaxMind, available from* [<http://www.maxmind.com>](http://www.maxmind.com). |
| 334 | + |
| 335 | +## Contributors ✨ |
| 336 | + |
| 337 | +<!-- markdownlint-restore --> |
| 338 | +<!-- prettier-ignore-end --> |
| 339 | + |
| 340 | +<!-- ALL-CONTRIBUTORS-LIST:END --> |
| 341 | + |
| 342 | +This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome! |