*Porting to Python 3.10+ is painful and progress is slow.*
*We need more volunteers to join. PRs welcome! :joy:*

ProxyHub
===========

ProxyHub is an open-source tool that asynchronously finds public proxies from multiple sources and concurrently checks them.

Features
--------

- Finds more than 7000 working proxies from \~50 sources.
- Supports the HTTP(S) and SOCKS4/5 protocols, as well as the CONNECT method to ports 80 and 25 (SMTP).
- Proxies can be filtered by type, anonymity level, response time, country and DNSBL status.
- Works as a proxy server that distributes incoming requests to external proxies, with automatic proxy rotation.
- All proxies are checked for cookie and Referer support (and for POST requests, if required).
- Automatically removes duplicate proxies.
- Is asynchronous.

Requirements
------------

- Python 3.8+
- [aiohttp](https://pypi.python.org/pypi/aiohttp)
- [aiodns](https://pypi.python.org/pypi/aiodns)
- [maxminddb](https://pypi.python.org/pypi/maxminddb)

Installation
------------

### Install locally

To install the latest stable release from PyPI:

``` {.sourceCode .bash}
$ pip install proxyhub
```

To install the latest development version from GitHub:

``` {.sourceCode .bash}
$ pip install -U git+https://github.com/ForceFledgling/proxyhub.git
```

### Use a pre-built Docker image

``` {.sourceCode .bash}
$ docker pull forcefledgling/proxyhub
```

### Build a bundled one-file executable with PyInstaller

#### Requirements

Supported operating systems: Windows, Linux, macOS

*On UNIX-like systems (Linux / macOS / BSD)*

Install these tools:

- upx
- objdump (usually provided by the binutils package)

``` {.sourceCode .bash}
$ sudo apt install -y upx-ucl binutils # On Ubuntu / Debian
```

#### Build

``` {.sourceCode .bash}
pip install pyinstaller \
&& pip install . \
&& mkdir -p build \
&& cd build \
&& pyinstaller --onefile --name proxyhub --add-data "../proxyhub/data:data" --workpath ./tmp --distpath . --clean ../py2exe_entrypoint.py \
&& rm -rf tmp *.spec
```

The executable is now in the `build` directory.

Usage
-----

### CLI Examples

#### Find

Find and show 10 HTTP(S) proxies from the United States with a high level of anonymity:

``` {.sourceCode .bash}
$ proxyhub find --types HTTP HTTPS --lvl High --countries US --strict -l 10
```

![image](https://raw.githubusercontent.com/constverum/proxyhub/master/docs/source/_static/cli_find_example.gif)

#### Grab

Find 10 US proxies and save them to a file (without checking them):

``` {.sourceCode .bash}
$ proxyhub grab --countries US --limit 10 --outfile ./proxies.txt
```

![image](https://raw.githubusercontent.com/constverum/proxyhub/master/docs/source/_static/cli_grab_example.gif)

#### Serve

Run a local proxy server that distributes incoming requests to a pool of found HTTP(S) proxies with a high level of anonymity:

``` {.sourceCode .bash}
$ proxyhub serve --host 127.0.0.1 --port 8888 --types HTTP HTTPS --lvl High --min-queue 5
```

![image](https://raw.githubusercontent.com/constverum/proxyhub/master/docs/source/_static/cli_serve_example.gif)

Run `proxyhub --help` for more information on the available options.
Run `proxyhub <command> --help` for more information on a specific command.

### Basic code example

Find and show 10 working HTTP(S) proxies:

``` {.sourceCode .python}
import asyncio
from proxyhub import Broker


async def show(proxies):
    # Print proxies as the broker puts them on the queue;
    # None marks the end of the search.
    while True:
        proxy = await proxies.get()
        if proxy is None:
            break
        print('Found proxy: %s' % proxy)


async def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    # Run the search and the consumer concurrently.
    await asyncio.gather(
        broker.find(types=['HTTP', 'HTTPS'], limit=10),
        show(proxies))


asyncio.run(main())
```
139+
140+
[More examples](https://proxyhub.readthedocs.io/en/latest/examples.html).
141+
142+
### Proxy information per requests
143+
#### HTTP
144+
Check `X-Proxy-Info` header in response.
145+
```
146+
$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://httpbin.org/get
147+
* Trying 127.0.0.1...
148+
* TCP_NODELAY set
149+
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
150+
> GET http://httpbin.org/get HTTP/1.1
151+
> Host: httpbin.org
152+
> User-Agent: curl/7.58.0
153+
> Accept: */*
154+
> Proxy-Connection: Keep-Alive
155+
>
156+
< HTTP/1.1 200 OK
157+
< X-Proxy-Info: 174.138.42.112:8080
158+
< Date: Mon, 04 May 2020 03:39:40 GMT
159+
< Content-Type: application/json
160+
< Content-Length: 304
161+
< Server: gunicorn/19.9.0
162+
< Access-Control-Allow-Origin: *
163+
< Access-Control-Allow-Credentials: true
164+
< X-Cache: MISS from ADM-MANAGER
165+
< X-Cache-Lookup: MISS from ADM-MANAGER:880
166+
< Connection: keep-alive
167+
<
168+
{
169+
"args": {},
170+
"headers": {
171+
"Accept": "*/*",
172+
"Cache-Control": "max-age=259200",
173+
"Host": "httpbin.org",
174+
"User-Agent": "curl/7.58.0",
175+
"X-Amzn-Trace-Id": "Root=1-5eaf8e7c-6a1162a1387a1743a49063f4"
176+
},
177+
"origin": "...",
178+
"url": "http://httpbin.org/get"
179+
}
180+
* Connection #0 to host 127.0.0.1 left intact
181+
```
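The same check can be scripted. The sketch below uses only Python's standard library (`urllib.request`) and assumes ProxyHub is serving on `127.0.0.1:8888`, as in the `serve` example above; `fetch_via_proxyhub` is a hypothetical helper name, not part of ProxyHub's API. It routes a plain-HTTP request through the local server and returns the status code together with the `X-Proxy-Info` value, if present.

```python
import urllib.request


def fetch_via_proxyhub(url, proxy="http://127.0.0.1:8888", timeout=30.0):
    """Fetch a plain-HTTP `url` through a local ProxyHub server (hypothetical helper).

    Returns (status, upstream), where `upstream` is the X-Proxy-Info header
    value naming the external proxy that served the request, or None if the
    header is absent.
    """
    # Only the "http" scheme is mapped to the local server, so HTTPS URLs
    # would bypass ProxyHub entirely with this opener.
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.headers.get("X-Proxy-Info")
```

With the server running, `fetch_via_proxyhub("http://httpbin.org/get")` should return a `(200, "<ip>:<port>")` pair like the one in the curl transcript.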

#### HTTPS

HTTPS traffic is encrypted, so a custom header cannot be injected into it. Instead, an `X-Proxy-Info` header is sent to the client in the `HTTP/1.1 200 Connection established` reply to the `CONNECT` request, but it is not clear how clients can read it.

```
$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v https://httpbin.org/get
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to httpbin.org:443
> CONNECT httpbin.org:443 HTTP/1.1
> Host: httpbin.org:443
> User-Agent: curl/7.58.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
< X-Proxy-Info: 207.148.22.139:8080
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
...
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5560b2e93580)
> GET /get HTTP/2
> Host: httpbin.org
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Mon, 04 May 2020 03:39:35 GMT
< content-type: application/json
< content-length: 256
< server: gunicorn/19.9.0
< access-control-allow-origin: *
< access-control-allow-credentials: true
<
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.58.0",
    "X-Amzn-Trace-Id": "Root=1-5eaf8e77-efcb353b0983ad6a90f8bdcd"
  },
  "origin": "...",
  "url": "https://httpbin.org/get"
}
* Connection #0 to host 127.0.0.1 left intact
```
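One way for a client to read that header is to perform the `CONNECT` handshake itself before starting TLS. The following is a minimal standard-library sketch, assuming ProxyHub listens on `127.0.0.1:8888`; `connect_via_proxyhub` is a hypothetical helper, not part of ProxyHub. It sends `CONNECT host:port`, reads the proxy's reply up to the blank line that ends its headers, and returns the open socket together with the `X-Proxy-Info` value.

```python
import socket


def connect_via_proxyhub(host, port, proxy_host="127.0.0.1", proxy_port=8888, timeout=30.0):
    """Open a CONNECT tunnel through ProxyHub (hypothetical helper).

    Returns (sock, upstream): the connected socket, now a raw tunnel to
    host:port, and the X-Proxy-Info value from the proxy's reply (or None).
    """
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    request = f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n"
    sock.sendall(request.encode("ascii"))

    # Read one byte at a time until the blank line ending the reply headers,
    # so no bytes belonging to the tunnelled stream are consumed here.
    raw = b""
    while not raw.endswith(b"\r\n\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection during CONNECT")
        raw += chunk

    status_line, *header_lines = raw.decode("iso-8859-1").split("\r\n")
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "200":
        sock.close()
        raise ConnectionError(f"CONNECT failed: {status_line!r}")

    # Header names are case-insensitive; keep the value of X-Proxy-Info.
    upstream = None
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-proxy-info":
            upstream = value.strip()
    return sock, upstream
```

The returned socket is a raw tunnel to the origin server; to speak HTTPS over it, wrap it with `ssl.create_default_context().wrap_socket(sock, server_hostname="httpbin.org")`.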

### HTTP API

#### Get the proxy used to retrieve a specific URL

For HTTP, pass the full URL:

```
$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:http://httpbin.org/get
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:http://httpbin.org/get HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}
```

For HTTPS, the request itself is encrypted and cannot be inspected, so only the host and port can be used:

```
$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:httpbin.org:443
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:httpbin.org:443 HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}
* Connection #0 to host 127.0.0.1 left intact
```
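From code, the history endpoint can be queried like any other plain-HTTP URL sent through the proxy. This standard-library sketch assumes ProxyHub is serving on `127.0.0.1:8888` and that the response body is JSON of the form `{"proxy": ...}`, as in the transcripts above; `proxy_used_for` is a hypothetical helper name.

```python
import json
import urllib.request


def proxy_used_for(target, proxy="http://127.0.0.1:8888", timeout=10.0):
    """Ask ProxyHub's control API which upstream proxy handled `target` (hypothetical helper).

    `target` is a full URL for plain HTTP (e.g. "http://httpbin.org/get")
    or "host:port" for HTTPS (e.g. "httpbin.org:443").
    """
    # The API is reached by requesting the special host "proxycontrol"
    # through the ProxyHub server itself.
    url = "http://proxycontrol/api/history/url:" + target
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    with opener.open(url, timeout=timeout) as resp:
        return json.load(resp).get("proxy")
```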

#### Remove a specific proxy from the queue

```
$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/remove/PROXY_IP:PROXY_PORT
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/remove/... HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 204 No Content
<
* Connection #0 to host 127.0.0.1 left intact
```
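A matching sketch for the remove endpoint, under the same assumptions (ProxyHub on `127.0.0.1:8888`; `remove_proxy` is a hypothetical helper name). It treats the `204 No Content` reply shown above as success.

```python
import urllib.request


def remove_proxy(ip_port, proxy="http://127.0.0.1:8888", timeout=10.0):
    """Ask ProxyHub to drop `ip_port` (e.g. "203.0.113.5:8080") from its queue.

    Hypothetical helper; returns True when the control API answers 204 No Content.
    """
    url = "http://proxycontrol/api/remove/" + ip_port
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    # urllib raises HTTPError only for non-2xx codes, so a 204 reply arrives here.
    with opener.open(url, timeout=timeout) as resp:
        return resp.status == 204
```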

Documentation
-------------

<https://proxyhub.readthedocs.io/>

TODO
----

- Check the ping, response time and speed of data transfer
- Check site access (Google, Twitter, etc.) and even your own custom URLs
- Information about uptime
- Checksum of returned data
- Support for proxy authentication
- Finding the outgoing IP for cascading proxies
- The ability to specify a proxy address without a port (try connecting on default ports)

Contributing
------------

- Fork it: <https://github.com/ForceFledgling/proxyhub/fork>
- Create your feature branch: `git checkout -b my-new-feature`
- We use [Poetry](https://python-poetry.org/) to manage dependencies. If needed, install them with `poetry install`.
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin my-new-feature`
- Submit a pull request!
- [Contributor workflow](https://github.com/ForceFledgling/proxyhub/issues/)

License
-------

Licensed under the Apache License, Version 2.0.

*This product includes GeoLite2 data created by MaxMind, available from* <http://www.maxmind.com>.

## Contributors ✨

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!
