Skip to content

Commit 21499ad

Browse files
authored
Bail out early for non-HTTP but HTTP looking protocols (#972)
* Add support in `Url` to parse all types of schemes * . * Guard handler against http looking protocol but not web or proxy requests * Fix condition for web server protocol detection * doc happy * Update flags and type check imports only
1 parent 552fb99 commit 21499ad

File tree

9 files changed

+155
-68
lines changed

9 files changed

+155
-68
lines changed

README.md

Lines changed: 59 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -2206,23 +2206,26 @@ To run standalone benchmark for `proxy.py`, use the following command from repo
22062206

22072207
```console
22082208
proxy -h
2209-
usage: -m [-h] [--enable-events] [--enable-conn-pool] [--threadless]
2210-
[--threaded] [--num-workers NUM_WORKERS]
2211-
[--local-executor LOCAL_EXECUTOR] [--backlog BACKLOG]
2212-
[--hostname HOSTNAME] [--port PORT] [--port-file PORT_FILE]
2213-
[--unix-socket-path UNIX_SOCKET_PATH]
2214-
[--num-acceptors NUM_ACCEPTORS] [--version] [--log-level LOG_LEVEL]
2215-
[--log-file LOG_FILE] [--log-format LOG_FORMAT]
2216-
[--open-file-limit OPEN_FILE_LIMIT]
2209+
usage: -m [-h] [--tunnel-hostname TUNNEL_HOSTNAME] [--tunnel-port TUNNEL_PORT]
2210+
[--tunnel-username TUNNEL_USERNAME]
2211+
[--tunnel-ssh-key TUNNEL_SSH_KEY]
2212+
[--tunnel-ssh-key-passphrase TUNNEL_SSH_KEY_PASSPHRASE]
2213+
[--tunnel-remote-port TUNNEL_REMOTE_PORT] [--enable-events]
2214+
[--threadless] [--threaded] [--num-workers NUM_WORKERS]
2215+
[--backlog BACKLOG] [--hostname HOSTNAME] [--port PORT]
2216+
[--port-file PORT_FILE] [--unix-socket-path UNIX_SOCKET_PATH]
2217+
[--local-executor LOCAL_EXECUTOR] [--num-acceptors NUM_ACCEPTORS]
2218+
[--version] [--log-level LOG_LEVEL] [--log-file LOG_FILE]
2219+
[--log-format LOG_FORMAT] [--open-file-limit OPEN_FILE_LIMIT]
22172220
[--plugins PLUGINS [PLUGINS ...]] [--enable-dashboard]
2218-
[--work-klass WORK_KLASS] [--pid-file PID_FILE]
2219-
[--enable-proxy-protocol]
2220-
[--client-recvbuf-size CLIENT_RECVBUF_SIZE] [--key-file KEY_FILE]
2221-
[--timeout TIMEOUT] [--server-recvbuf-size SERVER_RECVBUF_SIZE]
2222-
[--disable-http-proxy] [--disable-headers DISABLE_HEADERS]
2223-
[--ca-key-file CA_KEY_FILE] [--ca-cert-dir CA_CERT_DIR]
2224-
[--ca-cert-file CA_CERT_FILE] [--ca-file CA_FILE]
2225-
[--ca-signing-key-file CA_SIGNING_KEY_FILE] [--cert-file CERT_FILE]
2221+
[--enable-ssh-tunnel] [--work-klass WORK_KLASS]
2222+
[--pid-file PID_FILE] [--enable-conn-pool] [--key-file KEY_FILE]
2223+
[--cert-file CERT_FILE] [--client-recvbuf-size CLIENT_RECVBUF_SIZE]
2224+
[--server-recvbuf-size SERVER_RECVBUF_SIZE] [--timeout TIMEOUT]
2225+
[--enable-proxy-protocol] [--disable-http-proxy]
2226+
[--disable-headers DISABLE_HEADERS] [--ca-key-file CA_KEY_FILE]
2227+
[--ca-cert-dir CA_CERT_DIR] [--ca-cert-file CA_CERT_FILE]
2228+
[--ca-file CA_FILE] [--ca-signing-key-file CA_SIGNING_KEY_FILE]
22262229
[--auth-plugin AUTH_PLUGIN] [--basic-auth BASIC_AUTH]
22272230
[--cache-dir CACHE_DIR]
22282231
[--filtered-upstream-hosts FILTERED_UPSTREAM_HOSTS]
@@ -2235,15 +2238,28 @@ usage: -m [-h] [--enable-events] [--enable-conn-pool] [--threadless]
22352238
[--filtered-url-regex-config FILTERED_URL_REGEX_CONFIG]
22362239
[--cloudflare-dns-mode CLOUDFLARE_DNS_MODE]
22372240

2238-
proxy.py v2.4.0rc6.dev13+ga9b8034.d20220104
2241+
proxy.py v2.4.0rc7.dev12+gd234339.d20220116
22392242

22402243
options:
22412244
-h, --help show this help message and exit
2245+
--tunnel-hostname TUNNEL_HOSTNAME
2246+
Default: None. Remote hostname or IP address to which
2247+
SSH tunnel will be established.
2248+
--tunnel-port TUNNEL_PORT
2249+
Default: 22. SSH port of the remote host.
2250+
--tunnel-username TUNNEL_USERNAME
2251+
Default: None. Username to use for establishing SSH
2252+
tunnel.
2253+
--tunnel-ssh-key TUNNEL_SSH_KEY
2254+
Default: None. Private key path in pem format
2255+
--tunnel-ssh-key-passphrase TUNNEL_SSH_KEY_PASSPHRASE
2256+
Default: None. Private key passphrase
2257+
--tunnel-remote-port TUNNEL_REMOTE_PORT
2258+
Default: 8899. Remote port which will be forwarded
2259+
locally for proxy.
22422260
--enable-events Default: False. Enables core to dispatch lifecycle
22432261
events. Plugins can be used to subscribe for core
22442262
events.
2245-
--enable-conn-pool Default: False. (WIP) Enable upstream connection
2246-
pooling.
22472263
--threadless Default: True. Enabled by default on Python 3.8+ (mac,
22482264
linux). When disabled a new thread is spawned to
22492265
handle each client connection.
@@ -2252,14 +2268,6 @@ options:
22522268
handle each client connection.
22532269
--num-workers NUM_WORKERS
22542270
Defaults to number of CPU cores.
2255-
--local-executor LOCAL_EXECUTOR
2256-
Default: 1. Enabled by default. Use 0 to disable. When
2257-
enabled acceptors will make use of local (same
2258-
process) executor instead of distributing load across
2259-
remote (other process) executors. Enable this option
2260-
to achieve CPU affinity between acceptors and
2261-
executors, instead of using underlying OS kernel
2262-
scheduling algorithm.
22632271
--backlog BACKLOG Default: 100. Maximum number of pending connections to
22642272
proxy server
22652273
--hostname HOSTNAME Default: 127.0.0.1. Server IP address.
@@ -2270,6 +2278,14 @@ options:
22702278
--unix-socket-path UNIX_SOCKET_PATH
22712279
Default: None. Unix socket path to use. When provided
22722280
--host and --port flags are ignored
2281+
--local-executor LOCAL_EXECUTOR
2282+
Default: 1. Enabled by default. Use 0 to disable. When
2283+
enabled acceptors will make use of local (same
2284+
process) executor instead of distributing load across
2285+
remote (other process) executors. Enable this option
2286+
to achieve CPU affinity between acceptors and
2287+
executors, instead of using underlying OS kernel
2288+
scheduling algorithm.
22732289
--num-acceptors NUM_ACCEPTORS
22742290
Defaults to number of CPU cores.
22752291
--version, -v Prints proxy.py version.
@@ -2288,25 +2304,32 @@ options:
22882304
Comma separated plugins. You may use --plugins flag
22892305
multiple times.
22902306
--enable-dashboard Default: False. Enables proxy.py dashboard.
2307+
--enable-ssh-tunnel Default: False. Enable SSH tunnel.
22912308
--work-klass WORK_KLASS
22922309
Default: proxy.http.HttpProtocolHandler. Work klass to
22932310
use for work execution.
22942311
--pid-file PID_FILE Default: None. Save "parent" process ID to a file.
2295-
--enable-proxy-protocol
2296-
Default: False. If used, will enable proxy protocol.
2297-
Only version 1 is currently supported.
2298-
--client-recvbuf-size CLIENT_RECVBUF_SIZE
2299-
Default: 128 KB. Maximum amount of data received from
2300-
the client in a single recv() operation.
2312+
--enable-conn-pool Default: False. (WIP) Enable upstream connection
2313+
pooling.
23012314
--key-file KEY_FILE Default: None. Server key file to enable end-to-end
23022315
TLS encryption with clients. If used, must also pass
23032316
--cert-file.
2304-
--timeout TIMEOUT Default: 10.0. Number of seconds after which an
2305-
inactive connection must be dropped. Inactivity is
2306-
defined by no data sent or received by the client.
2317+
--cert-file CERT_FILE
2318+
Default: None. Server certificate to enable end-to-end
2319+
TLS encryption with clients. If used, must also pass
2320+
--key-file.
2321+
--client-recvbuf-size CLIENT_RECVBUF_SIZE
2322+
Default: 128 KB. Maximum amount of data received from
2323+
the client in a single recv() operation.
23072324
--server-recvbuf-size SERVER_RECVBUF_SIZE
23082325
Default: 128 KB. Maximum amount of data received from
23092326
the server in a single recv() operation.
2327+
--timeout TIMEOUT Default: 10.0. Number of seconds after which an
2328+
inactive connection must be dropped. Inactivity is
2329+
defined by no data sent or received by the client.
2330+
--enable-proxy-protocol
2331+
Default: False. If used, will enable proxy protocol.
2332+
Only version 1 is currently supported.
23102333
--disable-http-proxy Default: False. Whether to disable
23112334
proxy.HttpProxyPlugin.
23122335
--disable-headers DISABLE_HEADERS
@@ -2333,10 +2356,6 @@ options:
23332356
Default: None. CA signing key to use for dynamic
23342357
generation of HTTPS certificates. If used, must also
23352358
pass --ca-key-file and --ca-cert-file
2336-
--cert-file CERT_FILE
2337-
Default: None. Server certificate to enable end-to-end
2338-
TLS encryption with clients. If used, must also pass
2339-
--key-file.
23402359
--auth-plugin AUTH_PLUGIN
23412360
Default: proxy.http.proxy.AuthPlugin. Auth plugin to
23422361
use instead of default basic auth plugin.

proxy/common/flag.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -307,8 +307,8 @@ def initialize(
307307
# See https://github.com/abhinavsingh/proxy.py/pull/714 description
308308
# to understand rationale behind the following logic.
309309
#
310-
# --num-workers flag or option was found. We will use
311-
# the same value for num_acceptors when --num-acceptors flag
310+
# Num workers flag or option was found. We will use
311+
# the same value for num_acceptors when num acceptors flag
312312
# is absent.
313313
if num_workers != DEFAULT_NUM_WORKERS and num_acceptors == DEFAULT_NUM_ACCEPTORS:
314314
args.num_acceptors = args.num_workers

proxy/core/acceptor/pool.py

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -21,16 +21,17 @@
2121
from multiprocessing import connection
2222
from multiprocessing.reduction import send_handle
2323

24-
from typing import Any, List, Optional
24+
from typing import TYPE_CHECKING, Any, List, Optional
2525

2626
from .listener import Listener
2727
from .acceptor import Acceptor
2828

29-
from ..event import EventQueue
30-
3129
from ...common.flag import flags
3230
from ...common.constants import DEFAULT_NUM_ACCEPTORS
3331

32+
if TYPE_CHECKING: # pragma: no cover
33+
from ..event import EventQueue
34+
3435
logger = logging.getLogger(__name__)
3536

3637

@@ -69,7 +70,7 @@ def __init__(
6970
executor_queues: List[connection.Connection],
7071
executor_pids: List[int],
7172
executor_locks: List['multiprocessing.synchronize.Lock'],
72-
event_queue: Optional[EventQueue] = None,
73+
event_queue: Optional['EventQueue'] = None,
7374
) -> None:
7475
self.flags = flags
7576
# File descriptor to use for accepting new work
@@ -79,7 +80,7 @@ def __init__(
7980
self.executor_pids: List[int] = executor_pids
8081
self.executor_locks: List['multiprocessing.synchronize.Lock'] = executor_locks
8182
# Eventing core queue
82-
self.event_queue: Optional[EventQueue] = event_queue
83+
self.event_queue: Optional['EventQueue'] = event_queue
8384
# Acceptor process instances
8485
self.acceptors: List[Acceptor] = []
8586
# Fd queues used to share file descriptor with acceptor processes

proxy/http/handler.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
from ..common.types import Readables, SelectableEvents, Writables
2323
from ..common.constants import DEFAULT_SELECTOR_SELECT_TIMEOUT
2424

25+
from .protocols import httpProtocols
2526
from .connection import HttpClientConnection
2627
from .exception import HttpProtocolException
2728
from .plugin import HttpProtocolHandlerPlugin
@@ -260,6 +261,8 @@ def _initialize_plugin(
260261

261262
def _discover_plugin_klass(self, protocol: int) -> Optional[Type['HttpProtocolHandlerPlugin']]:
262263
"""Discovers and return matching HTTP handler plugin matching protocol."""
264+
if protocol == httpProtocols.UNKNOWN:
265+
return None
263266
if b'HttpProtocolHandlerPlugin' in self.flags.plugins:
264267
for klass in self.flags.plugins[b'HttpProtocolHandlerPlugin']:
265268
k: Type['HttpProtocolHandlerPlugin'] = klass

proxy/http/parser/parser.py

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
"""
1515
from typing import TypeVar, Optional, Dict, Type, Tuple, List
1616

17-
from ...common.constants import DEFAULT_DISABLE_HEADERS, COLON, DEFAULT_ENABLE_PROXY_PROTOCOL
17+
from ...common.constants import DEFAULT_DISABLE_HEADERS, COLON, DEFAULT_ENABLE_PROXY_PROTOCOL, HTTP_1_0
1818
from ...common.constants import HTTP_1_1, SLASH, CRLF
1919
from ...common.constants import WHITESPACE, DEFAULT_HTTP_PORT
2020
from ...common.utils import build_http_request, build_http_response, text_
@@ -157,7 +157,12 @@ def set_url(self, url: bytes) -> None:
157157
@property
158158
def http_handler_protocol(self) -> int:
159159
"""Returns `HttpProtocols` that this request belongs to."""
160-
return httpProtocols.HTTP_PROXY if self.host is not None else httpProtocols.WEB_SERVER
160+
if self.version in (HTTP_1_1, HTTP_1_0) and self._url is not None:
161+
if self.host is not None:
162+
return httpProtocols.HTTP_PROXY
163+
if self._url.hostname is None:
164+
return httpProtocols.WEB_SERVER
165+
return httpProtocols.UNKNOWN
161166

162167
@property
163168
def is_complete(self) -> bool:

proxy/http/protocols.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@
1818

1919
HttpProtocols = NamedTuple(
2020
'HttpProtocols', [
21+
('UNKNOWN', int),
2122
# Web server handling HTTP/1.0, HTTP/1.1, HTTP/2, HTTP/3
2223
# over plain Text or encrypted connection with clients
2324
('WEB_SERVER', int),
@@ -30,4 +31,4 @@
3031
],
3132
)
3233

33-
httpProtocols = HttpProtocols(1, 2, 3)
34+
httpProtocols = HttpProtocols(1, 2, 3, 4)

proxy/http/url.py

Lines changed: 27 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515
"""
1616
from typing import Optional, Tuple
1717

18-
from ..common.constants import COLON, SLASH, HTTP_URL_PREFIX, HTTPS_URL_PREFIX, AT
18+
from ..common.constants import COLON, SLASH, AT
1919
from ..common.utils import text_
2020

2121

@@ -68,29 +68,41 @@ def from_bytes(cls, raw: bytes) -> 'Url':
6868
For a HTTPS connect tunnel, url is like ``httpbin.org:443``
6969
For a HTTP proxy request, url is like ``http://httpbin.org/get``
7070
71+
proxy.py internally never expects a https scheme in the request line.
72+
But `Url` class provides support for parsing any scheme present in the URLs.
73+
e.g. ftp, icap etc.
74+
75+
If a url with no scheme is parsed, e.g. ``//host/abc.js``, then scheme
76+
defaults to `http`.
77+
7178
Further:
7279
1) URL may contain unicode characters
7380
2) URL may contain IPv4 and IPv6 format addresses instead of domain names
74-
75-
We use heuristics based approach for our URL parser.
7681
"""
7782
# SLASH == 47, check if URL starts with single slash but not double slash
78-
is_single_slash = raw[0] == 47
79-
is_double_slash = is_single_slash and len(raw) >= 2 and raw[1] == 47
80-
if is_single_slash and not is_double_slash:
83+
starts_with_single_slash = raw[0] == 47
84+
starts_with_double_slash = starts_with_single_slash and \
85+
len(raw) >= 2 and \
86+
raw[1] == 47
87+
if starts_with_single_slash and \
88+
not starts_with_double_slash:
8189
return cls(remainder=raw)
82-
is_http = raw.startswith(HTTP_URL_PREFIX)
83-
is_https = raw.startswith(HTTPS_URL_PREFIX)
84-
if is_http or is_https or is_double_slash:
85-
rest = raw[len(b'https://'):] \
86-
if is_https \
87-
else raw[len(b'http://'):] \
88-
if is_http \
89-
else raw[len(SLASH + SLASH):]
90+
scheme = None
91+
rest = None
92+
if not starts_with_double_slash:
93+
# Find scheme
94+
parts = raw.split(b'://', 1)
95+
if len(parts) == 2:
96+
scheme = parts[0]
97+
rest = parts[1]
98+
else:
99+
rest = raw[len(SLASH + SLASH):]
100+
if scheme is not None or starts_with_double_slash:
101+
assert rest is not None
90102
parts = rest.split(SLASH, 1)
91103
username, password, host, port = Url._parse(parts[0])
92104
return cls(
93-
scheme=b'https' if is_https else b'http',
105+
scheme=scheme if not starts_with_double_slash else b'http',
94106
username=username,
95107
password=password,
96108
hostname=host,

tests/http/parser/test_http_parser.py

Lines changed: 40 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -678,9 +678,7 @@ def test_is_http_1_1_keep_alive(self) -> None:
678678
)
679679
self.assertTrue(self.parser.is_http_1_1_keep_alive)
680680

681-
def test_is_http_1_1_keep_alive_with_non_close_connection_header(
682-
self,
683-
) -> None:
681+
def test_is_http_1_1_keep_alive_with_non_close_connection_header(self) -> None:
684682
self.parser.parse(
685683
build_http_request(
686684
httpMethods.GET, b'/',
@@ -811,3 +809,42 @@ def test_is_safe_against_malicious_requests(self) -> None:
811809
b'//198.98.53.25:1389/TomcatBypass/Command/Base64d2dldCA0Ni4xNjEuNTIuMzcvRXhwbG9pd' +
812810
b'C5zaDsgY2htb2QgK3ggRXhwbG9pdC5zaDsgLi9FeHBsb2l0LnNoOw==}',
813811
)
812+
813+
def test_parses_icap_protocol(self) -> None:
814+
# Ref https://datatracker.ietf.org/doc/html/rfc3507
815+
self.parser.parse(
816+
b'REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0\r\n' +
817+
b'Host: icap-server.net\r\n' +
818+
b'Encapsulated: req-hdr=0, req-body=154' +
819+
b'\r\n\r\n' +
820+
b'POST /origin-resource/form.pl HTTP/1.1\r\n' +
821+
b'Host: www.origin-server.com\r\n' +
822+
b'Accept: text/html, text/plain\r\n' +
823+
b'Accept-Encoding: compress\r\n' +
824+
b'Cache-Control: no-cache\r\n' +
825+
b'\r\n' +
826+
b'1e\r\n' +
827+
b'I am posting this information.\r\n' +
828+
b'0\r\n' +
829+
b'\r\n',
830+
)
831+
self.assertEqual(self.parser.method, b'REQMOD')
832+
assert self.parser._url is not None
833+
self.assertEqual(self.parser._url.scheme, b'icap')
834+
835+
def test_cannot_parse_sip_protocol(self) -> None:
836+
# Will fail to parse because of invalid host and port in the request line
837+
# Our Url parser expects an integer port.
838+
with self.assertRaises(ValueError):
839+
self.parser.parse(
840+
b'OPTIONS sip:nm SIP/2.0\r\n' +
841+
b'Via: SIP/2.0/TCP nm;branch=foo\r\n' +
842+
b'From: <sip:nm@nm>;tag=root\r\nTo: <sip:nm2@nm2>\r\n' +
843+
b'Call-ID: 50000\r\n' +
844+
b'CSeq: 42 OPTIONS\r\n' +
845+
b'Max-Forwards: 70\r\n' +
846+
b'Content-Length: 0\r\n' +
847+
b'Contact: <sip:nm@nm>\r\n' +
848+
b'Accept: application/sdp\r\n' +
849+
b'\r\n',
850+
)

tests/http/test_url.py

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -143,3 +143,12 @@ def test_no_scheme_suffix(self) -> None:
143143
self.assertEqual(url.remainder, b'/server?arg=87')
144144
self.assertEqual(url.username, None)
145145
self.assertEqual(url.password, None)
146+
147+
def test_any_scheme_suffix(self) -> None:
148+
url = Url.from_bytes(b'icap://example-server.net/server?arg=87')
149+
self.assertEqual(url.scheme, b'icap')
150+
self.assertEqual(url.hostname, b'example-server.net')
151+
self.assertEqual(url.port, None)
152+
self.assertEqual(url.remainder, b'/server?arg=87')
153+
self.assertEqual(url.username, None)
154+
self.assertEqual(url.password, None)

0 commit comments

Comments
 (0)