Commit 7b94b3b: release v1.0.0, tested & improved
1 parent 1ac8153
File tree: 11 files changed, +190 -54 lines
GitHub Actions release workflow (new file): 37 additions & 0 deletions

```yaml
name: Build and Release Python Project

on:
  push:
    tags:
      - 'v*' # Triggers on version tags

jobs:
  build-and-release:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Checkout the repository
      - name: Checkout Code
        uses: actions/checkout@v4

      # Step 2: Set up Python
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9' # Use your desired Python version

      # Step 3: Install build dependencies
      - name: Install build tools
        run: pip install build --upgrade

      # Step 4: Build the project
      - name: Build the Python Project
        run: python -m build

      # Step 5: Create a GitHub Release
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          # Upload all artifacts in the dist/ folder
          files: |
            dist/*
```
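
The workflow above fires only on tags matching the glob `v*`. GitHub's tag filters use their own glob syntax, but Python's `fnmatch` is a close enough approximation for a quick local sanity check before pushing a tag (a sketch, not GitHub's actual matcher):

```python
from fnmatch import fnmatch

# Tags that would (roughly) trigger the workflow above
assert fnmatch("v1.0.0", "v*")
assert fnmatch("v2.3.1-rc1", "v*")

# Tags that would not
assert not fnmatch("release-1.0.0", "v*")
assert not fnmatch("1.0.0", "v*")
```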

.gitignore: 2 additions & 1 deletion

```diff
@@ -5,4 +5,5 @@ __pycache__
 *.pyd
 .venv
 *.egg-info
-dist
+dist
+other
```

MANIFEST.in: 2 additions & 1 deletion

```diff
@@ -1,3 +1,4 @@
 exclude tests/*
 exclude .pytest_cache
-exclude .venv
+exclude .venv
+exclude other/*
```

README.md: 6 additions & 3 deletions

````diff
@@ -1,14 +1,17 @@
 # Curl Adapter
-A module that plugs straight-in to the python *[requests](https://github.com/psf/requests)* library and replaces the default *urllib3* HTTP adapter with cURL.
+![PyPI - Downloads](https://img.shields.io/pypi/dw/curl-adapter)
+
+A module that plugs straight-in to the python *[requests](https://github.com/psf/requests)* library and replaces the default *urllib3* HTTP adapter with **cURL**, equipped with TLS fingerprint changing capabilities.
 
 ## Why?
 
 Specifically, this module is meant to be used with the "curl impersonate" python bindings ([lexiforest/curl_cffi](https://github.com/lexiforest/curl_cffi)), in order to send HTTP requests with custom, browser-like TLS & HTTP/2 fingerprints for bypassing sites that detect and block normal python requests (such as [Cloudflare](https://www.nstbrowser.io/en/blog/how-does-cloudflare-detect-bots) for example).
+
 <details>
 <summary>Note</summary>
 Even though <i><a href="https://github.com/lexiforest/curl_cffi">curl_cffi</a></i> already has an API that *mimicks* the <i>requests</i> library, it comes with some compatibility issues (e.g. response.raw not available, response.history, differences in headers, cookies, json, etc.).
 <br><br>
-With curl adapter, instead of copying and mimicking the <i>requests</i> library API, just the low level HTTP adapter is changed, and everything else is exactly the same (even the exceptions).
+With curl adapter, instead of copying and mimicking the <i>requests</i> library API, the low level HTTP adapter is changed with a custom crafted one, and everything else is exactly the same (even the exceptions are mapped).
 <br><br>
 With a single switch you can enable/disable curl for your requests, without needing to worry about changing the way you normally work with requests.
 <br><br>
@@ -63,7 +66,7 @@ with requests.Session() as s:
 ```
 
 ## More
-You can get extra information from curl response info:
+You can get extra information from the curl response info:
 ```python
 import requests
 from curl_adapter import PyCurlAdapter, CurlInfo
````

curl_adapter/base_adapter.py: 1 addition & 3 deletions

```diff
@@ -347,7 +347,7 @@ def build_response(self, curl: typing.Union[curl_cffi.Curl, pycurl.Curl], res:Cu
         response.encoding = get_encoding_from_headers(response.headers)
         response.raw = res
 
-        response.reason = parsed_headers["headers"]
+        response.reason = parsed_headers["reason"]
 
         response.get_curl_info = get_curl_info
 
@@ -481,8 +481,6 @@ def set_curl_options(self,
 
         # files
         #already handled
-        # multipart
-        #already handled
 
         # auth
         #already handled, it's just a header...
```
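
The one-line fix above swaps `parsed_headers["headers"]` for `parsed_headers["reason"]`, so `response.reason` now carries the status line's reason phrase rather than the header dict. As a rough illustration of where such a `reason` value comes from (hypothetical helper for illustration; the adapter's real header parser is not shown in this diff):

```python
def parse_status_line(status_line: str) -> dict:
    # e.g. "HTTP/1.1 404 Not Found" -> version, status code, reason phrase
    version, _, rest = status_line.partition(" ")
    code, _, reason = rest.partition(" ")
    # Shape mirrors the parsed_headers dict used in the diff (assumed)
    return {"version": version, "status": int(code), "reason": reason, "headers": {}}

info = parse_status_line("HTTP/1.1 404 Not Found")
assert info["status"] == 404
assert info["reason"] == "Not Found"
```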

curl_adapter/curl_cffi.py: 7 additions & 2 deletions

```diff
@@ -47,6 +47,7 @@ def get_curl_info(self, curl: curl_cffi.Curl, option_code: int):
         """
         Currently, curl_cfii doesn't work for retriving information like TOTAL_TIME_T, SPEED_DOWNLOAD_T,
         because they haven't mapped the all option codes. (These options start at 0x600000 int64_t, but curl_cfii maps only up to 0x400000...)
+        I made a pull request to fix it: https://github.com/lexiforest/curl_cffi/pull/481 (but as of now it's not merged yet)
         """
         c_value = ffi.new("int64_t*")
         value = lib.curl_easy_getinfo(curl._curl, option_code, c_value)
@@ -59,7 +60,9 @@ def get_curl_info(self, curl: curl_cffi.Curl, option_code: int):
 
     def set_ja3_options(self, curl: curl_cffi.Curl, ja3: str, permute: bool = False):
         """
-        Detailed explanation: https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967/
+        function sourced from: https://github.com/lexiforest/curl_cffi/blob/main/curl_cffi/requests/utils.py
+
+        Detailed explanation: https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967/
         """
 
         def toggle_extensions_by_ids(curl: curl_cffi.Curl, extension_ids):
@@ -128,7 +131,9 @@ def toggle_extensions_by_ids(curl: curl_cffi.Curl, extension_ids):
 
     def set_akamai_options(self, curl: curl_cffi.Curl, akamai: str):
         """
-        Detailed explanation: https://www.blackhat.com/docs/eu-17/materials/eu-17-Shuster-Passive-Fingerprinting-Of-HTTP2-Clients-wp.pdf
+        function sourced from: https://github.com/lexiforest/curl_cffi/blob/main/curl_cffi/requests/utils.py
+
+        Detailed explanation: https://www.blackhat.com/docs/eu-17/materials/eu-17-Shuster-Passive-Fingerprinting-Of-HTTP2-Clients-wp.pdf
         """
         settings, window_update, streams, header_order = akamai.split("|")
```
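
The docstring above notes that the `*_T` info codes start at `0x600000` while curl_cffi only mapped codes up to `0x400000`. In curl's `curl.h`, a CURLINFO code encodes its value type in the high bits, which is why the unmapped range matters. A small sketch of that encoding (the constants reflect my reading of `curl.h` and the `+ 50` offset for `TOTAL_TIME_T` is an assumption; verify against your curl headers):

```python
# Type classes encoded in the high bits of a CURLINFO option code (assumed from curl.h)
CURLINFO_STRING   = 0x100000
CURLINFO_LONG     = 0x200000
CURLINFO_DOUBLE   = 0x300000
CURLINFO_SLIST    = 0x400000
CURLINFO_OFF_T    = 0x600000  # int64_t results such as TOTAL_TIME_T
CURLINFO_TYPEMASK = 0xf00000

CURLINFO_TOTAL_TIME_T = CURLINFO_OFF_T + 50  # offset assumed, verify against curl.h

def info_type(option_code: int) -> int:
    """Return the type class encoded in a CURLINFO option code."""
    return option_code & CURLINFO_TYPEMASK

# TOTAL_TIME_T sits in the 0x600000 (off_t) class, above the 0x400000 range
# that curl_cffi had mapped, which is why it needed the ffi fallback above.
assert info_type(CURLINFO_TOTAL_TIME_T) == CURLINFO_OFF_T
assert info_type(CURLINFO_TOTAL_TIME_T) > CURLINFO_SLIST
```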

curl_adapter/pycurl.py: 4 additions & 3 deletions

```diff
@@ -42,7 +42,8 @@ def set_curl_options(self, curl, request, url, timeout, proxies):
         super().set_curl_options(curl, request, url, timeout, proxies)
 
         if self.use_curl_content_decoding:
-            # For some reason pycurl content decoding can only be enabled like this:
-            curl.setopt(pycurl.HTTP_CONTENT_DECODING, 0)
+            # It's better to use the urllib3 content decoding instead of letting PyCurl, because it's limited.
+
+            # curl.setopt(pycurl.HTTP_CONTENT_DECODING, 0). There was a time when it needed to disable this in order for it's own encoding to work, weirdly. Though now it doesn't seem neccessary?
             curl.setopt(pycurl.ENCODING, "gzip, deflate") #br, zstd not supported...
-            # Seems it better to use the urllib3 content decoding instead of automatic
+
```

curl_adapter/stream/handler.py: 115 additions & 30 deletions

```diff
@@ -8,9 +8,16 @@
 from curl_cffi.curl import CurlOpt, CurlError
 
 class CurlStreamHandler():
+    """
+    Curl Stream Handler
+
+    :copyright: (c) 2025 by Elis K.
+    """
+
     def __init__(self, curl_instance: typing.Union[curl_cffi.Curl, pycurl.Curl], executor: ThreadPoolExecutor=None, callback_after_perform=None):
         '''
-        Initialize the stream handler.
+        Initialize the stream handler.
         '''
         self.curl = curl_instance
         self.executor = executor or ThreadPoolExecutor()
@@ -23,21 +30,24 @@
         self.allow_cleanup = threading.Event()
         self.perform_finished = threading.Event()
         self.callback_after_perform = callback_after_perform
-
+        self._leftover = bytearray() # buffer for leftover data when chunk > requested
+
     def _write_callback(self, chunk):
         '''
-        Callback to handle incoming data chunks.
+        Callback to handle incoming data chunks.
         '''
         if not self.initialized.is_set():
             self.initialized.set()
         if self.quit_event.is_set():
             return -1 # Signal to stop
 
-        self.chunk_queue.put_nowait(chunk) # Add chunk to the queue
+        self.chunk_queue.put(chunk) # Add chunk to the queue
         return len(chunk)
 
     def _download(self):
 
+        # Possible to set buffer size here
+        # self.curl.setopt(CurlOpt.BUFFERSIZE, 8 * 1024)
         self.curl.setopt(CurlOpt.WRITEFUNCTION, self._write_callback)
 
         try:
@@ -47,14 +57,15 @@
         finally:
             self.chunk_queue.put(None) # End of stream
 
+            if self.callback_after_perform and callable(self.callback_after_perform):
+                self.callback_after_perform()
+
             self.perform_finished.set()
 
             # Set to avoid blocking
             if not self.initialized.is_set():
                 self.initialized.set()
 
-            if self.callback_after_perform and callable(self.callback_after_perform):
-                self.callback_after_perform()
 
     def start(self):
         self._future = self.executor.submit(self._download)
@@ -74,35 +85,110 @@
     def set_headers_parsed(self):
         return self.allow_cleanup.set()
 
     def read(self, amt=None):
-        '''
-        Read data from the queue in chunks. Returns a single chunk or all available data if amt is None.
-        '''
+        """
+        A more 'file-like' read from the queue:
+
+        - If `amt` is None, read all.
+        - If `amt` is an integer, read exactly `amt` bytes.
+        - Handles leftover data from previous chunk to avoid losing bytes.
+        """
+        if self.closed:
+            return b""
+
+        if self.error:
+            raise self.error
+
+        # If amt is None, read everything:
         if amt is None:
-            data = []
-            while True:
-                if self.error:
-                    raise self.error
-                try:
-                    chunk = self.chunk_queue.get(timeout=1)
-                    if chunk is None: # End of stream
-                        break
-                    data.append(chunk)
-                except queue.Empty:
-                    if self.quit_event.is_set():
-                        break
-            return b"".join(data)
-        else:
+            return self._read_all()
+
+        # If amt is specified (and possibly 0 or > 0)
+        return self._read_amt(amt)
+
+    def _read_all(self):
+        """
+        Read *all* remaining data from leftover + queue
+        """
+        out = bytearray()
+
+        # If there's leftover data, use it first
+        out.extend(self._leftover)
+        self._leftover.clear()
+
+        # Then read new chunks until we hit None or are closed
+        while not self.closed:
+            if self.error:
+                raise self.error
+
+            try:
+                chunk = self.chunk_queue.get(timeout=1)
+            except queue.Empty:
+                # No data currently available
+                break
+
+            if chunk is None:
+                # End of stream. Close here?
+                if self.perform_finished.is_set():
+                    self.close()
+                break
+
+            out.extend(chunk)
+
+            if self.quit_event.is_set():
+                break
+
+        return bytes(out)
+
+    def _read_amt(self, amt):
+        """
+        Read exactly `amt` bytes. Returns up to `amt`.
+        """
+        out = bytearray()
+        needed = amt
+
+        # First, consume leftover if available
+        if self._leftover:
+            take = min(needed, len(self._leftover))
+            out.extend(self._leftover[:take])
+            del self._leftover[:take]
+            needed -= take
+
+        # Read additional chunks from the queue if we still need data
+        while needed > 0 and not self.closed:
             if self.error:
                 raise self.error
+
             try:
                 chunk = self.chunk_queue.get(timeout=1)
-                if chunk is None: # End of stream
-                    return b""
-                return chunk[:amt]
             except queue.Empty:
-                return b""
+                # Temporarily no data
+                break
+
+            if chunk is None:
+                # End of stream. close here?
+                if self.perform_finished.is_set():
+                    self.close()
+
+                break
+
+            # If the chunk is bigger than needed, take part of it
+            # and store the remainder in _leftover.
+            if len(chunk) > needed:
+                out.extend(chunk[:needed])
+                self._leftover.extend(chunk[needed:])
+                needed = 0
+            else:
+                # Chunk fits entirely
+                out.extend(chunk)
+                needed -= len(chunk)
+
+            if self.quit_event.is_set():
+                break
+
+        return bytes(out)
 
     def flush(self):
+        #self._leftover.clear()
         pass
 
     def close(self):
@@ -122,11 +208,10 @@
         # self.curl.close()
         self.allow_cleanup.wait(timeout=1)
         self.curl.reset()
-
-
+
     def __del__(self):
         '''
-        Destructor to ensure the response is properly closed when garbage-collected.
+        Destructor to ensure the response is properly closed when garbage-collected.
         '''
         if not self.closed:
             self.close()
```
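
The `_read_amt` path added above follows a standard pattern: serve bytes from `_leftover` first, then pull whole chunks off the queue, and stash any overshoot back into `_leftover` so no bytes are lost between `read()` calls. The same logic can be exercised in isolation with a plain `queue.Queue` (a minimal stdlib sketch of the pattern, not the handler itself):

```python
import queue

class LeftoverReader:
    """Minimal sketch of the leftover-buffer read pattern used in the handler."""

    def __init__(self):
        self.chunks = queue.Queue()
        self._leftover = bytearray()

    def read(self, amt: int) -> bytes:
        out = bytearray()
        needed = amt
        # Serve bytes left over from a previous oversized chunk first
        if self._leftover:
            take = min(needed, len(self._leftover))
            out.extend(self._leftover[:take])
            del self._leftover[:take]
            needed -= take
        while needed > 0:
            try:
                chunk = self.chunks.get_nowait()
            except queue.Empty:
                break
            if chunk is None:  # end-of-stream sentinel, as in the handler
                break
            if len(chunk) > needed:
                out.extend(chunk[:needed])
                self._leftover.extend(chunk[needed:])  # keep the overshoot
                needed = 0
            else:
                out.extend(chunk)
                needed -= len(chunk)
        return bytes(out)

r = LeftoverReader()
r.chunks.put(b"hello world")
r.chunks.put(None)
assert r.read(5) == b"hello"   # first read stops mid-chunk
assert r.read(6) == b" world"  # remainder comes from the leftover buffer
```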

curl_adapter/stream/response.py: 7 additions & 2 deletions

```diff
@@ -17,7 +17,7 @@
 class BytesQueueBuffer:
     """
-    Needed to support newer versions of urllib3
+    this class is sourced from urllib3 HTTPResponse. It's needed to support newer versions of urllib3
     ------------------------------------------
     Memory-efficient bytes buffer
@@ -130,7 +130,7 @@ def __init__(
         version=None, #HTTP Version header
 
         preload_content=False,
-        enforce_content_length=False,
+        enforce_content_length=True,
         auto_close=True,
     ):
 
@@ -155,6 +155,11 @@
 
         self.decode_content = self._handle_content_decoding
         self.enforce_content_length = enforce_content_length
+
+        if not self._handle_content_decoding:
+            # In cases when curl is handling content decoding, disable content length checks otherwise we might get unexcepted errors
+            self.enforce_content_length = False
+
         self.auto_close = auto_close
 
         self._decoder = None
```
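
The `enforce_content_length` guard added above exists because when curl performs content decoding itself, the body handed to urllib3 no longer matches the wire `Content-Length`, so a strict length check would raise spurious errors. A small stdlib illustration of the mismatch:

```python
import gzip

body = b"hello world " * 64        # what the application sees after decoding
wire = gzip.compress(body)         # what was actually on the wire

content_length = len(wire)         # the Content-Length header advertises the wire size
assert gzip.decompress(wire) == body
assert len(body) != content_length # decoded length differs, so a strict length check would fail
```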

0 commit comments