chunked data reading #543

jsbucy · 2025-02-28T19:08:06Z

What do these changes do?

Add a new handler hook, DATA_CHUNK, which is invoked from the data reading loop. This allows streaming the data to a file or other storage API without having to buffer the whole message in memory first.

Are there changes in behavior for the user?

The new hook is opt-in; there should be no behavior changes for existing users.

Related issue number

This may also improve or fix #293, which I suspect is caused by the tight loop in the old implementation that decodes dotstuff for the whole message at once and hogs the GIL.

Checklist

…g loop.

DATA_CHUNK takes three parameters (data: bytes, decoded_data: Optional[str], last: bool) and returns an Optional[bytes] response. If the hook returns a response prior to the last=True chunk, smtp_DATA will read and discard the remaining data from the client without invoking the hook again.

This allows streaming the data to a file or other storage API without having to buffer the whole message in memory first. Move the dotstuff and UTF-8 decoding into the data reading loop to support this.

This may also improve/fix aio-libs#293, which I suspect is due to the tight loop decoding dotstuff for the whole message at once hogging the GIL in the old implementation.
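
As a rough illustration of the contract described above, here is a minimal sketch of a handler that spools chunks to a temporary file. The method name `handle_DATA_CHUNK` and the leading `server, session, envelope` arguments are assumptions borrowed from the existing `handle_*` convention, not something this description confirms; `SpoolingHandler`, `max_bytes`, and `feed` are hypothetical names, and `feed` only imitates the reading loop rather than reproducing it.

```python
import asyncio
import tempfile
from typing import List, Optional


class SpoolingHandler:
    """Hypothetical handler that streams message data to a temporary file."""

    def __init__(self, max_bytes: int = 10 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self.received = 0
        self.spool = tempfile.TemporaryFile()

    # Assumed signature: only data, decoded_data and last come from the PR text.
    async def handle_DATA_CHUNK(
        self, server, session, envelope,
        data: bytes, decoded_data: Optional[str], last: bool,
    ) -> Optional[bytes]:
        self.received += len(data)
        if self.received > self.max_bytes:
            # Early response: the server is expected to discard the rest.
            return b"552 Message too large"
        self.spool.write(data)
        if last:
            self.spool.flush()
            return b"250 OK"  # the final call always returns a response
        return None  # keep receiving


async def feed(handler: SpoolingHandler, chunks: List[bytes]) -> Optional[bytes]:
    """Stand-in for the reading loop: stop calling the hook after a response."""
    for i, chunk in enumerate(chunks):
        last = i == len(chunks) - 1
        response = await handler.handle_DATA_CHUNK(
            None, None, None, chunk, None, last
        )
        if response is not None:
            return response
    return None
```

Calling `asyncio.run(feed(SpoolingHandler(), chunks))` with a list of byte chunks exercises both the normal path and the early-response path without a running server.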

1. Buffer through BytesIO, not List[bytes]. Based on a simple microbenchmark, this is about 40% faster.
2. When using the DATA_CHUNK hook, buffer up to chunk_size bytes (SMTP(chunk_size=2**16)) before calling the hook. Previously the hook was called for every line.
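
The buffering idea in these two points can be sketched roughly as follows. This is not the code from the PR: `rechunk` is a hypothetical helper that accumulates lines in a `BytesIO` and emits a chunk once at least `chunk_size` bytes are buffered, with the remainder emitted as the final chunk.

```python
import io
from typing import Iterable, Iterator, Tuple


def rechunk(
    lines: Iterable[bytes], chunk_size: int = 2**16
) -> Iterator[Tuple[bytes, bool]]:
    """Yield (chunk, last) pairs, flushing whenever the buffer reaches chunk_size."""
    buf = io.BytesIO()
    for line in lines:
        buf.write(line)
        if buf.tell() >= chunk_size:
            yield buf.getvalue(), False
            buf = io.BytesIO()  # start a fresh buffer after each flush
    # Whatever is left (possibly empty) is delivered as the last chunk.
    yield buf.getvalue(), True
```

With a small `chunk_size`, several short lines are merged into one hook call instead of one call per line, which is the behavior change the second point describes.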

codecov · 2025-06-18T20:27:30Z

Codecov Report

❌ Patch coverage is 99.52381% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.87%. Comparing base (98f5783) to head (547bf02).

Files with missing lines	Patch %	Lines
aiosmtpd/smtp.py	98.64%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #543      +/-   ##
==========================================
+ Coverage   97.82%   97.87%   +0.04%     
==========================================
  Files          23       23              
  Lines        5701     5869     +168     
  Branches      766      796      +30     
==========================================
+ Hits         5577     5744     +167     
- Misses         78       79       +1     
  Partials       46       46

This brings up a bit of a discussion point: the way I have it now, the new DATA_CHUNK hook returns Optional[bytes]. The idea is that it will typically return None if last=False, unless there was an early-return error, but the final call with last=True is always expected to return a value. This can't exactly be expressed with annotations, so I needed some asserts to pass mypy.
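
To make the typing issue concrete, here is a small sketch of the assert-based narrowing mentioned above. `require_final_response` is a hypothetical helper, not a function from the PR; it only shows how a runtime check can stand in for a contract the annotation cannot state.

```python
from typing import Optional


def require_final_response(response: Optional[bytes]) -> bytes:
    """Narrow Optional[bytes] to bytes for the call made with last=True."""
    # The annotation alone cannot say "non-None when last=True", so the
    # contract is checked at runtime; the assert also narrows the type for mypy.
    assert response is not None, "DATA_CHUNK hook must return a response when last=True"
    return response
```

The trade-off is that a handler violating the contract fails with an AssertionError at runtime rather than being caught by the type checker.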

…t response

These changes increased the scope of the status variable in SMTP.smtp_DATA(), which entailed an extra test of "status is None" in the backward-compatible path, and it was easier to add the coverage than to untangle that.

I have encountered situations in the wild where someone was trying to send a large message over a slow connection that would require a very large static timeout (more than an hour) to accommodate. This way you can keep the timeout reasonably short (minutes), and the transfer can take as long as it needs provided the client keeps making forward progress. We could make this available to the current API under control of a flag, but I don't want to unexpectedly change the behavior for existing users.
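
The progress-based timeout described here can be sketched generically with asyncio. This is an illustration of the idea under stated assumptions, not the implementation in this PR: `read_with_progress_timeout` and `per_read_timeout` are hypothetical names, and the timeout applies to each read rather than to the whole transfer.

```python
import asyncio
import io


async def read_with_progress_timeout(
    reader: asyncio.StreamReader,
    per_read_timeout: float,
    chunk_size: int = 2**16,
) -> bytes:
    """Read until EOF, failing only if a single read stalls for too long."""
    buf = io.BytesIO()
    while True:
        # The timer restarts for every read, so a slow but steady client
        # is never cut off; only a stalled client triggers the timeout.
        chunk = await asyncio.wait_for(
            reader.read(chunk_size), timeout=per_read_timeout
        )
        if not chunk:  # EOF
            break
        buf.write(chunk)
    return buf.getvalue()
```

A total transfer can then run far longer than `per_read_timeout`, while a client that stops sending raises `asyncio.TimeoutError` after one interval.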

jsbucy added 7 commits February 27, 2025 14:50

remove SMTP.smtp_DATA() line_fragments

85a2bab

smtp_DATA() could buffer an unbounded amount of data in line_fragments until it got CRLF, even though we're going to throw it away anyway. Drop TestSMTPWithController.test_long_line_leak, which is now moot.

only decode data in read loop if we're using new DATA_CHUNK hook

36d035c

Otherwise, decode at the end as before so we don't roughly double the memory use while we're reading from the client.

Merge branch 'master' into data_chunk3

e2c4f72

Merge branch 'master' into data_chunk3

a9162af

Merge branch 'data_chunk3' of ssh://github.com/jsbucy/aiosmtpd into data_chunk3

c99b486

github-advanced-security bot found potential problems Jun 18, 2025

aiosmtpd/testing/helpers.py (fixed)

webknjaz reviewed Jun 18, 2025

aiosmtpd/smtp.py (outdated, resolved)

aiosmtpd/smtp.py (outdated, resolved)

aiosmtpd/testing/helpers.py (outdated, resolved)

aiosmtpd/tests/test_smtp.py (resolved)

jsbucy and others added 7 commits June 18, 2025 11:16

fix type annotation

ab8440b

mypy passing

1adebc6

Merge branch 'master' into data_chunk3

f0398d7

fix warning: Explicit returns mixed with implicit (fall through) returns

ea70fb1

Apply suggestions from code review

b71377d

Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>

Merge branch 'data_chunk3' of ssh://github.com/jsbucy/aiosmtpd into data_chunk3

58dbe07

fix whitespace, typing

612d269

jsbucy added 5 commits June 18, 2025 13:58

use a smaller buffer in test to exercise the flush path

ce0a1a7

coverage: add test where DATA_CHUNK hook returns response prior to last chunk

896962f

coverage: add test for decode error w/chunked receiving

fe02f98

coverage: add test for chunked receiving without decode_data

5ad19e6

webknjaz reviewed Jun 19, 2025

aiosmtpd/smtp.py (outdated, resolved)

aiosmtpd/testing/helpers.py (outdated, resolved)

jsbucy and others added 7 commits June 19, 2025 14:20

Apply suggestions from code review

b1da843

Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>

data_chunk hook: invoke hook for DATA response

83ba510

This is probably not common, but it gives the handler implementation another opportunity to preempt the data transfer in case of an error.

fix mypy

5bb73cf

Merge branch 'data_chunk3' of ssh://github.com/jsbucy/aiosmtpd into data_chunk3

f748fbf

fix annotation in ChunkedReceivingHandler

708f081

jsbucy added 4 commits June 23, 2025 16:30

add missing import

08ab7f2

fix lint

bac5e11

update docs/NEWS

1307ede

fix NEWS

547bf02

jsbucy mentioned this pull request Jun 25, 2025

Support chunk sending in smtplib python/cpython#135952

Open
