Skip to content

Conversation

@jsbucy
Copy link
Contributor

@jsbucy jsbucy commented Feb 28, 2025

What do these changes do?

Add new handler hook DATA_CHUNK which is invoked from the data reading loop. This allows streaming the data to a file or other storage api without having to buffer the whole message in memory first.

Are there changes in behavior for the user?

The new hook is opt-in, there should be no behavior changes for existing users.

Related issue number

This may also improve/fix #293 which I suspect is due to the tight loop decoding dotstuff for the whole message at once hogging the GIL in the old implementation.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • tox testenvs have been executed in the following environments:
    • Linux (Ubuntu 18.04, Ubuntu 20.04, Arch): {py36,py37,py38,py39}-{nocov,cov,diffcov}, qa, docs
    • Windows (7, 10): {py36,py37,py38,py39}-{nocov,cov,diffcov}
    • WSL 1.0 (Ubuntu 18.04): {py36,py37,py38,py39}-{nocov,cov,diffcov}, pypy3-{nocov,cov}, qa, docs
    • FreeBSD (12.2, 12.1, 11.4): {py36,pypy3}-{nocov,cov,diffcov}, qa
    • Cygwin: py36-{nocov,cov,diffcov}, qa, docs
  • Documentation reflects the changes
  • Add a news fragment into the NEWS.rst file
    • Add under the "aiosmtpd-next" section, creating one if necessary
      • You may create subsections to group the changes, if you like
    • Use full sentences with correct case and punctuation
    • Refer to relevant Issue if applicable

smtp_DATA() could buffer an unbounded amount of data in line_fragments until
it got crlf and we're going to throw it away anyway.

drop TestSMTPWithController.test_long_line_leak which is now moot
…g loop.

DATA_CHUNK takes 3 parameters:
data : bytes, decoded_data : Optional[str], last : bool
and returns Optional[bytes] response

If the hook returns a response prior to the last=True chunk, smtp_DATA
will read/discard the remaining data from the client without invoking
the hook again.

This allows streaming the data to a file or other storage api without
having to buffer the whole message in memory first.

Move dotstuff and utf8 decode into data reading loop to support this.

This may also improve/fix aio-libs#293 which I suspect is due to the tight
loop decoding dotstuff for the whole message at once hogging the GIL
in the old implementation.
otherwise do it at the end as before so we don't ~double the memory
while we're reading from the client
1: buffer through BytesIO, not List[bytes], based on a simple
microbenchmark, this is about 40% faster
2: when using the DATA_CHUNK hook, buffer SMTP(chunk_size=2**16) before
calling the hook, the way this was before was calling the hook for every line
@codecov
Copy link

codecov bot commented Jun 18, 2025

Codecov Report

❌ Patch coverage is 99.52381% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.87%. Comparing base (98f5783) to head (547bf02).

Files with missing lines Patch % Lines
aiosmtpd/smtp.py 98.64% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #543      +/-   ##
==========================================
+ Coverage   97.82%   97.87%   +0.04%     
==========================================
  Files          23       23              
  Lines        5701     5869     +168     
  Branches      766      796      +30     
==========================================
+ Hits         5577     5744     +167     
- Misses         78       79       +1     
  Partials       46       46              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jsbucy added 5 commits June 18, 2025 13:58
This brings up a bit of a discussion point: the way I have it now, the
new DATA_CHUNK hook returns Optional[bytes]. The idea is that it will
typically return None if last=False unless there was an early-return
error but the final call with last=True is always expected to return a
value. This can't exactly be expressed with annotations so I needed
some asserts to pass mypy.
jsbucy and others added 7 commits June 19, 2025 14:20
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
this is probably not common but gives the handler implementation
another opportunity to preempt the data transfer in case of an error
…t response

these changes increased the scope of the status variable in
SMTP.smtp_DATA() that entailed an extra test of status is None in the
backward-compatible path and it was easier to add the coverage than
untangle that
I have encountered situations in the wild where someone was trying to
send a large message over a slow connection that would require a very
large static timeout (>hour) to accomodate. This way you can keep the
timeout reasonably short (minutes) and it can take as long as it takes
as long as the client keeps making forward progress.

We could make this available to the current api under control of a
flag but I don't want to unexpectedly change the behavior for
exiseting users.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Entire python process stalls when receiving multiple large inbound emails

2 participants