Commit bb5fc59

Add warning about consuming the payload in middleware (#10970)
1 parent 48f5324 commit bb5fc59

2 files changed, +96 -5 lines changed

CHANGES/2914.doc.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Improved documentation for middleware by adding warnings and examples about
+request body stream consumption. The documentation now clearly explains that
+request body streams can only be read once and provides best practices for
+sharing parsed request data between middleware and handlers -- by :user:`bdraco`.

docs/web_advanced.rst

Lines changed: 92 additions & 5 deletions
@@ -568,9 +568,13 @@ A *middleware* is a coroutine that can modify either the request or
 response. For example, here's a simple *middleware* which appends
 ``' wink'`` to the response::

-    from aiohttp.web import middleware
+    from aiohttp import web
+    from typing import Callable, Awaitable

-    async def middleware(request, handler):
+    async def middleware(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
         resp = await handler(request)
         resp.text = resp.text + ' wink'
         return resp
@@ -619,18 +623,25 @@ post-processing like handling *CORS* and so on.
 The following code demonstrates middlewares execution order::

     from aiohttp import web
+    from typing import Callable, Awaitable

-    async def test(request):
+    async def test(request: web.Request) -> web.Response:
         print('Handler function called')
         return web.Response(text="Hello")

-    async def middleware1(request, handler):
+    async def middleware1(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
         print('Middleware 1 called')
         response = await handler(request)
         print('Middleware 1 finished')
         return response

-    async def middleware2(request, handler):
+    async def middleware2(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
         print('Middleware 2 called')
         response = await handler(request)
         print('Middleware 2 finished')
@@ -649,6 +660,82 @@ Produced output::
     Middleware 2 finished
     Middleware 1 finished

+Request Body Stream Consumption
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. warning::
+
+   When middleware reads the request body (using :meth:`~aiohttp.web.BaseRequest.read`,
+   :meth:`~aiohttp.web.BaseRequest.text`, :meth:`~aiohttp.web.BaseRequest.json`, or
+   :meth:`~aiohttp.web.BaseRequest.post`), the body stream is consumed. However, these
+   high-level methods cache their result, so subsequent calls from the handler or other
+   middleware will return the same cached value.
+
+   The important distinction is:
+
+   - High-level methods (:meth:`~aiohttp.web.BaseRequest.read`, :meth:`~aiohttp.web.BaseRequest.text`,
+     :meth:`~aiohttp.web.BaseRequest.json`, :meth:`~aiohttp.web.BaseRequest.post`) cache their
+     results internally, so they can be called multiple times and will return the same value.
+   - Direct stream access via :attr:`~aiohttp.web.BaseRequest.content` does NOT have this
+     caching behavior. Once you read from ``request.content`` directly (e.g., using
+     ``await request.content.read()``), subsequent reads will return empty bytes.
+
+Consider this middleware that logs request bodies::
+
+    from aiohttp import web
+    from typing import Callable, Awaitable
+
+    async def logging_middleware(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
+        # This consumes the request body stream
+        body = await request.text()
+        print(f"Request body: {body}")
+        return await handler(request)
+
+    async def handler(request: web.Request) -> web.Response:
+        # This will return the same value that was read in the middleware
+        # (i.e., the cached result, not an empty string)
+        body = await request.text()
+        return web.Response(text=f"Received: {body}")
+
+In contrast, when accessing the stream directly (not recommended in middleware)::
+
+    async def stream_middleware(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
+        # Reading directly from the stream - this consumes it!
+        data = await request.content.read()
+        print(f"Stream data: {data}")
+        return await handler(request)
+
+    async def handler(request: web.Request) -> web.Response:
+        # This will return empty bytes because the stream was already consumed
+        data = await request.content.read()
+        # data will be b'' (empty bytes)
+
+        # However, high-level methods would still work if called for the first time:
+        # body = await request.text()  # This would read from internal cache if available
+        return web.Response(text=f"Received: {data}")
+
+When working with raw stream data that needs to be shared between middleware and handlers::
+
+    async def stream_parsing_middleware(
+        request: web.Request,
+        handler: Callable[[web.Request], Awaitable[web.StreamResponse]]
+    ) -> web.StreamResponse:
+        # Read stream once and store the data
+        raw_data = await request.content.read()
+        request['raw_body'] = raw_data
+        return await handler(request)
+
+    async def handler(request: web.Request) -> web.Response:
+        # Access the stored data instead of reading the stream again
+        raw_data = request.get('raw_body', b'')
+        return web.Response(body=raw_data)
+
 Example
 ^^^^^^^

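The difference between cached high-level reads and direct stream access that the new warning describes can be imitated without aiohttp. The sketch below is a simplified model, not aiohttp's implementation: `ToyRequest`, `make_stream`, and `demo` are hypothetical names introduced only for illustration, and the standard library's `asyncio.StreamReader` stands in for the request body stream.

```python
import asyncio
from typing import Dict, Optional


class ToyRequest:
    """Hypothetical stand-in for a request object (not aiohttp code)."""

    def __init__(self, content: asyncio.StreamReader) -> None:
        self.content = content  # one-shot body stream
        self._cached_body: Optional[bytes] = None

    async def read(self) -> bytes:
        # Read the stream on the first call only, then serve the cached bytes.
        if self._cached_body is None:
            self._cached_body = await self.content.read()
        return self._cached_body


def make_stream(payload: bytes) -> asyncio.StreamReader:
    # Must be called while an event loop is running.
    reader = asyncio.StreamReader()
    reader.feed_data(payload)
    reader.feed_eof()
    return reader


async def demo() -> Dict[str, bytes]:
    # Cached path: both calls go through the caching read() method.
    cached = ToyRequest(make_stream(b"hello"))
    cached_first = await cached.read()
    cached_second = await cached.read()

    # Direct path: both calls hit the underlying stream itself.
    raw = ToyRequest(make_stream(b"hello"))
    raw_first = await raw.content.read()
    raw_second = await raw.content.read()

    return {
        "cached_first": cached_first,
        "cached_second": cached_second,
        "raw_first": raw_first,
        "raw_second": raw_second,
    }


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the second cached read returns the same bytes as the first, while the second direct read of the stream returns empty bytes. That is the reason the documentation suggests storing raw data on the request when it has to be shared.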