FastAPI Async I/O in Practice #76

kyle-ip · 2024-03-03T07:27:53Z

kyle-ip
Mar 3, 2024
Maintainer

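For work reasons I recently started looking into how asynchronous endpoints are implemented in FastAPI. The goal is to improve performance through asynchronous I/O: while waiting for an external operation (a database query, a network request, and so on) to complete, the program keeps executing other code.

Key points:

Non-blocking: the await keyword waits without blocking. It yields control back to the event loop, which can then run other tasks.
Task switching: the event loop switches efficiently among multiple asynchronous tasks, so CPU time is not wasted on idle waiting.
Concurrency: multiple tasks are handled in a single thread through the event loop. Tasks run interleaved rather than simultaneously, yet I/O-bound applications can still see a significant performance gain. Parallelism, by contrast, requires multiple threads or processes and suits CPU-bound tasks.

So when considering an async rewrite, one deciding factor is whether the workload is I/O-bound or CPU-bound.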
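
To make the key points concrete, here is a minimal sketch that compares awaiting three simulated I/O waits one after another with running them concurrently through asyncio.gather. The names fake_io, run_sequential, and run_concurrent are illustrative only, and asyncio.sleep stands in for a real database or network call.

```python
import asyncio
import time
from typing import List


async def fake_io(delay: float) -> float:
    # asyncio.sleep stands in for a real I/O wait (hypothetical workload).
    await asyncio.sleep(delay)
    return delay


async def run_sequential(delays: List[float]) -> float:
    start = time.perf_counter()
    for d in delays:
        await fake_io(d)  # each wait finishes before the next one starts
    return time.perf_counter() - start


async def run_concurrent(delays: List[float]) -> float:
    start = time.perf_counter()
    # All coroutines are scheduled on the same event loop, so their waits overlap.
    await asyncio.gather(*(fake_io(d) for d in delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.1, 0.1]
    print("sequential:", asyncio.run(run_sequential(delays)))
    print("concurrent:", asyncio.run(run_concurrent(delays)))
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly the longest single delay, all within one thread.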

Here is an example of asynchronous batch processing:

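import asyncio
import time
from typing import Any, Awaitable, Callable, Iterable, List, Optional

async def async_chunk(chunk, func, max_retries, retry_on_exception) -> List[Any]:
    # Assumed minimal helper (its full definition is not included in this post).
    # It awaits func on each item, retries when retry_on_exception approves the error,
    # and stores None for an item that ultimately fails.
    results = []
    for item in chunk:
        for attempt in range(max_retries + 1):
            try:
                results.append(await func(item))
                break
            except Exception as e:
                if attempt == max_retries or not (retry_on_exception and retry_on_exception(e)):
                    results.append(None)
                    break
    return results

async def async_process(
        func: Callable[[Any], Awaitable[Any]],
        data: Iterable[Any],
        max_retries: int = 3,
        retry_on_exception: Optional[Callable[[Exception], bool]] = None) -> List[Any]:
    chunk_size = 10
    items = list(data)  # materialize once: a generic iterable has no len() or slicing
    data_chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    # Chunks run concurrently; items inside one chunk are awaited one after another
    tasks = [async_chunk(chunk, func, max_retries, retry_on_exception) for chunk in data_chunks]
    results = await asyncio.gather(*tasks)
    return [item for chunk in results for item in chunk]

async def async_test_function(x):
    await asyncio.sleep(0.1)
    return x * x

async def main():
    data = range(100)  # sample input
    start_time = time.perf_counter()
    await async_process(async_test_function, data)
    return time.perf_counter() - start_time

print(asyncio.run(main()))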

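Back to asynchronous endpoint design in FastAPI. The simple examples below all use I/O-bound tasks.

Concurrent tasks

For example, a FastAPI backend sends requests to two external APIs at the same time, fetches their data, merges it, and returns the result to the client.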

import asyncio
from typing import Union

import httpx
from fastapi import FastAPI

app = FastAPI()

async def fetch_data(url: str) -> Union[dict, None]:
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(url)
            return response.json()
        except Exception as e:
            print(f"Error fetching {url}: {e}")
            return None

@app.get("/aggregate-data")
async def aggregate_data():
    url1 = "https://api.external1.com/data"
    url2 = "https://api.external2.com/data"
    response1, response2 = await asyncio.gather(fetch_data(url1), fetch_data(url2))
    aggregated_data = {"data_from_external1": response1, "data_from_external2": response2}
    return aggregated_data

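fetch_data uses httpx.AsyncClient to send an asynchronous GET request to the given URL and returns the response's JSON data. If the request fails, it prints an error message and returns None.
In the aggregate_data path operation function, asyncio.gather issues the requests to both external APIs concurrently. asyncio.gather accepts multiple awaitables (for example, the coroutine objects returned by calling async def functions) and, once all of them have completed, returns a list of their results in the same order. This lets the application wait for several I/O operations without blocking, so the waiting time can be used to handle other tasks.
The try-except in fetch_data catches possible exceptions and prints the error. A more complex application would also need retry logic, timeout handling, and so on.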
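
The retry and timeout handling mentioned here could look roughly like the sketch below. It uses only the standard library; call_with_retry and its parameters (retries, timeout, backoff) are hypothetical names chosen for illustration, and the flaky coroutine simulates an unreliable network call rather than a real httpx request.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 2,
    timeout: float = 1.0,
    backoff: float = 0.1,
) -> Optional[T]:
    """Await make_call() up to retries + 1 times; return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            # wait_for cancels the call and raises a timeout error if it takes too long.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except Exception as exc:  # also catches the timeout raised by wait_for
            print(f"attempt {attempt + 1} failed: {exc!r}")
            if attempt < retries:
                # Simple exponential backoff before the next attempt.
                await asyncio.sleep(backoff * (2 ** attempt))
    return None


async def main() -> None:
    calls = {"n": 0}

    async def flaky() -> dict:
        # Simulated call (assumption): fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated network failure")
        return {"ok": True}

    print(await call_with_retry(flaky, retries=2, backoff=0.01))


if __name__ == "__main__":
    asyncio.run(main())
```

Passing a zero-argument callable instead of a coroutine object matters here: a coroutine can be awaited only once, so each retry needs a fresh one.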

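Background tasks

In FastAPI, background tasks are for operations that do not need to finish before the response is returned, such as sending an email or a long-running computation. These tasks run after the response has been sent to the client.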

from fastapi import FastAPI, BackgroundTasks

app = FastAPI()

def write_log(message: str):
    # A log-writing operation
    with open("log.txt", mode="a") as log:
        log.write(message)

@app.get("/")
async def write_log_background(background_tasks: BackgroundTasks):
    background_tasks.add_task(write_log, message="Some background task log")
    return {"message": "Logging in the background"}

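Database access

For database operations, an asynchronous library (such as databases or asyncpg) can improve performance.

Taking the databases library as an example, first install databases together with the matching database driver. For PostgreSQL: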

pip install databases[postgresql]

Then:

from fastapi import FastAPI
from databases import Database

app = FastAPI()
database = Database("postgresql://user:password@localhost/dbname")

@app.on_event("startup")
async def startup():
    await database.connect()

@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()

@app.get("/items/")
async def read_items():
    query = "SELECT * FROM items"
    return await database.fetch_all(query=query)

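Streaming responses

That is, sending the response body to the client gradually as a stream. This is very useful for large amounts of data, for example when generating large CSV or JSON files.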

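from typing import Generator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_large_json() -> Generator[str, None, None]:
    total = 1_000_000  # Assume one million elements
    yield '['  # Open the JSON array
    for i in range(1, total + 1):
        # Emit one JSON object; no comma after the last element
        yield '{"item": %d}' % i + (',' if i < total else '')
    yield ']'  # Close the JSON array

@app.get("/large-json")
async def large_json():
    # A plain Response needs the whole body up front; StreamingResponse consumes the generator lazily
    return StreamingResponse(generate_large_json(), media_type="application/json")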

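generate_large_json is a generator: it first sends the [ that opens the array, then produces the JSON objects one by one, separated by commas, and finally sends the ] that closes the array. The data is sent in batches, but what the client receives is a complete, valid JSON array.

When a streaming response is used to send a large amount of data, the client has to handle data arriving in batches.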

fetch("/large-json")
  .then(response => {
    const reader = response.body.getReader();
    let decoder = new TextDecoder();
    
    return reader.read().then(function processText({ done, value }) {
      if (done) {
        console.log("Stream complete");
        return;
      }
      
      // Convert the Uint8Array to a string; a single chunk may hold only part of the JSON.
      let str = decoder.decode(value, {stream: true});
      console.log(str);
      
      // Recurse to read the next chunk
      return reader.read().then(processText);
    });
  })
  .catch(console.error);

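To parse the JSON while reading and update the UI dynamically, the string handling needs extra work: for example, keep concatenating chunks until a complete JSON object is available, then parse it and update the UI.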
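
The buffering idea can be sketched as follows. This is written in Python to match the server-side examples rather than the browser client, and iter_json_array_items is a hypothetical helper. It assumes the stream is a single top-level JSON array whose elements are all JSON objects, as produced by generate_large_json; nested arrays or bare numbers as elements would need a more careful parser.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_array_items(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each object of a streamed top-level JSON array once it is complete."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Drop whitespace and the array punctuation that sits between elements.
            buffer = buffer.lstrip(" \t\r\n[,")
            if not buffer or buffer[0] == "]":
                break
            try:
                # raw_decode parses one JSON value from the start of the buffer
                # and reports the index where that value ends.
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the object is still incomplete; wait for the next chunk
            yield item
            buffer = buffer[end:]


if __name__ == "__main__":
    # Chunk boundaries deliberately split an object in the middle.
    sample = ['[{"item": 1},{"it', 'em": 2},', '{"item": 3}', ']']
    for obj in iter_json_array_items(sample):
        print(obj)
```

Each object becomes available as soon as its closing brace arrives, so a consumer can act on it without waiting for the rest of the array.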

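Streaming responses suit cases where the data volume is large enough to affect server performance or stability. Other optimizations are also worth considering, such as compression, caching strategies, or redesigning the API to reduce the amount of data per request.

File operations

For file reads and writes, aiofiles can be used to perform the operations asynchronously and avoid blocking. First install aiofiles: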

pip install aiofiles

Then:

import aiofiles
from fastapi import FastAPI

app = FastAPI()

@app.get("/read-file/")
async def read_file_async():
    async with aiofiles.open("large_file.txt", mode="r") as f:
        contents = await f.read()
    return {"contents": contents}
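
When adding a dependency is not desirable, a similar effect can be sketched with the standard library alone by pushing the blocking read onto a worker thread. This is an alternative technique, not what aiofiles exposes; read_text_blocking and read_text_async are hypothetical names, and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
from pathlib import Path


def read_text_blocking(path: str) -> str:
    # Ordinary blocking read; acceptable here because it runs in a worker thread.
    return Path(path).read_text(encoding="utf-8")


async def read_text_async(path: str) -> str:
    # asyncio.to_thread runs the blocking call in the default thread pool,
    # so the event loop can keep serving other tasks in the meantime.
    return await asyncio.to_thread(read_text_blocking, path)


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write("hello from a worker thread")
    print(asyncio.run(read_text_async(f.name)))
```

Either way, reading a very large file fully into memory remains costly; combining chunked reads with a streaming response may be a better fit in that case.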

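Using an asynchronous cache

For frequently accessed resources, an asynchronous cache can reduce the number of database queries and improve response time. The aiocache library can be used for this.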

pip install aiocache

Example:

from aiocache import Cache
from aiocache.serializers import JsonSerializer
from fastapi import FastAPI

app = FastAPI()
cache = Cache(Cache.REDIS, endpoint="localhost", port=6379, serializer=JsonSerializer())

@app.get("/item/{item_id}")
async def read_item(item_id: int):
    cache_key = f"item_{item_id}"
    item = await cache.get(cache_key)
    if item is None:
        # Assume the item is fetched from the database here
        item = {"id": item_id, "name": "Item Name"}
        await cache.set(cache_key, item)
    return item
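
The pattern in read_item (check the cache, fall back to the slow source on a miss, then store the result) can be tried without a Redis server using the sketch below. InMemoryAsyncCache is a hypothetical dict-backed stand-in, not part of aiocache, and it has no expiry or eviction.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class InMemoryAsyncCache:
    """Tiny dict-backed stand-in for an async cache client (hypothetical, no expiry)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    async def get(self, key: str) -> Any:
        return self._store.get(key)

    async def set(self, key: str, value: Any) -> None:
        self._store[key] = value


async def get_or_load(
    cache: InMemoryAsyncCache,
    key: str,
    loader: Callable[[], Awaitable[Any]],
) -> Any:
    value = await cache.get(key)
    if value is None:  # cache miss: fall through to the slow source
        value = await loader()
        await cache.set(key, value)
    return value


async def main() -> None:
    cache = InMemoryAsyncCache()
    loads = {"n": 0}

    async def load_item() -> dict:
        loads["n"] += 1
        await asyncio.sleep(0.05)  # simulated database latency
        return {"id": 1, "name": "Item Name"}

    first = await get_or_load(cache, "item_1", load_item)
    second = await get_or_load(cache, "item_1", load_item)
    print(first == second, "loader calls:", loads["n"])


if __name__ == "__main__":
    asyncio.run(main())
```

Note that two requests missing the same key at the same moment would both call the loader; a production setup may want a per-key lock or a time-to-live on entries.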

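What about CPU-bound work?

As is well known, CPython (the reference Python interpreter) has a GIL: at any moment only one thread can execute Python bytecode in the interpreter, which limits the parallelism of multithreading for CPU-bound tasks. Multiprocessing, on the other hand, carries significant overhead and fits only a limited set of scenarios. So what other options are there?

In fact, multithreading can sometimes improve performance even for non-I/O work:

Even in pure in-memory data processing, some operations release the GIL. For example, library functions that perform heavy computation (such as NumPy operations) can release the GIL while they run. So although Python code itself cannot execute in parallel, the underlying C/C++ libraries can still use multiple cores in parallel.
Even without any obvious I/O, a thread can still be unable to run for a short time (for example, while waiting on a lock or a page fault). The operating system can then context-switch and give the CPU time to another thread, which improves overall efficiency.
Modern CPUs have multi-level caches, so different threads can run on different cores and use each core's own cache. When processing large amounts of data, distributing the data sensibly across threads can reduce cache misses and speed up processing.
Different Python interpreters (such as CPython or PyPy) and data-processing libraries may handle multithreading differently internally. Some libraries have optimizations aimed at multithreading and can use multicore resources effectively even under the GIL.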
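
As a rough illustration of the first point, the sketch below offloads a C-implemented computation to a thread pool from inside a coroutine. The hashlib documentation notes that the GIL is released while hashing large inputs, so such calls can overlap across threads. digest and digest_many are hypothetical names, and the actual speedup depends on the machine and the Python build.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def digest(data: bytes) -> str:
    # The hashing itself runs in C; for large buffers CPython can release the GIL meanwhile.
    return hashlib.sha256(data).hexdigest()


async def digest_many(blobs: List[bytes]) -> List[str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [loop.run_in_executor(pool, digest, blob) for blob in blobs]
        # The event loop stays free for other coroutines while the threads compute.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    blobs = [bytes([i]) * 5_000_000 for i in range(4)]
    for hex_digest in asyncio.run(digest_many(blobs)):
        print(hex_digest[:16])
```

For work written in pure Python, the same run_in_executor call with a process pool is the usual alternative, at the cost of pickling arguments and results between processes.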

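Although they differ significantly from async I/O in design philosophy, mechanics, and use cases, multithreading and multiprocessing are another way to implement asynchrony (the Future pattern). Here is an example: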

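import logging
import math
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from typing import Any, Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)

def process_chunk(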
        chunk: List[Any], 
        func: Callable[[Any], Any], 
        chunk_index: int, 
        chunk_size: int, 
        max_retries: int, 
        retry_on_exception: Optional[Callable[[Exception], bool]]) -> Tuple[int, List[Any]]:
    chunk_result = [None] * chunk_size
    for idx, item in enumerate(chunk):
        retries = max_retries
        while retries >= 0:
            try:
                result = func(item)
                chunk_result[idx] = result
                break
            except Exception as e:
                if retry_on_exception and retry_on_exception(e) and retries > 0:
                    retries -= 1
                    logger.error(f"Retrying task due to: {e}, remaining retries: {retries}")
                else:
                    logger.error(f"Task execution error: {e}")
                    chunk_result[idx] = None
                    break
    return (chunk_index, chunk_result)

def parallel_process(
        func: Callable[[Any], Any], 
        data_list: List[Any],
        executor: Executor = None, 
        max_retries: int = 3, 
        retry_on_exception: Optional[Callable[[Exception], bool]] = None) -> List[Any]:

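    if not data_list:
        return []  # nothing to do; also avoids a zero chunk size below

    if not executor:
        with ThreadPoolExecutor() as executor:
            return parallel_process(func, data_list, executor, max_retries, retry_on_exception)

    # _max_workers is a private attribute of the standard pool executors
    num_threads = executor._max_workers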
    chunk_size = math.ceil(len(data_list) / num_threads)
    data_chunks = [data_list[i:i + chunk_size] for i in range(0, len(data_list), chunk_size)]

    future_to_chunk = {executor.submit(process_chunk, chunk, func, i, len(chunk), max_retries, retry_on_exception): i for i, chunk in enumerate(data_chunks)}
    ordered_results = [None] * len(data_list)

    for future in as_completed(future_to_chunk):
        chunk_index, chunk_result = future.result()
        start_index = chunk_index * chunk_size
        ordered_results[start_index:start_index + len(chunk_result)] = chunk_result

    return ordered_results
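
When per-item retries are not needed, a much shorter variant is possible, because Executor.map already returns results in input order. The sketch below is a simplified alternative, not a replacement for parallel_process; parallel_map and its max_workers default are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def parallel_map(
    func: Callable[[Any], Any],
    data_list: List[Any],
    max_workers: int = 4,
) -> List[Any]:
    # Executor.map yields results in input order, regardless of which call finishes first.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, data_list))


def slow_square(x: int) -> int:
    # Smaller inputs sleep longer, so calls finish out of order.
    time.sleep(0.01 * (5 - x))
    return x * x


if __name__ == "__main__":
    print(parallel_map(slow_square, [1, 2, 3, 4]))
```

Unlike parallel_process, this version has no retry handling: the first exception raised by func propagates to the caller when the results are collected.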

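Summary

Aspect	Multiprocessing / multithreading	Async I/O
Performance	Multiprocessing makes effective use of multicore CPUs for parallel computation and suits CPU-bound tasks. Multithreading suits I/O-bound tasks but, because of the GIL, performs poorly on CPU-bound tasks.	Suits I/O-bound tasks; improves concurrency through the event loop and non-blocking I/O.
Resources	Multiprocessing uses several independent memory spaces, so resource consumption is relatively high. Multithreading shares memory and consumes less, but thread safety must be handled.	Coroutines run within a single thread, giving high resource utilization and low context-switch overhead.
Complexity	Synchronization between threads or processes (locks, semaphores, and so on) must be managed, so the programming model is relatively complex.	async-await simplifies asynchronous code, but deep asynchronous call chains and exception handling can still be difficult.
Use cases	Multiprocessing suits compute-bound tasks; multithreading suits I/O-bound tasks.	Suits I/O-bound tasks, especially those that spend much of their time waiting on I/O.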
