Commit 1a36022

authored
Python agent performance enhancement with asyncio (#316)
New experimental feature
1 parent 8e4fa83 commit 1a36022

File tree

30 files changed

+2537
-1094
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 ## Change Logs

 ### 1.1.0
+
+- Feature:
+  - Users can now set the `SW_AGENT_ASYNCIO_ENHANCEMENT` environment variable to enable the performance enhancement with asyncio (#316)
+
 - Plugins:
   - Add neo4j plugin.(#312)

docs/en/setup/Configuration.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ export SW_AGENT_YourConfiguration=YourValue
 | agent_instance_properties_json | SW_AGENT_INSTANCE_PROPERTIES_JSON | <class 'str'> | | A custom JSON string to be reported as service instance properties, e.g. `{"key": "value"}` |
 | agent_experimental_fork_support | SW_AGENT_EXPERIMENTAL_FORK_SUPPORT | <class 'bool'> | False | The agent will restart itself in any os.fork()-ed child process. Important Note: it's not suitable for short-lived processes as each one will create a new instance in SkyWalking dashboard in format of `service_instance-child(pid)`. This feature may not work when a precise combination of gRPC + Python 3.7 + subprocess (not fork) is used together. The agent will output a warning log when using on Python 3.7 for such a reason. |
 | agent_queue_timeout | SW_AGENT_QUEUE_TIMEOUT | <class 'int'> | 1 | DANGEROUS - This option controls the interval of each bulk report from telemetry data queues. Do not modify unless you have evaluated its impact given your service load. |
+| agent_asyncio_enhancement | SW_AGENT_ASYNCIO_ENHANCEMENT | <class 'bool'> | False | Replace the threads with asyncio coroutines when reporting telemetry data to the OAP. This option is experimental and may not work as expected. |
 ### SW_PYTHON Auto Instrumentation CLI
 | Configuration | Environment Variable | Type | Default Value | Description |
 | :------------ | :------------ | :------------ | :------------ | :------------ |
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Python Agent Asynchronous Enhancement

Since `1.1.0`, the Python agent supports asynchronous reporting of ALL telemetry data, including traces, metrics, logs and profile data. This feature is disabled by default because it is still experimental. You can enable it by setting the `SW_AGENT_ASYNCIO_ENHANCEMENT` environment variable to `true`. See [the configuration document](../Configuration.md) for more information.

```bash
export SW_AGENT_ASYNCIO_ENHANCEMENT=true
```

## Why we need this feature

Before version `1.1.0`, the SkyWalking Python agent implemented its data reporters only with the `threading` module. As the agent has grown into a fully featured agent, it now requires more resources than it did when only tracing was supported (we start many threads, and gRPC itself creates even more threads when streaming).

As is well known, the Global Interpreter Lock (GIL) in Python can limit the true parallel execution of threads. This issue also affects the Python agent, especially its network communication with the SkyWalking OAP (gRPC, HTTP and Kafka).

Therefore, we have decided to implement the reporter code for the SkyWalking Python agent based on the `asyncio` library. `asyncio` is part of the Python standard library and provides a single-threaded, coroutine-driven programming model. It is widely adopted and has a rich ecosystem, which makes it a common choice for adding asynchronous capabilities to Python projects.

## How it works

To keep the API unchanged, we have written a new class called `SkyWalkingAgentAsync` that exposes the same interface as the `SkyWalkingAgent` class. The environment variable mentioned above, `SW_AGENT_ASYNCIO_ENHANCEMENT`, controls which class implements the agent's interface.
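
The selection step can be pictured roughly as follows. This is a minimal sketch, not the agent's actual code: the two classes are hypothetical stand-ins that only share a `start()` method, and both `env_flag` and `create_agent` (including the accepted spellings of "true") are assumptions made for illustration.

```python
import os


class SkyWalkingAgent:
    """Hypothetical stand-in for the thread-based agent."""

    def start(self) -> str:
        return 'started with threads'


class SkyWalkingAgentAsync:
    """Hypothetical stand-in for the asyncio-based agent (same interface)."""

    def start(self) -> str:
        return 'started with asyncio'


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable; the accepted spellings are an assumption."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ('1', 'true', 'yes', 'on')


def create_agent():
    """Pick the implementation once, so callers never need to know which one runs."""
    if env_flag('SW_AGENT_ASYNCIO_ENHANCEMENT'):
        return SkyWalkingAgentAsync()
    return SkyWalkingAgent()


if __name__ == '__main__':
    os.environ['SW_AGENT_ASYNCIO_ENHANCEMENT'] = 'true'
    print(create_agent().start())
```

Because both classes expose the same methods, code that calls `start()` stays the same whichever implementation is chosen.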

In the `SkyWalkingAgentAsync` class, we use asyncio coroutines and their related functions to replace the Python threading implementation in nearly all instances. We have applied the asyncio enhancement to all three primary reporting protocols of the current SkyWalking Python agent:

- **gRPC**: We use the [`grpc.aio`](https://grpc.github.io/grpc/python/grpc_asyncio.html) module to replace the `grpc` module. Since the `grpc.aio` module is also officially supported and included in the `grpc` package, we can use it directly without any additional installation.

- **HTTP**: We use the [`aiohttp`](https://github.com/aio-libs/aiohttp) module to replace the `requests` module.

- **Kafka**: We use the [`aiokafka`](https://github.com/aio-libs/aiokafka) module to replace the `kafka-python` module.
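
As a rough picture of what replacing a reporter thread with a coroutine means, the sketch below drains an `asyncio.Queue` in batches on a single thread. It uses only the standard library; `send_batch` is a hypothetical stand-in for the protocol-specific call (gRPC, HTTP or Kafka), and the batch size and the `None` sentinel are assumptions for illustration, not the agent's real behavior.

```python
import asyncio


async def send_batch(batch, sent):
    """Hypothetical stand-in for a network call to the OAP."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    sent.append(list(batch))


async def reporter(queue, sent, batch_size=3):
    """Drain the queue in batches until a None sentinel arrives."""
    batch = []
    while True:
        item = await queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            await send_batch(batch, sent)
            batch = []
    if batch:  # flush whatever is left after the sentinel
        await send_batch(batch, sent)


async def run_demo(n_items=7):
    queue = asyncio.Queue()
    sent = []
    # The reporter runs as a task on the same thread as the producer.
    task = asyncio.create_task(reporter(queue, sent))
    for i in range(n_items):
        await queue.put(f'segment-{i}')
    await queue.put(None)
    await task
    return sent


if __name__ == '__main__':
    print(asyncio.run(run_demo()))
```

While one coroutine waits on I/O, the event loop can run the others, so several reporters can share one thread instead of each owning its own.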

## Performance improvement

We used [wrk](https://github.com/wg/wrk) to stress test the network throughput of the Python agent in a [FastAPI](https://github.com/tiangolo/fastapi) application.

- gRPC

The performance has been improved by about **32.8%**

| gRPC | QPS | TPS | Avg Latency |
| :-------------: | :-----: | :------: | :---------: |
| sync (original) | 899.26 | 146.66KB | 545.97ms |
| async (new) | 1194.55 | 194.81KB | 410.97ms |

- HTTP

The performance has been improved by about **9.8%**

| HTTP | QPS | TPS | Avg Latency |
| :-------------: | :----: | :-----: | :---------: |
| sync (original) | 530.95 | 86.59KB | 1.53s |
| async (new) | 583.37 | 95.14KB | 1.44s |

- Kafka

The performance has been improved by about **89.6%**

| Kafka | QPS | TPS | Avg Latency |
| :-------------: | :----: | :------: | :---------: |
| sync (original) | 345.89 | 56.41KB | 1.09s |
| async (new) | 655.67 | 106.93KB | 1.24s |

> Only the gRPC result is a strong reference point. The other two protocols use third-party libraries with completely different implementations, so their improvement depends to some extent on the performance of those libraries.
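
The quoted percentages appear to be the relative QPS gain of the async run over the sync run. A small sketch of that arithmetic, using the QPS values from the tables above (the `relative_gain` helper is introduced here only for illustration):

```python
# QPS values copied from the tables above: (sync, async).
QPS = {
    'gRPC': (899.26, 1194.55),
    'HTTP': (530.95, 583.37),
    'Kafka': (345.89, 655.67),
}


def relative_gain(sync_qps, async_qps):
    """Return the throughput gain of async over sync, in percent."""
    return (async_qps - sync_qps) / sync_qps * 100


if __name__ == '__main__':
    for protocol, (sync_qps, async_qps) in QPS.items():
        print(f'{protocol}: {relative_gain(sync_qps, async_qps):.1f}%')
```

The gain is computed on QPS only. Average latency does not always move the same way: in the Kafka table it is higher for the async run (1.24s against 1.09s) even though QPS improves.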

For more details, see this [PR](https://github.com/apache/skywalking-python/pull/316).

## Potential problems

We have shown that the asynchronous enhancement improves the transmission efficiency of metrics, traces and logs. But **it improves the performance of profile data very little, and may even cause performance degradation**.

This is mainly because a large part of the `profile` data comes from monitoring and measuring Python threads, which is exactly what the asynchronous enhancement tries to avoid. Since operations on threads cannot be bypassed, we may need additional overhead to support cross-thread coroutine communication, which may lead to performance degradation rather than improvement.
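
To make the cross-thread hand-off concrete, here is a minimal standard-library sketch. It is an illustration under assumptions, not the agent's profiling code: `sampler_thread` is a hypothetical stand-in for a thread that collects snapshots, and the `None` sentinel is only a way to end the demo.

```python
import asyncio
import threading


def sampler_thread(loop, queue, n_samples):
    """Hypothetical profiler thread: it cannot await, so it hands data to the loop."""
    for i in range(n_samples):
        # asyncio.Queue is not thread-safe, so every item has to go through
        # the loop's thread-safe scheduling path instead of a direct call.
        loop.call_soon_threadsafe(queue.put_nowait, f'snapshot-{i}')
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: no more data


async def collect(n_samples=5):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    thread = threading.Thread(target=sampler_thread, args=(loop, queue, n_samples))
    thread.start()
    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)
    # Join in an executor so the event loop itself is never blocked.
    await loop.run_in_executor(None, thread.join)
    return received


if __name__ == '__main__':
    print(asyncio.run(collect()))
```

Every item crosses the thread boundary through `call_soon_threadsafe` rather than a plain function call, which is the kind of extra work the paragraph above refers to.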

The asynchronous enhancement involves many code changes and introduces some new dependencies. Since this feature is relatively new, it may cause unexpected errors and problems. **If you encounter them, please feel free to contact us or submit issues and PRs**!

docs/menu.yml

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ catalog:
             path: "/en/setup/advanced/MeterReporter"
           - name: "Manual Trace Instrumentation"
             path: "/en/setup/advanced/API"
+          - name: "Asynchronous Enhancement"
+            path: "/en/setup/advanced/AsyncEnhancement"
       - name: "Supported Plugins"
         catalog:
           - name: "Supported Libraries"
