Commit 34462ac

github-actions[bot], pablogsal, and emmettbutler authored
fix: reference leaks and incorrect C-API usage in _stacktrace.c [backport 2.0] (#7148)
Backport ac10377 from #7112 to 2.0.

I was just reading this file and found many problems and leaks so I thought that a PR may be welcomed 😉

There are several problems and possible improvements with "_stacktrace.c":

* The macros for 3.11+ have some nested calls such as:

  `#define GET_FILENAME(frame) PyObject_GetAttrString(PyFrame_GetCode(frame), "co_filename")`

  The problem with these nested calls is that the nested functions return strong references that are never released. The inner call can also fail, which will crash the interpreter or raise SystemError and doesn't allow the code to do proper error handling.
* The macros for 3.11+ that return PyObject* return strong references that must be properly managed, otherwise the code will leak.
* There are many calls that are not checked for errors, such as PyBytes_AsString, PyFrame_GetBack, PyThreadState_GetFrame, ...
* When PyThreadState_Get or PyThreadState_GetFrame return NULL, that's considered a hard error (for instance this can happen at interpreter teardown) and returning None is incorrect. The error must be propagated.
* The invalid paths are unnecessarily obscured, and the only situation where None should be returned (although I would advise returning an error instead) is if the argument passed is not a unicode/string object.
* The function receives just one argument, so it should use METH_O instead of adding the extra complication of using METH_VARARGS.

The leaks can be confirmed using `memray` or a debug version of Python.
Just use this example `setup.py` to compile the extension:

```
from setuptools import setup, Extension

extension_module = Extension(
    '_stacktrace',
    sources=['_stacktrace.c'],
)

setup(
    name='_stacktrace',
    version='1.0',
    ext_modules=[extension_module],
)
```

To check for leaks, one possibility is to use a debug version of the interpreter and print the total reference count before and after:

```
import _stacktrace as s
import sys, gc

for _ in range(100):
    gc.collect()
    a = sys.gettotalrefcount()
    s.get_info_frame(b"")
    gc.collect()
    print(sys.gettotalrefcount() - a)
```

This prints:

```
7
7
7
7
```

because there are 7 references being leaked (at least).

The other is using `memray` or another memory profiler. Use this driver script:

```
# Test script
import _stacktrace as s
import gc

for _ in range(1_000_000):
    s.get_info_frame(b"")
gc.collect()
```

Then use memray:

```
$ memray run --trace-python-allocators --native -o lel.bin -f test.py
$ python -m memray flamegraph lel.bin --leaks -f
```

![leaks](https://github.com/DataDog/dd-trace-py/assets/11718525/18965b56-5159-4723-aa81-6a5f7242a1ef)

Note that frame objects do not appear in `memray` because we only leak ~7 frames and they use a freelist: https://github.com/python/cpython/blob/97ce15c5f8743fb8b8967c6a05d6aec9fef9cbc7/Objects/frameobject.c#L810

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included in the PR.
- [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed. If no release note is required, add label `changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Pablo Galindo Salgado <[email protected]>
Co-authored-by: Emmett Butler <[email protected]>
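The debug-build check from the description can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the commit: the name `total_refcount_growth` and its `warmup`/`repeats` parameters are hypothetical. It relies only on `sys.gettotalrefcount`, which exists solely in debug builds of CPython, so the helper returns `None` on a regular build instead of failing.

```python
import gc
import sys


def total_refcount_growth(func, warmup=10, repeats=100):
    """Estimate how many references each call to ``func`` leaves behind.

    Hypothetical helper: returns the average growth of the interpreter-wide
    reference count per call, or None when ``sys.gettotalrefcount`` is not
    available (it only exists in debug builds of CPython).
    """
    get_total = getattr(sys, "gettotalrefcount", None)
    if get_total is None:
        return None

    # Warm up first so caches and lazily created objects are not counted.
    for _ in range(warmup):
        func()

    gc.collect()
    before = get_total()
    for _ in range(repeats):
        func()
    gc.collect()
    after = get_total()

    return (after - before) / repeats
```

With the extension built as described above, a call such as `total_refcount_growth(lambda: s.get_info_frame(b""))` would be expected to stay close to zero once the leaks are fixed. Small non-zero values can still be noise from unrelated interpreter activity, so treat the result as a hint and not as proof.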
1 parent 7c1b921 commit 34462ac

2 files changed: 72 additions and 38 deletions
ddtrace/appsec/_iast/_stacktrace.c

Lines changed: 68 additions & 38 deletions
@@ -17,11 +17,27 @@
 #define GET_LINENO(frame) PyFrame_GetLineNumber((PyFrameObject*)frame)
 #define GET_FRAME(tstate) PyThreadState_GetFrame(tstate)
 #define GET_PREVIOUS(frame) PyFrame_GetBack(frame)
-#define GET_FILENAME(frame) PyObject_GetAttrString(PyFrame_GetCode(frame), "co_filename")
+#define FRAME_DECREF(frame) Py_DECREF(frame)
+#define FRAME_XDECREF(frame) Py_XDECREF(frame)
+#define FILENAME_DECREF(filename) Py_DECREF(filename)
+#define FILENAME_XDECREF(filename) Py_XDECREF(filename)
+static inline PyObject*
+GET_FILENAME(PyFrameObject* frame)
+{
+    PyCodeObject* code = PyFrame_GetCode(frame);
+    if (!code) {
+        return NULL;
+    }
+    return PyObject_GetAttrString((PyObject*)code, "co_filename");
+}
 #else
 #define GET_FRAME(tstate) tstate->frame
 #define GET_PREVIOUS(frame) frame->f_back
 #define GET_FILENAME(frame) frame->f_code->co_filename
+#define FRAME_DECREF(frame)
+#define FRAME_XDECREF(frame)
+#define FILENAME_DECREF(filename)
+#define FILENAME_XDECREF(filename)
 #if PY_MAJOR_VERSION >= 3 && PY_MINOR_VERSION >= 10
 /* See: https://bugs.python.org/issue44964 */
 #define GET_LINENO(frame) PyCode_Addr2Line(frame->f_code, frame->f_lasti * 2)
@@ -39,54 +55,69 @@
  * @return Tuple, string and integer.
  **/
 static PyObject*
-get_file_and_line(PyObject* Py_UNUSED(module), PyObject* args)
+get_file_and_line(PyObject* Py_UNUSED(module), PyObject* cwd_obj)
 {
-    PyThreadState* tstate = PyThreadState_GET();
-    PyFrameObject* frame;
-    PyObject* filename_o;
+    PyThreadState* tstate = PyThreadState_Get();
+    if (!tstate) {
+        return NULL;
+    }
+
     int line;
+    PyObject* filename_o = NULL;
+    PyObject* result = NULL;
+    PyObject* cwd_bytes = NULL;
+    char* cwd = NULL;
 
-    PyObject *cwd_obj = Py_None, *cwd_bytes;
-    char* cwd;
-    if (!PyArg_ParseTuple(args, "O", &cwd_obj))
+    if (!PyUnicode_FSConverter(cwd_obj, &cwd_bytes)) {
         return NULL;
-    if (cwd_obj != Py_None) {
-        if (!PyUnicode_FSConverter(cwd_obj, &cwd_bytes))
-            return NULL;
-        cwd = PyBytes_AsString(cwd_bytes);
-    } else {
+    }
+    cwd = PyBytes_AsString(cwd_bytes);
+    if (!cwd) {
+        Py_DECREF(cwd_bytes);
         return NULL;
     }
 
-    if (NULL != tstate && NULL != GET_FRAME(tstate)) {
-        frame = GET_FRAME(tstate);
-        while (NULL != frame) {
-            filename_o = GET_FILENAME(frame);
-            const char* filename = PyUnicode_AsUTF8(filename_o);
-            if (((strstr(filename, DD_TRACE_INSTALLED_PREFIX) != NULL && strstr(filename, TESTS_PREFIX) == NULL)) ||
-                (strstr(filename, SITE_PACKAGES_PREFIX) != NULL || strstr(filename, cwd) == NULL)) {
+    PyFrameObject* frame = GET_FRAME(tstate);
+    if (!frame) {
+        Py_DECREF(cwd_bytes);
+        return NULL;
+    }
 
-                frame = GET_PREVIOUS(frame);
-                continue;
-            }
-            /*
-             frame->f_lineno will not always return the correct line number
-             you need to call PyCode_Addr2Line().
-            */
-            line = GET_LINENO(frame);
-            return PyTuple_Pack(2, filename_o, Py_BuildValue("i", line));
+    while (NULL != frame) {
+        filename_o = GET_FILENAME(frame);
+        if (!filename_o) {
+            goto exit;
+        }
+        const char* filename = PyUnicode_AsUTF8(filename_o);
+        if (((strstr(filename, DD_TRACE_INSTALLED_PREFIX) != NULL && strstr(filename, TESTS_PREFIX) == NULL)) ||
+            (strstr(filename, SITE_PACKAGES_PREFIX) != NULL || strstr(filename, cwd) == NULL)) {
+            PyFrameObject* prev_frame = GET_PREVIOUS(frame);
+            FRAME_DECREF(frame);
+            FILENAME_DECREF(filename_o);
+            frame = prev_frame;
+            continue;
         }
+        /*
+         frame->f_lineno will not always return the correct line number
+         you need to call PyCode_Addr2Line().
+        */
+        line = GET_LINENO(frame);
+        PyObject* line_obj = Py_BuildValue("i", line);
+        if (!line_obj) {
+            goto exit;
+        }
+        result = PyTuple_Pack(2, filename_o, line_obj);
+        break;
     }
-#if PY_MAJOR_VERSION > 3 || PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 10
-    return Py_NewRef(Py_None);
-#else
-    Py_INCREF(Py_None);
-    return Py_None;
-#endif
+exit:
+    Py_DECREF(cwd_bytes);
+    FRAME_XDECREF(frame);
+    FILENAME_XDECREF(filename_o);
+    return result;
 }
 
 static PyMethodDef StacktraceMethods[] = {
-    { "get_info_frame", (PyCFunction)get_file_and_line, METH_VARARGS, "stacktrace functions" },
+    { "get_info_frame", (PyCFunction)get_file_and_line, METH_O, "stacktrace functions" },
     { NULL, NULL, 0, NULL }
 };
 
@@ -99,8 +130,7 @@ static struct PyModuleDef stacktrace = { PyModuleDef_HEAD_INIT,
 PyMODINIT_FUNC
 PyInit__stacktrace(void)
 {
-    PyObject* m;
-    m = PyModule_Create(&stacktrace);
+    PyObject* m = PyModule_Create(&stacktrace);
     if (m == NULL)
         return NULL;
     return m;
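For readers less familiar with the C-API, the frame-selection logic of `get_file_and_line` can be sketched in pure Python, where reference counting is handled by the interpreter. This is only an illustration under stated assumptions: the three prefix constants are defined elsewhere in `_stacktrace.c` and do not appear in this diff, so the values below are hypothetical stand-ins, and `should_skip` is a hypothetical helper name. The sketch returns `None` when no frame matches, which is a simplification and not a statement about what the C function does in that case.

```python
import sys

# Hypothetical stand-ins: the real values live in _stacktrace.c, outside this diff.
DD_TRACE_INSTALLED_PREFIX = "/ddtrace/"
TESTS_PREFIX = "/tests/"
SITE_PACKAGES_PREFIX = "/site-packages/"


def should_skip(filename, cwd):
    """Mirror the strstr() condition used in the C loop."""
    in_ddtrace = DD_TRACE_INSTALLED_PREFIX in filename and TESTS_PREFIX not in filename
    return in_ddtrace or SITE_PACKAGES_PREFIX in filename or cwd not in filename


def get_file_and_line(cwd):
    """Walk up the stack and return (filename, lineno) of the first kept frame."""
    frame = sys._getframe(1)  # start at the caller, like GET_FRAME(tstate)
    while frame is not None:
        filename = frame.f_code.co_filename
        if not should_skip(filename, cwd):
            return filename, frame.f_lineno
        frame = frame.f_back  # like GET_PREVIOUS(frame)
    return None  # simplification: no matching frame found
```

In the C version every step of this loop hands back a strong reference on 3.11+ (the frame from `PyFrame_GetBack`, the filename from `PyObject_GetAttrString`), which is why the patch adds the `FRAME_DECREF` and `FILENAME_DECREF` calls before moving to the previous frame.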
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+fixes:
+  - |
+    This fix eliminates some reference leaks and C-API usage in the ``_iast`` module.
