Commit d0e7050

CWE-595: Added code example for string interning and integer caching
Signed-off-by: edanhub <[email protected]>
1 parent 8c18703 commit d0e7050

2 files changed, +79 -1 lines changed

docs/Secure-Coding-Guide-for-Python/CWE-697/CWE-595/README.md

Lines changed: 54 additions & 1 deletion
@@ -1,9 +1,60 @@
# CWE-595: Comparison of Object References Instead of Object Contents

Prevent unexpected results by understanding the difference between the comparison operators `==` and `is`.

In Python, the `==` operator is implemented by the `__eq__` method on an object [[python.org data model 2023](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)]. For built-in types such as `int` and `str`, the comparison is implemented in the interpreter. The main issue arises with custom classes: a class that does not implement `__eq__` inherits the default implementation from `object`, which compares object references in the same way as the `is` operator. The `is` operator compares the identities of two objects, which is equivalent to `id(obj1) == id(obj2)`. The built-in `id` function returns an integer that is unique and constant for an object during its lifetime; in CPython, the reference implementation, this integer is the object's memory address [[de Langen 2023](https://realpython.com/python-is-identity-vs-equality/)].
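The following minimal sketch illustrates the default behavior described above. `Point` is a hypothetical class used only for illustration; because it does not define `__eq__`, `==` falls back to the identity comparison inherited from `object`.

```python
"""Sketch: default equality of a custom class compares identities."""


class Point:
    """Hypothetical class that does not define __eq__."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


p1 = Point(1, 2)
p2 = Point(1, 2)  # same values, different object
p3 = p1           # second name bound to the same object

# Without __eq__, == behaves like the identity check performed by `is`.
print(p1 == p2)  # False: equal contents, but two distinct objects
print(p1 == p3)  # True: both names refer to one object
# `is` gives the same answer as comparing id() values of live objects.
print((p1 is p2) == (id(p1) == id(p2)))  # True
```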

Implement the `__eq__` method on a class whenever its instances may be compared with other objects or searched for in a list. This need is so common that the `dataclasses.dataclass` decorator generates an `__eq__` method by default [[dataclasses — Data Classes — Python 3.11.4 documentation](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass)].
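The sketch below shows two ways to obtain value-based equality, assuming that two points should be equal when their coordinates match. `ManualPoint` and `DataPoint` are hypothetical names. `ManualPoint.__eq__` returns `NotImplemented` for unrelated types, which lets Python try the reflected comparison instead of forcing a result.

```python
"""Sketch: value-based equality, written by hand and generated by @dataclass."""
from dataclasses import dataclass


class ManualPoint:
    """Hypothetical class with a hand-written __eq__."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        # Unrelated types: let Python try other.__eq__ or fall back to identity.
        if not isinstance(other, ManualPoint):
            return NotImplemented
        # Compare contents, not references.
        return (self.x, self.y) == (other.x, other.y)


@dataclass
class DataPoint:
    """Hypothetical dataclass; eq=True (the default) generates __eq__ from the fields."""

    x: int
    y: int


print(ManualPoint(1, 2) == ManualPoint(1, 2))    # True
print(ManualPoint(1, 2) in [ManualPoint(1, 2)])  # True: `in` uses __eq__
print(DataPoint(1, 2) == DataPoint(1, 2))        # True
print([DataPoint(1, 2)] == [DataPoint(1, 2)])    # True: lists compare elements with __eq__
print(DataPoint(1, 2) is DataPoint(1, 2))        # False: still two distinct objects
```

A class that defines `__eq__` without defining `__hash__` gets `__hash__` set to `None`, and a `@dataclass` with `eq=True` and `frozen=False` does the same, so instances of both classes above cannot be stored in a `set` or used as `dict` keys without further changes.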

Be aware of CPython's memory optimizations for strings and small integers, as demonstrated in `example01.py`.

CPython tries to avoid allocating new memory for identical strings. Reusing an existing string object instead of creating a new one is an optimization known as **string interning** [[sys — System-specific parameters and functions — Python 3.11.4 documentation](https://docs.python.org/3/library/sys.html#sys.intern)].

CPython applies a similar optimization to integers. According to the documentation, CPython keeps an array of integer objects for all integers between `-5` and `256`, so creating an `int` in that range returns a reference to the existing object rather than a new one [[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)].

_[example01.py:](example01.py)_
```py
""" Code Example """

print("-" * 10 + "Memory optimization with strings" + 10 * "-")
a = "foobar"
b = "foobar"
c = ''.join(["foo", "bar"])  # built at runtime, so a new str object
print(f"a is b: {a} is {b}?", a is b)
print(f"a is c: {a} is {c}?", a is c)
print(f"a == c: {a} == {c}?", a == c)
print(f"size? len(a)={len(a)} len(b)={len(b)} len(c)={len(c)}")

print("-" * 10 + "Memory optimization with numbers" + 10 * "-")
a = b = 256
print(f"{a} is {b}?", a is b)
a = b = 257  # chained assignment: both names share one object
print(f"{a} is {b}?", a is b)

print("-" * 10 + "Memory optimization with numbers in a loop" + 10 * "-")
a = b = 255
while a is b:  # CPython: ends once the values leave the small-int cache
    a += 1
    b += 1
    print(f"{a} is {b}?", a is b)
```
__Output of example01.py:__
```bash
----------Memory optimization with strings----------
a is b: foobar is foobar? True
a is c: foobar is foobar? False
a == c: foobar == foobar? True
size? len(a)=6 len(b)=6 len(c)=6
----------Memory optimization with numbers----------
256 is 256? True
257 is 257? True
----------Memory optimization with numbers in a loop----------
256 is 256? True
257 is 257? False
```
The first set of print statements illustrates string interning. While `a` and `b` reuse the same object, `c` is built at runtime by `''.join()`, which creates a new string object with a different `id()` even though its value equals `a`. In the middle example, the chained assignment `a = b = 257` binds both names to the same object, so `a is b` returns `True` even though `257` falls outside the cached range. In the loop, however, `a += 1` and `b += 1` each produce a new result. Up to `256`, CPython returns the cached object, so both names still refer to the same object; at `257`, CPython allocates two separate objects and `a is b` becomes `False`. Caching and interning are CPython implementation details and may behave differently when a script runs from a file than in the interactive interpreter (REPL), so running `example01.py` in interactive mode may produce different results.
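Because interning and integer caching are CPython implementation details, `is` is not a reliable way to compare values. The following minimal sketch contrasts the two operators on values built at runtime and shows `sys.intern()`, the documented way to request interning explicitly. The variable names and sample values are illustrative assumptions and are not part of `example01.py`.

```python
"""Sketch: compare values with ==; identity depends on CPython internals."""
import sys

# Parse numbers at runtime, the way input data typically arrives.
numbers = []
for raw in ["255", "256", "257", "1000"]:
    first = int(raw)
    second = int(raw)
    # `first is second` depends on the small-int cache; == does not.
    numbers.append((first, first == second, first is second))

for value, equal, identical in numbers:
    print(f"{value}: == {equal}, is {identical}")

# Strings built at runtime are generally not interned automatically.
prefix = "foo"
s1 = "".join([prefix, "bar"])
s2 = "".join([prefix, "bar"])

# sys.intern() returns one shared object for equal strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(s1 == s2)  # True: equal contents
print(i1 is i2)  # True: both names refer to the interned object
```

Explicit interning can save memory when the same strings occur many times, but it does not change how values should be compared: `==` remains the correct check for equal contents.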
## Non-Compliant Code Example

The non-compliant code shows how the default comparison operator compares object references rather than object values. It also shows how this causes issues when comparing lists of objects, although the same applies to other types of collections. Next, it shows that the `in` operator also depends on the behavior of the `__eq__` method and therefore also returns an undesirable result. Finally, it performs the comparison with the `is` operator, which checks whether the references point to the same object, regardless of the stored value.
@@ -104,3 +155,5 @@ print(a is b)
|[[python.org data model 2023](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)]|[3. Data model — Python 3.11.3 documentation](https://docs.python.org/3/reference/datamodel.html?highlight=__eq__#object.__eq__)|
|[[de Langen 2023](https://realpython.com/python-is-identity-vs-equality/)]|[Python '!=' Is Not 'is not': Comparing Objects in Python – Real Python](https://realpython.com/python-is-identity-vs-equality/)|
|[[dataclasses — Data Classes — Python 3.11.4 documentation](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass)]|[dataclasses — Data Classes — Python 3.11.4 documentation](https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass)|
|[[sys — System-specific parameters and functions — Python 3.11.4 documentation](https://docs.python.org/3/library/sys.html#sys.intern)]|[sys — System-specific parameters and functions — Python 3.11.4 documentation](https://docs.python.org/3/library/sys.html#sys.intern)|
|[[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)]|[Integer objects — Python 3.11.4 documentation](https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong)|
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
""" Code Example """

print("-" * 10 + "Memory optimization with strings" + 10 * "-")
a = "foobar"
b = "foobar"
c = ''.join(["foo", "bar"])  # built at runtime, so a new str object
print(f"a is b: {a} is {b}?", a is b)
print(f"a is c: {a} is {c}?", a is c)
print(f"a == c: {a} == {c}?", a == c)
print(f"size? len(a)={len(a)} len(b)={len(b)} len(c)={len(c)}")

print("-" * 10 + "Memory optimization with numbers" + 10 * "-")
a = b = 256
print(f"{a} is {b}?", a is b)
a = b = 257  # chained assignment: both names share one object
print(f"{a} is {b}?", a is b)

print("-" * 10 + "Memory optimization with numbers in a loop" + 10 * "-")
a = b = 255
while a is b:  # CPython: ends once the values leave the small-int cache
    a += 1
    b += 1
    print(f"{a} is {b}?", a is b)
