Commit df16d20

GCI96 AvoidIterativeMatrixOperations
Co-authored-by: DataLabGroupe-CreditAgricole <[email protected]>
1 parent a5c2c8e commit df16d20

5 files changed

+136
-0
lines changed

src/main/rules/GCI96/GCI96.json

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
{
  "title": "Avoid Iterative Matrix Operations",
  "type": "CODE_SMELL",
  "status": "ready",
  "remediation": {
    "func": "Constant\/Issue",
    "constantCost": "10min"
  },
  "tags": [
    "creedengo",
    "eco-design",
    "performance",
    "data",
    "ai",
    "vector",
    "pandas",
    "numpy"
  ],
  "defaultSeverity": "Minor"
}

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
Before going into more detail, it is important to understand how vectorization works in Python. When performing a calculation on an array or matrix, there are two main approaches:

* The first is to go through the array and perform the calculation element by element, which is known as an iterative approach.
* The second is to apply the calculation to the entire array or matrix at once, which is known as vectorization.

Truly simultaneous execution is not always possible without real parallelism (on a GPU, for example), but we still speak of vectorization when we use the built-in functions of TensorFlow, NumPy or Pandas.

These functions still run an iterative loop, but it is executed in lower-level compiled code (C) rather than in the Python interpreter. As with built-in functions in general, this compiled code is optimized, so execution is typically much faster and therefore emits less CO2.

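To make the distinction concrete, here is a minimal sketch that applies the same element-wise calculation both ways. The function names `scale_iterative` and `scale_vectorized` and the sample data are illustrative assumptions, not part of the rule itself.

```python
import numpy as np

def scale_iterative(values, factor):
    # Iterative approach: the Python interpreter handles each element in turn
    out = []
    for v in values:
        out.append(v * factor)
    return out

def scale_vectorized(values, factor):
    # Vectorized approach: a single NumPy expression, the loop runs in compiled code
    return np.asarray(values) * factor

data = [1.0, 2.0, 3.0, 4.0]

# Both approaches are expected to produce the same values
assert np.allclose(scale_iterative(data, 2.0), scale_vectorized(data, 2.0))
```

Both functions return the same numbers; only the place where the loop runs differs.
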
== Non compliant Code Example

[source,python]
----
# results must start as a zero matrix before accumulating products
results = [[0] * len(B[0]) for _ in range(len(A))]
for i in range(len(A)):
    for j in range(len(B[0])):
        for k in range(len(B)):
            results[i][j] += A[i][k] * B[k][j]
----

== Compliant Solution

[source,python]
----
import numpy as np
results = np.dot(A, B)
----

== Relevance Analysis

The following results were obtained through local experiments.

=== Configuration

* Processor: Intel(R) Core(TM) Ultra 5 135U, 2100 MHz, 12 cores, 14 logical processors
* RAM: 16 GB
* CO2 Emissions Measurement: Using CodeCarbon

=== Context

This study is divided into 3 parts, each comparing a vectorized method with an iterative one:

* measuring the impact on a dot product between two vectors,
* measuring the impact on an outer product between two vectors,
* measuring the impact on a matrix product.

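As a rough illustration of how such a comparison could be set up, the sketch below times an iterative and a vectorized dot product with the standard library's `timeit`. It measures execution time only, not CO2 emissions (the experiments reported here used CodeCarbon for that), and the helper `best_time`, the vector size and the seed are arbitrary assumptions.

```python
import timeit
import numpy as np

def iterative_dot(x, y):
    # Element-by-element accumulation in the Python interpreter
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def best_time(func, *args, repeats=5):
    # Call func(*args) once per repeat and keep the fastest run, in seconds
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeats))

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.random(100_000)
y = rng.random(100_000)

t_iter = best_time(iterative_dot, x, y)
t_vec = best_time(np.dot, x, y)
print(f"iterative: {t_iter:.6f} s, vectorized: {t_vec:.6f} s")
```

Taking the minimum over several runs reduces noise from other processes; absolute timings will vary from machine to machine.
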
=== Impact Analysis

*1. Dot product:*

*Non compliant*
[source,python]
----
def iterative_dot_product(x, y):
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total
----
*Compliant*
[source,python]
----
import numpy as np

def vectorized_dot_product(x, y):
    return np.dot(x, y)
----
image::dot.png[]

*2. Outer product:*

*Non compliant*
[source,python]
----
import numpy as np

def iterative_outer_product(x, y):
    o = np.zeros((len(x), len(y)))
    for i in range(len(x)):
        for j in range(len(y)):
            o[i][j] = x[i] * y[j]
    return o
----
*Compliant*
[source,python]
----
def vectorized_outer_product(x, y):
    return np.outer(x, y)
----
image::outer.png[]

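For readers less familiar with NumPy, the outer product can also be written with broadcasting, which may help show what `np.outer` computes. This is a minimal sketch with made-up sample vectors, not code from the experiments.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0])

# x[:, None] has shape (3, 1) and y[None, :] has shape (1, 2);
# broadcasting multiplies every x[i] by every y[j], giving shape (3, 2)
outer_broadcast = x[:, None] * y[None, :]

assert outer_broadcast.shape == (3, 2)
assert np.array_equal(outer_broadcast, np.outer(x, y))
```

Either form avoids the explicit Python double loop; `np.outer` states the intent more directly.
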
*3. Matrix product:*

*Non compliant*
[source,python]
----
def iterative_matrix_product(A, B):
    # Start from a zero matrix of shape (rows of A, columns of B)
    results = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                results[i][j] += A[i][k] * B[k][j]
    return results
----
*Compliant*
[source,python]
----
def vectorized_matrix_product(A, B):
    return np.dot(A, B)
----
image::matrix.png[]

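Before swapping a loop for a built-in, it can be worth checking on a small input that both give the same result. The sketch below does this for the matrix product, assuming small integer matrices chosen purely for illustration; it also uses the `@` operator, which for 2-D arrays computes the same product as `np.dot`.

```python
import numpy as np

def iterative_matrix_product(A, B):
    # Triple loop over rows of A, columns of B and the shared dimension
    results = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                results[i][j] += A[i][k] * B[k][j]
    return results

A = [[1, 2], [3, 4], [5, 6]]    # shape (3, 2)
B = [[7, 8, 9], [10, 11, 12]]   # shape (2, 3)

expected = iterative_matrix_product(A, B)

# Both vectorized forms should agree with the loop on this input
assert np.array_equal(np.dot(A, B), expected)
assert np.array_equal(np.array(A) @ np.array(B), expected)
```
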
=== Conclusion

In these local experiments, the vectorized method was significantly faster than the iterative method, and its CO2 emissions were also lower. This is a clear example of how using built-in functions can lead to more efficient code, both in terms of performance and environmental impact.

=== References

https://sciresol.s3.us-east-2.amazonaws.com/IJST/Articles/2024/Issue-24/IJST-2024-914.pdf

https://arxiv.org/pdf/2308.01269

https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00062165/ilm1-2024200012.pdf