green-code-initiative
diff --git a/src/main/rules/GCI107/GCI107.json b/src/main/rules/GCI107/GCI107.json
Lines changed: 21 additions & 0 deletions
diff --git a/src/main/rules/GCI107/python/GCI107.asciidoc b/src/main/rules/GCI107/python/GCI107.asciidoc
Lines changed: 119 additions & 0 deletions
diff --git a/src/main/rules/GCI107/python/dot.png b/src/main/rules/GCI107/python/dot.png
31.5 KB
diff --git a/src/main/rules/GCI107/python/matrix.png b/src/main/rules/GCI107/python/matrix.png
31 KB
diff --git a/src/main/rules/GCI107/python/outer.png b/src/main/rules/GCI107/python/outer.png
26.7 KB
diff --git a/src/main/rules/GCI96/GCI96.json b/src/main/rules/GCI96/GCI96.json
Lines changed: 23 additions & 0 deletions
diff --git a/src/main/rules/GCI96/python/GCI96.asciidoc b/src/main/rules/GCI96/python/GCI96.asciidoc
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,21 @@
+{
+  "title": "DATA : Avoid Iterative Matrix Operations",
+  "type": "CODE_SMELL",
+  "status": "ready",
+  "remediation": {
+    "func": "Constant\/Issue",
+    "constantCost": "10min"
+  },
+  "tags": [
+    "creedengo",
+    "eco-design",
+    "performance",
+    "data",
+    "ai",
+    "vector",
+    "pandas",
+    "numpy"
+  ],
+  "defaultSeverity": "Minor"
+}
+
@@ -0,0 +1,119 @@
+Before going into more detail, it's important to understand how vectorization works in Python. There are two main ways to perform a calculation on an array or matrix:
+
+* *Iterative approach*: loop over the elements and perform the calculation one element at a time.
+* *Vectorization*: apply the calculation to the entire array or matrix at once.
+
+Strictly speaking, applying a calculation to every element simultaneously is not always possible without real parallel hardware such as a GPU. In practice, however, we speak of vectorization whenever we use the built-in array functions of libraries such as NumPy, Pandas or TensorFlow.
+
+These functions still loop over the elements, but the loop runs in compiled, optimized low-level code (typically C) instead of the Python interpreter, which avoids the interpreter's per-element overhead. As with built-in functions in general, execution is therefore much faster and consumes less energy, which means lower CO2 emissions.
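As a minimal illustration of the two approaches, the sketch below (assuming NumPy is installed; `iterative_axpy` and `vectorized_axpy` are hypothetical names borrowed from the BLAS "a·x + y" operation) computes `a * x + y` first with a Python loop and then with a single NumPy expression. The results are identical; the only difference is that the vectorized version hands the loop over to NumPy's compiled code.

```python
import numpy as np


def iterative_axpy(a, x, y):
    """Compute a * x + y element by element in an interpreted Python loop."""
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def vectorized_axpy(a, x, y):
    """Compute a * x + y on the whole array at once; NumPy loops in compiled code."""
    return a * x + y


x = np.arange(5, dtype=float)  # [0. 1. 2. 3. 4.]
y = np.ones(5)
print(vectorized_axpy(2.0, x, y))  # [1. 3. 5. 7. 9.]
```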
+
+== Non compliant Code Example
+
+[source,python]
+----
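+# A (rows_A x cols_A) and B (cols_A x cols_B) are nested Python lists
+rows_A, cols_A, cols_B = len(A), len(A[0]), len(B[0])
+results = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
+
+for i in range(rows_A):
+    for j in range(cols_B):
+        for k in range(cols_A):
+            results[i][j] += A[i][k] * B[k][j]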
+----
+
+== Compliant Solution
+
+[source,python]
+----
+import numpy as np  # NumPy: the Python library for numerical arrays and matrices
+
+results = np.dot(A, B)  # for 2-D arrays, equivalent to A @ B
+----
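For reference, here is a self-contained sketch that runs both snippets on small example matrices (the values are made up for illustration) and checks that they agree. It also shows `A @ B`, the matrix-multiplication operator, which for 2-D arrays gives the same result as `np.dot`.

```python
import numpy as np

# Small example inputs (hypothetical): A is 2 x 3, B is 3 x 2
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

# Non-compliant: triple loop executed by the Python interpreter
rows_A, cols_A, cols_B = len(A), len(A[0]), len(B[0])
results = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
for i in range(rows_A):
    for j in range(cols_B):
        for k in range(cols_A):
            results[i][j] += A[i][k] * B[k][j]

# Compliant: one call, the loops run in compiled code
A_arr, B_arr = np.array(A), np.array(B)
vectorized = np.dot(A_arr, B_arr)
same_with_operator = A_arr @ B_arr  # matrix-multiplication operator

print(vectorized)
# [[ 58  64]
#  [139 154]]
```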
+
+== Relevance Analysis
+
+The following results were obtained through local experiments.
+
+=== Configuration
+
+* Processor: Intel(R) Core(TM) Ultra 5 135U, 2100 MHz, 12 cores, 14 logical processors
+* RAM: 16 GB
+* CO2 Emissions Measurement: https://mlco2.github.io/codecarbon/[CodeCarbon]
+
+=== Context
+
+This study is divided into three parts, each comparing an iterative and a vectorized implementation of:
+
+* a dot product between two vectors,
+* an outer product between two vectors,
+* a matrix product.
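The benchmark scripts behind these measurements are not reproduced in this rule. As an illustration only, a timing comparison could be set up as follows with the standard `timeit` module; the vector size, seed and repeat count are arbitrary assumptions, and absolute timings depend on the machine. To record emissions as in this study, the calls being compared could additionally be wrapped between the `start()` and `stop()` calls of CodeCarbon's `EmissionsTracker`.

```python
import timeit

import numpy as np


def iterative_dot_product(x, y):
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def vectorized_dot_product(x, y):
    return np.dot(x, y)


def benchmark(n=10_000, repeat=5):
    """Return the best-of-`repeat` runtime in seconds for each implementation."""
    rng = np.random.default_rng(seed=0)  # fixed seed so runs are comparable
    x, y = rng.random(n), rng.random(n)
    # Both implementations must agree before their timings are compared
    assert np.isclose(iterative_dot_product(x, y), vectorized_dot_product(x, y))
    t_iter = min(timeit.repeat(lambda: iterative_dot_product(x, y), number=1, repeat=repeat))
    t_vec = min(timeit.repeat(lambda: vectorized_dot_product(x, y), number=1, repeat=repeat))
    return t_iter, t_vec


t_iter, t_vec = benchmark()
print(f"iterative: {t_iter:.6f} s, vectorized: {t_vec:.6f} s")
```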
+
+=== Impact Analysis
+
+*1. Dot product:*
+
+*Non compliant*
+
+[source,python]
+----
+def iterative_dot_product(x, y):
+    total = 0
+    for i in range(len(x)):
+        total += x[i] * y[i]
+    return total
+----
+
+*Compliant*
+
+[source,python]
+----
+import numpy as np
+
+def vectorized_dot_product(x, y):
+    return np.dot(x, y)
+----
+image::dot.png[]
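A small worked example (values chosen for illustration) confirms that both functions return the same result: [1, 2, 3] · [4, 5, 6] = 1·4 + 2·5 + 3·6 = 32. The `@` operator is an equivalent vectorized spelling for 1-D arrays. `np.sum(x * y)` is also vectorized, but it first allocates a temporary array holding `x * y`, so `np.dot` is usually the leaner choice.

```python
import numpy as np


def iterative_dot_product(x, y):
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def vectorized_dot_product(x, y):
    return np.dot(x, y)


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(iterative_dot_product(x, y))   # 32.0
print(vectorized_dot_product(x, y))  # 32.0
print(x @ y)                         # 32.0, same inner product via the @ operator
print(np.sum(x * y))                 # 32.0, but builds the temporary array x * y first
```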
+
+*2. Outer product:*
+
+*Non compliant*
+
+[source,python]
+----
+def iterative_outer_product(x, y):
+    o = np.zeros((len(x), len(y)))
+    for i in range(len(x)):
+        for j in range(len(y)):
+            o[i][j] = x[i] * y[j]
+    return o
+----
+
+*Compliant*
+
+[source,python]
+----
+def vectorized_outer_product(x, y):
+    return np.outer(x, y)
+----
+image::outer.png[]
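The outer product can also be written with NumPy broadcasting: reshaping `x` into a column and `y` into a row lets NumPy multiply every pair of elements in one vectorized expression. The sketch below (small illustrative inputs) checks that the iterative version, `np.outer` and the broadcast form all agree.

```python
import numpy as np


def iterative_outer_product(x, y):
    o = np.zeros((len(x), len(y)))
    for i in range(len(x)):
        for j in range(len(y)):
            o[i][j] = x[i] * y[j]
    return o


x = np.array([1, 2, 3])
y = np.array([10, 20])

outer = np.outer(x, y)
# Broadcasting: a (3, 1) column times a (1, 2) row yields a (3, 2) array
broadcast = x[:, np.newaxis] * y[np.newaxis, :]

print(outer)
# [[10 20]
#  [20 40]
#  [30 60]]
```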
+
+*3. Matrix product:*
+
+*Non compliant*
+
+[source,python]
+----
+def iterative_matrix_product(A, B):
+    # Result has len(A) rows and len(B[0]) columns, initialized to zero
+    results = [[0 for _ in range(len(B[0]))] for _ in range(len(A))]
+    for i in range(len(A)):
+        for j in range(len(B[0])):
+            for k in range(len(B)):
+                results[i][j] += A[i][k] * B[k][j]
+    return results
+----
+
+*Compliant*
+
+[source,python]
+----
+def vectorized_matrix_product(A, B):
+    return np.dot(A, B)
+----
+image::matrix.png[]
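When the inputs are floating-point numbers, the optimized routine behind `np.dot` may accumulate the sums in a different order than the triple loop, so the two results can differ in the last bits. The sketch below (random illustrative matrices with a fixed seed) therefore compares them with `np.allclose` rather than exact equality.

```python
import numpy as np


def iterative_matrix_product(A, B):
    # Result has len(A) rows and len(B[0]) columns, initialized to zero
    results = [[0.0 for _ in range(len(B[0]))] for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                results[i][j] += A[i][k] * B[k][j]
    return results


def vectorized_matrix_product(A, B):
    return np.dot(A, B)


rng = np.random.default_rng(seed=42)
A = rng.random((50, 30))
B = rng.random((30, 40))

iterative = np.array(iterative_matrix_product(A, B))
vectorized = vectorized_matrix_product(A, B)

# Sums may be accumulated in a different order by the optimized routine,
# so compare with a tolerance rather than exact equality.
print(np.allclose(iterative, vectorized))  # True
```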
+
+=== Conclusion
+
+The measurements show that the vectorized methods are significantly faster than their iterative counterparts and emit less CO2. This illustrates how relying on optimized built-in functions leads to more efficient code, in terms of both performance and environmental impact.
+
+=== References
+
+https://sciresol.s3.us-east-2.amazonaws.com/IJST/Articles/2024/Issue-24/IJST-2024-914.pdf
+
+https://arxiv.org/pdf/2308.01269
+
+https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00062165/ilm1-2024200012.pdf
@@ -1,4 +1,26 @@
 {
   "title": "DATA/AI Pandas - Avoid Reading Unnecessary Columns in CSV Files",
   "type": "CODE_SMELL",
   "status": "ready",
@@ -16,3 +38,4 @@
   ],
   "defaultSeverity": "Minor"
 }
@@ -1,3 +1,24 @@
 This rule is specific to Python because it's related to the Pandas library, which is widely used for data manipulation and analysis in Python.
 
 Reading CSV files without explicitly specifying which columns to load leads to unnecessary data loading and increases memory and energy consumption. This guidance is specific to the use of the Pandas library in Python, but it aligns with the more general GCI74: Avoid SELECT * from table in SQL. To ensure low environmental impact and optimal performance, always use the usecols parameter in pandas.read_csv() to select only the required columns.
@@ -14,10 +35,20 @@ df = pd.read_csv('data.csv')
 
 In this case, **all columns** are read into memory, even if only one or two are needed.
 
 == Compliant Solution
 
 [source,python]
 ----
 file_path = 'data.csv'
 df = pd.read_csv(file_path, usecols=['A', 'B'])  # Only read needed columns
 ----
@@ -27,11 +58,95 @@ This ensures only the necessary data is loaded, reducing memory usage and energy
 == Relevance Analysis
 
 Local experiments were conducted to assess the environmental impact of reading CSV files with and without column selection.
 
 === Configuration
 
 * Processor: Intel(R) Core(TM) Ultra 5 135U, 2100 MHz, 12 cores, 14 logical processors
 * RAM: 16 GB
 * CO₂ Emissions Measurement: https://mlco2.github.io/codecarbon/[CodeCarbon]
 
 === Context
@@ -72,3 +187,4 @@ This is especially critical when working with large datasets or in environments
 https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
 https://medium.com/@amit25173/what-is-usecols-in-pandas-7a6a43885f4b
 