Commit aa6cb6a

GCI104 AvoidCreatingTensorUsingNumpyOrNativePython #AI #Python #DLG #RulesSpecifications
Co-authored-by: DataLabGroupe-CreditAgricole <[email protected]>
1 parent a5c2c8e commit aa6cb6a

5 files changed: 94 additions, 0 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

+- Added rule GCI104 on Torch Tensor types
+
 ### Changed

 - Correction of various typos in rules documentations

RULES.md

Lines changed: 2 additions & 0 deletions
@@ -71,6 +71,8 @@ Some are applicable for different technologies.
 | GCI92 | Use string.Length instead of comparison with empty string | Comparing a string to an empty string is unnecessary and can be replaced by a call to `string.Length` which is more performant and more readable. | | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 || 🚫 |
 | GCI93 | Return `Task` directly | Consider returning a `Task` directly instead of a single `await` | ||||||||
 | GCI94 | Use orElseGet instead of orElse | Parameter of orElse() is evaluated, even when having a non-empty Optional. Supplier method of orElseGet passed as an argument is only executed when an Optional value isn’t present. Therefore, using orElseGet() will save computing time. | [Optimized use of Java Optional Else](https://github.com/green-code-initiative/creedengo-challenge/issues/77) || 🚫 | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
+| GCI104 | DATA/AI PyTorch - Create tensors directly from Torch | In PyTorch, prefer creating tensors directly using `torch.rand`, `torch.tensor`, or other Torch methods instead of converting from NumPy arrays. Avoid using `torch.tensor(np.random.rand(...))` or similar patterns when the same result can be achieved directly with PyTorch. | | 🚫 | 🚫 | 🚫 | 🚀 | 🚫 | 🚫 | 🚫 |
 | GCI203 | Detect unoptimized file formats | When it is possible, to use svg format image over other image format | | 🚧 | 🚀 | 🚀 || 🚀 | 🚀 | 🚫 |
 | GCI404 | Avoid list comprehension in iterations | Use generator comprehension instead of list comprehension in for loop declaration | | 🚫 | 🚫 | 🚫 || 🚫 | 🚫 | 🚫 |
 | GCI522 | Sobriety: Brightness Override | To avoid draining the battery, iOS and Android devices adapt the brightness of the screen depending on the environment light. | | 🚫 | 🚫 || 🚫 | 🚫 | 🚫 | 🚫 |

src/main/rules/GCI104/GCI104.json

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
{
  "title": "DATA/AI PyTorch - Create tensors directly from Torch",
  "type": "CODE_SMELL",
  "status": "ready",
  "remediation": {
    "func": "Constant/Issue",
    "constantCost": "10min"
  },
  "tags": [
    "creedengo",
    "eco-design",
    "performance",
    "ai",
    "PyTorch"
  ],
  "defaultSeverity": "Minor"
}
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
In PyTorch, prefer creating tensors directly with `torch.rand`, `torch.tensor`, or other Torch factory functions instead of converting from NumPy arrays. Avoid `torch.tensor(np.random.rand(...))` and similar patterns when the same result can be achieved directly with PyTorch.

== Non Compliant Code Example

[source,python]
----
import torch
import numpy as np

def non_compliant_random_rand():
    # NumPy allocates the array, then torch.tensor copies it into a new tensor.
    tensor = torch.tensor(np.random.rand(1000, 1000))
----


== Compliant Solution

[source,python]
----
import torch

def compliant_random_rand():
    # PyTorch allocates and fills the tensor in a single step.
    tensor = torch.rand([1000, 1000])
----
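
The same substitution applies to other NumPy constructors. The sketch below is illustrative only and is not part of the rule's own examples; the variable names and the small shape are assumptions chosen for the demonstration. It also surfaces one behavioural difference worth checking when migrating: with default settings, `np.random.rand` yields `float64` data whereas `torch.rand` yields `float32`.

```python
import numpy as np
import torch

shape = (4, 3)  # small illustrative shape, not taken from the rule

# Non-compliant route: NumPy builds a float64 array, torch.tensor copies it.
via_numpy = torch.tensor(np.random.rand(*shape))

# Compliant route: PyTorch allocates and fills the tensor itself.
native = torch.rand(shape)

# Other common NumPy constructors have direct Torch counterparts.
zeros = torch.zeros(shape)          # instead of torch.tensor(np.zeros(shape))
ones = torch.ones(shape)            # instead of torch.tensor(np.ones(shape))
steps = torch.arange(0, 10, 2)      # instead of torch.tensor(np.arange(0, 10, 2))
grid = torch.linspace(0.0, 1.0, 5)  # instead of torch.tensor(np.linspace(0.0, 1.0, 5))

# The two routes differ in default dtype, which may matter downstream.
print(via_numpy.dtype, native.dtype)
```

If the surrounding code relies on double precision, passing `dtype=torch.float64` to the Torch factory function keeps the result comparable to the NumPy route.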

== Relevance Analysis

Experiments were conducted to compare the performance and environmental impact of two tensor creation methods in PyTorch:

- Using NumPy for random data generation, followed by conversion to a PyTorch tensor
- Direct creation using native PyTorch tensor functions (`torch.rand`, `torch.tensor`, etc.)

=== Configuration

* Processor: Intel(R) Xeon(R) CPU 3.80GHz
* RAM: 64GB
* GPU: NVIDIA Quadro RTX 6000
* CO₂ Emissions Measurement: https://mlco2.github.io/codecarbon/[CodeCarbon]
* Framework: PyTorch
* Dataset: MNIST
* Model: Simple 2-layer fully connected network

=== Context

Two workflows were benchmarked:

- *NumPy-based:* Data created using NumPy and converted to PyTorch
- *Torch-based:* Data created natively using PyTorch tensor operations

Metrics assessed:

- Training execution time
- CO₂ emissions
- Final model accuracy
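
The measurements reported in this analysis cover full training runs. As a much smaller illustration, the following sketch times only the tensor-creation step on CPU. The helper `median_seconds` and the repeat count are hypothetical names and values introduced for this example; this is not the benchmark code behind the reported results, and absolute timings will vary by machine.

```python
import time

import numpy as np
import torch


def median_seconds(fn, repeats=5):
    """Return the median wall-clock duration of calling fn() `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


shape = (1000, 1000)  # same shape as the rule's code examples

# NumPy-based workflow: generate with NumPy, then convert to a tensor.
numpy_route = median_seconds(lambda: torch.tensor(np.random.rand(*shape)))

# Torch-based workflow: generate natively with PyTorch.
torch_route = median_seconds(lambda: torch.rand(shape))

print(f"NumPy then convert: {numpy_route * 1e3:.2f} ms")
print(f"Native torch.rand:  {torch_route * 1e3:.2f} ms")
```

Taking the median of several runs reduces the influence of one-off effects such as first-call initialisation.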

=== Impact Analysis

image::image.png[]

- *Execution Time:* The Torch-based method reduced total training time by more than **50%**
- *Carbon Emissions:* The Torch-based method lowered emissions by approximately **50%**
- *Accuracy:* Both approaches yielded **comparable model accuracy**

== Conclusion

Using native PyTorch methods to create tensors:

- Cuts training time by more than 50% in the benchmark above
- Minimizes unnecessary memory operations and conversions
- Lowers CO₂ emissions by approximately 50% in the same benchmark
- Maintains model performance
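
As a hedged illustration of the memory-operation point, the sketch below contrasts allocating a tensor on the host and moving it with allocating it directly on the target device. It also shows what happens when data genuinely originates in NumPy: `torch.tensor` copies the array, while `torch.from_numpy` shares its buffer. The variable names and sizes are assumptions made for this example, and the code falls back to CPU when no GPU is present.

```python
import numpy as np
import torch

# Pick a GPU when one is available; otherwise stay on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two steps: allocate on the host, then transfer to the target device.
two_step = torch.rand(256, 256).to(device)

# One step: allocate directly on the target device.
direct = torch.rand(256, 256, device=device)

# When the data really starts in NumPy, torch.from_numpy reuses the array's
# buffer, whereas torch.tensor always makes a copy.
array = np.arange(6, dtype=np.float32)
shared = torch.from_numpy(array)
copied = torch.tensor(array)

array[0] = 42.0  # visible through `shared`, not through `copied`
print(shared[0].item(), copied[0].item())
```

Because `shared` aliases the NumPy buffer, later in-place changes to either object affect both, so this shortcut only fits code that expects that coupling.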

== References

- PyTorch Tensor Docs: https://pytorch.org/docs/stable/tensors.html
- Credit: https://github.com/AghilesAzzoug/GreenPyData
