17 changes: 17 additions & 0 deletions src/main/rules/GCI108/GCI108.json
@@ -0,0 +1,17 @@
{
"title": "Comparison between XGBoost and RandomForest",
Member comment:
The title could be changed to convey the actual good practice, rather than a generic comparison title with no good practice to follow.
Maybe something like "Use XGBoost instead of RandomForest" ...

"type": "CODE_SMELL",
"status": "ready",
"remediation": {
"func": "Constant/Issue",
"constantCost": "10min"
},
"tags": [
"creedengo",
"eco-design",
"performance",
"memory",
"ai"
],
"defaultSeverity": "Minor"
}
62 changes: 62 additions & 0 deletions src/main/rules/GCI108/python/GCI108.asciidoc
@@ -0,0 +1,62 @@
= Prefer Efficient Classifiers: XGBoost vs RandomForest

Using resource-heavy classifiers like `RandomForestClassifier` for standard classification tasks can result in longer execution times and higher carbon emissions.

`XGBoost` offers a more optimized and eco-friendly alternative with competitive accuracy.

== Metrics Comparison Table

[cols="1,1,1,1", options="header"]
Member comment:
Sorry, but I don't understand how you ran these tests. Could you give us more information about your testing context, please?
|===
|Classifier |Accuracy |Time (s) |Carbon Emission (kg CO₂)

|RandomForestClassifier
|0.88
|1.24
|0.00091

|XGBClassifier
|0.89
|0.47
|0.00035
|===

In these measurements, XGBoost matches or slightly exceeds the accuracy of RandomForest while running significantly faster and emitting less CO₂.
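
For reference, here is a minimal sketch of how such figures could be collected, assuming the `codecarbon` package is available for emission estimates; the synthetic dataset, its size, and the model parameters are illustrative assumptions, not the protocol actually used for the table above.

[source,python]
----
# Illustrative benchmark sketch (not the exact protocol behind the table):
# measures accuracy, wall-clock training time, and estimated CO2 emissions.
import time

from codecarbon import EmissionsTracker
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic dataset; sizes and parameters are assumptions for illustration
X, y = make_classification(n_samples=50_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for name, model in [
    ("RandomForestClassifier", RandomForestClassifier(n_estimators=100)),
    ("XGBClassifier", XGBClassifier(n_estimators=100, eval_metric="logloss")),
]:
    tracker = EmissionsTracker(log_level="error")
    tracker.start()
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    emissions_kg = tracker.stop()  # estimated kg CO2eq for the tracked block
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy={accuracy:.2f}, time={elapsed:.2f}s, "
          f"emissions={emissions_kg:.5f} kg")
----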

== Visual Comparison
Member comment:
In your graphics you show several frameworks. Could you explain why you chose XGBoost over RandomForestClassifier rather than another framework? Is the choice based only on your metrics table? For example, looking at your graphs, HistGradientBoosting and ExtraTrees do not seem so bad either, do they?


=== Accuracy vs Dataset Size

image::accuracy_vs_size.png[Accuracy Comparison]

=== Carbon Emission vs Dataset Size

image::emissions_vs_size.png[Emission Comparison]

=== Execution Time vs Dataset Size

image::execution_time_vs_size.png[Time Comparison]

== Non-Compliant Code Example

[source,python]
----
from sklearn.ensemble import RandomForestClassifier

# Trains 100 deep, fully grown trees; CPU- and memory-intensive at scale
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
----

== Compliant Code Example

[source,python]
----
from xgboost import XGBClassifier

# use_label_encoder is deprecated in recent XGBoost releases and can be omitted
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)
----
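
For a like-for-like comparison, both snippets can be pinned to the same number of trees, for example `n_estimators=100` on each model. XGBoost's boosted trees are also typically much shallower than RandomForest's fully grown trees, which accounts for much of the time and energy savings.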

== 📓 Article about XGBoost: A Scalable Tree Boosting System

This article explains in detail what XGBoost can do: https://arxiv.org/pdf/1603.02754
(The pull request also adds three binary image files, accuracy_vs_size.png, emissions_vs_size.png and execution_time_vs_size.png, which cannot be rendered in the diff view.)