Minor corrections

c4ts0up · c4ts0up · commit f334cbde9de8 · 2025-07-07T22:41:10.000-05:00
Signed-off-by: Álvaro Bacca Peña <a.baccap@uniandes.edu.co>
diff --git a/art/defences/detector/poison/clustering_centroid_analysis.py b/art/defences/detector/poison/clustering_centroid_analysis.py
@@ -33,7 +33,7 @@
 from art.defences.detector.poison.poison_filtering_defence import PoisonFilteringDefence
 
 if TYPE_CHECKING:
-    from tensorflow.keras import Model, Sequential
+    from tensorflow.keras import Model
     from umap import UMAP
     from sklearn.base import ClusterMixin
     from art.utils import CLASSIFIER_TYPE
diff --git a/notebooks/poisoning_defense_clustering_centroid_analysis.ipynb b/notebooks/poisoning_defense_clustering_centroid_analysis.ipynb
@@ -38,24 +38,36 @@
    "source": [
     "### 2.1. I/O CCA-UD\n",
     "\n",
-    "The following I/O descriptions do not correspond to a single function's parameters and/or return values, but serve as a general overview of what the algorithm uses and returns in a usage scenario.\n",
+    "The following I/O descriptions do not correspond to a single function's parameters and/or return values, but serve as a general overview of what the algorithm uses and returns in a usage scenario:\n",
     "\n",
     "### Inputs\n",
-    "| Input                                                       | Description                                                                                                    |\n",
-    "|-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|\n",
-    "| Poisoned training set features (`x_train`)                  | Dataset of independent variables used to train the classifier                                                  |\n",
-    "| Poisoned training set labels (`y_train`)                    | Labels used to train the classifier                                                                            |\n",
-    "| Benign indices (`benign_indices`)                           | Indices of `x_train` that are definitely benign samples                                                        |\n",
-    "| Final feature layer (`final_feature_layer`)                 | Name of the final layer that builds the feature representation. It is used to slice the DNN into two submodels |\n",
-    "| Misclassification threshold (`misclassification_threshold`) | ($\\theta$ in the paper) Minimum percentage of correct classifications needed to consider a cluster as benign    |\n",
-    "| True poison labels (`is_clean`)                             | True poison labels used to evaluate the defence's performance agains the detected poisoned points              |\n",
+    "| Input                                                       | Optional | Default Value | Description                                                                                                     |\n",
+    "|-------------------------------------------------------------|----------|---------------|-----------------------------------------------------------------------------------------------------------------|\n",
+    "| Classifier (`classifier`)                                   | N        | -             | Classifier model that is being analyzed for poisoning.                                                          |\n",
+    "| Poisoned training set features (`x_train`)                  | N        | -             | Dataset of independent variables used to train the classifier.                                                  |\n",
+    "| Poisoned training set labels (`y_train`)                    | N        | -             | Labels used to train the classifier.                                                                            |\n",
+    "| Benign indices (`benign_indices`)                           | N        | -             | Indices of `x_train` that are definitely benign samples.                                                        |\n",
+    "| Final feature layer (`final_feature_layer`)                 | N        | -             | Name of the final layer that builds the feature representation. It is used to slice the DNN into two submodels. |\n",
+    "| Misclassification threshold (`misclassification_threshold`) | N        | -             | ($\\theta$ in the paper) Minimum percentage of correct classifications needed to consider a cluster as benign.   |\n",
+    "| True poison labels (`is_clean`)                             | N        | -             | True poison labels used to evaluate the defence's performance against the detected poisoned points.             |\n",
     "\n",
     "### Outputs\n",
     "| Output                | Description                                                                                                     |\n",
     "|-----------------------|-----------------------------------------------------------------------------------------------------------------|\n",
     "| Poisoning verdict     | List of `x_train` with 1/0 labels; 1 means the data point is clean, whereas 0 means it was detected as poisoned |\n",
     "| Report                | Dictionary with report details on the dataset's performance                                                     |\n",
-    "| Confusion matrix JSON | JSON-like object with the detection performance results, given the true poisoned labels                         |\n"
+    "| Confusion matrix JSON | JSON-like object with the detection performance results, given the true poisoned labels                         |\n",
+    "\n",
+    "\n",
+    "The following table shows specific inputs used in the CCA-UD's object creation:\n",
+    "\n",
+    "### Inputs (implementation-specific)\n",
+    "| Input                                                           | Optional | Default Value                     | Description                                                                                |\n",
+    "|-----------------------------------------------------------------|----------|-----------------------------------|--------------------------------------------------------------------------------------------|\n",
+    "| Reducer (`reducer`)                                             | Y        | `UMAP(n_neighbors=5, min_dist=0)` | Dimensionality reducer used to reduce feature space.                                       |\n",
+    "| Clusterer (`clusterer`)                                         | Y        | `DBSCAN(eps=0.8, min_samples=20)` | Clustering algorithm used to cluster the reduced features.                                 |\n",
+    "| Feature extraction batch size (`feature_extraction_batch_size`) | Y        | `32`                              | Batch size for feature extraction. Use lower values when GPU memory is limited.            |\n",
+    "| Misclassification batch size (`misclassification_batch_size`)   | Y        | `32`                              | Batch size for misclassification. Use lower values when GPU memory is limited.             |\n"
    ]
   },
   {
@@ -110,7 +122,7 @@
    "metadata": {},
    "source": [
     "### 3.1 Setup\n",
-    "Loggers are created and libraries are imported."
+    "Loggers are created and libraries are imported. Using a Conda environment is strongly encouraged, as it manages not only ART's dependencies but also the non-Python dependencies that can boost CCA-UD's performance on a dedicated GPU."
    ]
   },
   {
@@ -971,7 +983,7 @@
    "metadata": {},
    "source": [
     "#### 3.5.1. Benign subset selection\n",
-    "It is expected from the defender that he/she can provide indices of the training data that correspond to benign data in order to calculate the benign centroids. In this scenario, 40% of the benign samples in the full dataset are given as benign sample to the algorithm."
+    "The defender is expected to provide indices of the training data that correspond to benign data, which are used to calculate the benign centroids. In this scenario, 30% of the benign samples in the full dataset are given to the algorithm as benign samples."
    ]
   },
   {
diff --git a/run_tests.sh b/run_tests.sh
@@ -151,10 +151,10 @@ else
                          "tests/defences/detector/evasion/test_subsetscanning_detector.py" \
                          "tests/defences/detector/poison/test_activation_defence.py" \
                          "tests/defences/detector/poison/test_clustering_analyzer.py" \
+                         "tests/defences/detector/poison/test_clustering_centroid_analysis.py" \
                          "tests/defences/detector/poison/test_ground_truth_evaluator.py" \
                          "tests/defences/detector/poison/test_provenance_defence.py" \
-                         "tests/defences/detector/poison/test_roni.py" \
-                         "tests/defences/detector/poison/test_clustering_centroid_analysis.py" )
+                         "tests/defences/detector/poison/test_roni.py" )
 
     declare -a metrics=("tests/metrics/test_gradient_check.py" \
                         "tests/metrics/test_metrics.py" \