|
38 | 38 | "source": [
|
39 | 39 | "### 2.1. I/O CCA-UD\n",
|
40 | 40 | "\n",
|
41 |
| - "The following I/O descriptions do not correspond to a single function's parameters and/or return values, but serve as a general overview of what the algorithm uses and returns in a usage scenario.\n", |
| 41 | + "The following I/O descriptions do not correspond to a single function's parameters and/or return values, but serve as a general overview of what the algorithm uses and returns in a usage scenario:\n", |
42 | 42 | "\n",
|
43 | 43 | "### Inputs\n",
|
44 |
| - "| Input | Description |\n", |
45 |
| - "|-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|\n", |
46 |
| - "| Poisoned training set features (`x_train`) | Dataset of independent variables used to train the classifier |\n", |
47 |
| - "| Poisoned training set labels (`y_train`) | Labels used to train the classifier |\n", |
48 |
| - "| Benign indices (`benign_indices`) | Indices of `x_train` that are definitely benign samples |\n", |
49 |
| - "| Final feature layer (`final_feature_layer`) | Name of the final layer that builds the feature representation. It is used to slice the DNN into two submodels |\n", |
50 |
| - "| Misclassification threshold (`misclassification_threshold`) | ($\\theta$ in the paper) Minimum percentage of correct classifications needed to consider a cluster as benign |\n", |
51 |
| - "| True poison labels (`is_clean`) | True poison labels used to evaluate the defence's performance agains the detected poisoned points |\n", |
| 44 | + "| Input | Optional | Default Value | Description |\n", |
| 45 | + "|-------------------------------------------------------------|----------|---------------|-----------------------------------------------------------------------------------------------------------------|\n", |
| 46 | + "| Classifier (`classifier`) | N | - | Classifier model that is being analyzed for poisoning. |\n", |
| 47 | + "| Poisoned training set features (`x_train`) | N | - | Dataset of independent variables used to train the classifier. |\n", |
| 48 | + "| Poisoned training set labels (`y_train`) | N | - | Labels used to train the classifier. |\n", |
| 49 | + "| Benign indices (`benign_indices`) | N | - | Indices of `x_train` that are definitely benign samples. |\n", |
| 50 | + "| Final feature layer (`final_feature_layer`) | N | - | Name of the final layer that builds the feature representation. It is used to slice the DNN into two submodels. |\n", |
| 51 | + "| Misclassification threshold (`misclassification_threshold`) | N | - | ($\\theta$ in the paper) Minimum percentage of correct classifications needed to consider a cluster as benign. |\n", |
| 52 | + "| True poison labels (`is_clean`) | N | - | True poison labels used to evaluate the defence's performance against the detected poisoned points. |\n", |
52 | 53 | "\n",
|
53 | 54 | "### Outputs\n",
|
54 | 55 | "| Output | Description |\n",
|
55 | 56 | "|-----------------------|-----------------------------------------------------------------------------------------------------------------|\n",
|
56 | 57 | "| Poisoning verdict | List of `x_train` with 1/0 labels; 1 means the data point is clean, whereas 0 means it was detected as poisoned |\n",
|
57 | 58 | "| Report | Dictionary with report details on the dataset's performance |\n",
|
58 |
| - "| Confusion matrix JSON | JSON-like object with the detection performance results, given the true poisoned labels |\n" |
| 59 | + "| Confusion matrix JSON | JSON-like object with the detection performance results, given the true poisoned labels |\n", |
| 60 | + "\n", |
| 61 | + "\n", |
| 62 | + "The following table shows the inputs that are specific to creating the CCA-UD object:\n", |
| 63 | + "\n", |
| 64 | + "### Inputs (implementation-specific)\n", |
| 65 | + "| Input | Optional | Default Value | Description |\n", |
| 66 | + "|-----------------------------------------------------------------|----------|-----------------------------------|--------------------------------------------------------------------------------------------|\n", |
| 67 | + "| Reducer (`reducer`) | Y | `UMAP(n_neighbors=5, min_dist=0)` | Dimensionality reducer used to reduce feature space. |\n", |
| 68 | + "| Clusterer (`clusterer`) | Y | `DBSCAN(eps=0.8, min_samples=20)` | Clustering algorithm used to cluster the reduced features. |\n", |
| 69 | + "| Feature extraction batch size (`feature_extraction_batch_size`) | Y | `32` | Batch size for feature extraction. Use lower values in case of low GPU memory availability. |\n", |
| 70 | + "| Misclassification batch size (`misclassification_batch_size`) | Y | `32` | Batch size for misclassification. Use lower values in case of low GPU memory availability. |\n" |
59 | 71 | ]
|
60 | 72 | },
|
61 | 73 | {
|
|
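The Outputs table describes a 1/0 poisoning verdict and a confusion matrix computed against the true poison labels. A minimal sketch of that relationship, assuming both `is_clean` and the verdict are flat lists of 0/1 values where 1 means clean; the function name and the dictionary keys are hypothetical and are not taken from ART's report format:

```python
from typing import Dict, Sequence


def confusion_counts(is_clean: Sequence[int], verdict: Sequence[int]) -> Dict[str, int]:
    """Count detection outcomes, treating 'poisoned' (label 0) as the positive class.

    Assumes both sequences contain only 0/1 values, with 1 meaning clean.
    The key names are illustrative, not ART's actual output schema.
    """
    if len(is_clean) != len(verdict):
        raise ValueError("is_clean and verdict must have the same length")

    counts = {"TruePositive": 0, "FalsePositive": 0, "TrueNegative": 0, "FalseNegative": 0}
    for truth, predicted in zip(is_clean, verdict):
        if truth == 0 and predicted == 0:
            counts["TruePositive"] += 1   # poisoned sample, flagged as poisoned
        elif truth == 1 and predicted == 0:
            counts["FalsePositive"] += 1  # clean sample, wrongly flagged
        elif truth == 1 and predicted == 1:
            counts["TrueNegative"] += 1   # clean sample, left alone
        else:
            counts["FalseNegative"] += 1  # poisoned sample, missed by the defence
    return counts


# Toy example: six samples, three of them truly poisoned (0).
is_clean = [1, 1, 0, 0, 1, 0]
verdict = [1, 0, 0, 1, 1, 0]
print(confusion_counts(is_clean, verdict))
```

In the notebook itself these counts would come from the defence's own evaluation step; the sketch only makes the label convention (1 = clean, 0 = poisoned) concrete.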
110 | 122 | "metadata": {},
|
111 | 123 | "source": [
|
112 | 124 | "### 3.1 Setup\n",
|
113 |
| - "Loggers are created and libraries are imported." |
| 125 | + "Loggers are created and libraries are imported. Using a Conda environment is strongly encouraged, as it manages not only ART's Python dependencies but also non-Python dependencies that can boost CCA-UD's performance on a dedicated GPU." |
114 | 126 | ]
|
115 | 127 | },
|
116 | 128 | {
|
|
971 | 983 | "metadata": {},
|
972 | 984 | "source": [
|
973 | 985 | "#### 3.5.1. Benign subset selection\n",
|
974 |
| - "It is expected from the defender that he/she can provide indices of the training data that correspond to benign data in order to calculate the benign centroids. In this scenario, 40% of the benign samples in the full dataset are given as benign sample to the algorithm." |
| 986 | + "The defender is expected to provide indices of training samples that are known to be benign, which are used to calculate the benign centroids. In this scenario, 30% of the benign samples in the full dataset are given to the algorithm as benign samples." |
975 | 987 | ]
|
976 | 988 | },
|
977 | 989 | {
|
|
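The benign subset selection in section 3.5.1 can be sketched as follows. This is a minimal illustration, assuming `is_clean` is a flat list of 0/1 flags (1 = benign) aligned with `x_train`; the helper name `select_benign_indices` and the `seed` parameter are hypothetical and do not come from the notebook. It uses only the standard library; with NumPy arrays, the same idea would typically be written with `np.flatnonzero` and a random generator.

```python
import random
from typing import List, Sequence


def select_benign_indices(is_clean: Sequence[int], fraction: float = 0.3, seed: int = 0) -> List[int]:
    """Return a random subset of the indices whose is_clean flag is 1.

    fraction is the share of benign samples to hand to the defence
    (0.3 mirrors the 30% used in this scenario); the count is rounded down.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    benign = [i for i, flag in enumerate(is_clean) if flag == 1]
    n_selected = int(len(benign) * fraction)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return sorted(rng.sample(benign, n_selected))


# Toy example: 13 samples, 10 benign (1) and 3 poisoned (0).
is_clean = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
benign_indices = select_benign_indices(is_clean)
print(benign_indices)
```

The resulting list plays the role of `benign_indices` from the Inputs table: every index points at a sample that the defender knows to be clean.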