ART Unit Testing
The Adversarial Robustness Toolbox (ART) is a library which supports multiple frameworks simultaneously. For this reason, tests written for ART must be written keeping in mind that they will be run across all frameworks supported by ART.
This page clarifies how such tests should be written, presenting the conventions used as well as the various test helper tools available in ART to simplify the process.
ART makes heavy use of pytest functionality such as fixtures. General information about fixtures can be found in the pytest documentation.
Good example ART tests that can be used as templates include `tests/estimators/classification/test_common_deeplearning.py`.
While debugging tests, it can be useful to run a given test with a specific framework. To do so, the command line argument `--mlFramework` can be specified along with the relevant framework name:

```bash
pytest -q tests/estimators/classification/test_common_deeplearning.py --mlFramework=pytorch
```

The `mlFramework` argument can be used with the following frameworks: `tensorflow`, `keras`, `pytorch` and `scikitlearn`. If no framework is provided, ART will run the tests with a default framework of its choice.
In order to achieve framework agnosticism, ART provides a few pytest fixtures which hide any framework specific concerns of the test code within the pytest `conftest.py` files. This makes writing tests for ART much easier and cleaner. A list of all relevant ART fixtures can be found below.
As a general rule, tests should only implement the test logic, regardless of the framework being used. Any framework specific code should be hidden and placed within the relevant pytest `conftest.py` files, for example as in the sketch below.
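As an illustration of this idea, a framework dependent fixture could be written in a `conftest.py` roughly as follows. This is a minimal sketch only: the fixture name `image_classifier` and the placeholder builders are hypothetical, and ART's real fixtures are richer, but the pattern of keeping the framework branching out of the test code is the same.

```python
# conftest.py -- illustrative sketch only; fixture and helper names below are hypothetical.
import pytest


@pytest.fixture
def image_classifier(framework):
    """Build a classifier for the framework the test session is running with.

    `framework` is assumed to be a fixture (or command line option) exposing the
    active framework name; the placeholder builders stand in for the framework
    specific construction code that would otherwise leak into the tests.
    """
    builders = {
        "tensorflow": lambda: "a TensorFlow classifier would be built here",
        "keras": lambda: "a Keras classifier would be built here",
        "pytorch": lambda: "a PyTorch classifier would be built here",
        "scikitlearn": lambda: "a scikit-learn classifier would be built here",
    }
    if framework not in builders:
        raise NotImplementedError(f"No image classifier implemented for '{framework}'")
    return builders[framework]()
```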
The following example presents a typical ART test.
```python
import numpy as np
import pytest

from tests.utils import ARTTestException  # helper exception defined in ART's test utilities


@pytest.mark.framework_agnostic
def test_myTest(art_warning, get_default_mnist_subset, get_image_classifier_list):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        classifier, sess = get_image_classifier_list(one_classifier=True)

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == 0.99
    except ARTTestException as e:
        art_warning(e)
```
- `get_default_mnist_subset`: The test avails of the `get_default_mnist_subset` fixture, which takes care of retrieving the MNIST dataset, shaped correctly for whichever framework the test will be run with. The PyTorch and TensorFlow frameworks, for example, expect different image channel orderings; this fixture provides the test with the channel ordering corresponding to the framework being used.
- `get_image_classifier_list`: The `get_image_classifier_list` fixture is used extensively within the tests and creates an image classifier using the framework the test is being run with. If a framework specific implementation for an ART component does not exist yet, the test will fail gracefully and simply output a warning to notify that the test could not be run with this framework due to a missing component.
- `@pytest.mark.framework_agnostic`: This pytest marker should be used in most cases. It indicates that, although the test can be run successfully with any framework, it does not depend upon any framework specific implementations. Hence there is no need to run the same test across all frameworks; a single random framework will suffice, and ART will run the test with a framework of its choice. While most tests fit this category, a few exceptions occur. Tests located in `test_common_deeplearning.py`, for example, must always be run with all frameworks, since they check whether the framework specific implementations of ART classifiers produce exactly the same outputs.
- try/except and `art_warning`: In some cases, framework specific implementations of classifiers or other required components will not yet exist for a given framework. In order to move on gracefully to the next test, ART test code should be wrapped in a try/except clause and the caught `ARTTestException` should be passed to `art_warning` (a rough sketch of this pattern follows below). This produces a report after testing completes, listing the component implementations currently missing for a given framework.
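To make the pattern concrete, a warning fixture of this kind could be sketched roughly as below. This is not ART's actual `art_warning` implementation, whose reporting is more elaborate; it only illustrates the idea that the fixture turns a caught `ARTTestException` into a warning instead of a test failure.

```python
# Illustrative sketch of the try/except + art_warning pattern; not ART's actual implementation.
import warnings

import pytest


class ARTTestException(Exception):
    """Stand-in for the exception ART raises when a framework implementation is missing."""


@pytest.fixture
def art_warning(request):
    def _art_warning(exception):
        # Record the missing implementation as a warning so the test run continues
        # and a summary of missing components can be reported after the session.
        warnings.warn(UserWarning(f"{request.node.name}: {exception}"))

    return _art_warning
```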
Finally, in addition to using fixtures, the following conventions are used across ART tests:
- Test files and the tests themselves should not contain any reference to specific frameworks.
- Any framework specific test code (e.g. `if framework == "tensorflow": do this`) should be placed in the relevant `conftest.py` file.
- An ART-wide random generator master seed is already set within the project root `conftest.py` file (sketched below), hence there is no need to add `master_seed(1234)` calls within test code.
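For reference, session wide seeding of this kind can be achieved with an autouse fixture along the lines of the sketch below. The exact contents of ART's root `conftest.py` and the import path of `master_seed` are not reproduced here; the sketch simply seeds the standard generators directly with the same intent.

```python
# conftest.py (project root) -- minimal sketch of ART-wide seeding; not the actual file.
import random

import numpy as np
import pytest


@pytest.fixture(autouse=True)
def seed_everything():
    # Equivalent in spirit to calling master_seed(1234) for every test,
    # so individual tests never need to seed the generators themselves.
    random.seed(1234)
    np.random.seed(1234)
```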
Here is a list of the most common ART fixtures available when writing tests. They can be found in the pytest `conftest.py` files within the project.
| Fixture Name | Purpose |
|---|---|
| `get_mnist_dataset` | provides the MNIST dataset with the image channels ordered for the framework the test is being run with |
| `get_iris_dataset` | provides the Iris dataset for the framework the test is being run with |
| `get_default_mnist_subset` | provides a smaller subset of the MNIST dataset |
| `image_data_generator` | provides the MNIST dataset as a data generator |
| `mnist_shape` | provides the shape of the MNIST dataset based on where the channel is positioned |
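As a usage sketch, the dataset fixtures can be consumed just like `get_default_mnist_subset` in the earlier example. The `((x_train, y_train), (x_test, y_test))` structure of `get_mnist_dataset` and the comparison against `mnist_shape` below are assumptions made for illustration.

```python
# Sketch only: assumes get_mnist_dataset returns the same ((x_train, y_train), (x_test, y_test))
# structure as get_default_mnist_subset, and that mnist_shape matches a single sample's shape.
import pytest

from tests.utils import ARTTestException  # import path assumed


@pytest.mark.framework_agnostic
def test_mnist_shapes(art_warning, get_mnist_dataset, mnist_shape):
    try:
        (x_train, y_train), (x_test, y_test) = get_mnist_dataset
        # mnist_shape reflects where the channel dimension sits for the current framework
        assert x_train.shape[1:] == mnist_shape
    except ARTTestException as e:
        art_warning(e)
```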
| Fixture Name | Purpose |
|---|---|
| `get_image_classifier_list` | provides a list of image classifiers corresponding to the framework the test is being run with. In most cases only one classifier is returned; however, for frameworks such as scikitlearn multiple classifiers are returned |
| `get_tabular_classifier_list` | provides a list of tabular classifiers corresponding to the framework the test is being run with. In most cases only one classifier is returned; however, for frameworks such as scikitlearn multiple classifiers are returned |
| `get_image_classifier_list_defended` | provides a defended version of the classifiers returned by `get_image_classifier_list` |
| `get_image_classifier_list_for_attack` | provides a list of image classifiers, for the framework the test is being run with, usable to perform a specific attack |
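When a test should exercise every classifier these fixtures return, it can iterate over the list instead of passing `one_classifier=True`. The sketch below assumes the fixture then returns a `(classifier_list, sess)` pair, mirroring the `(classifier, sess)` pair shown earlier; this return shape is an assumption for illustration.

```python
# Sketch only: the (classifier_list, sess) return shape without one_classifier=True is assumed.
import numpy as np
import pytest

from tests.utils import ARTTestException  # import path assumed


@pytest.mark.framework_agnostic
def test_all_classifiers_predict(art_warning, get_default_mnist_subset, get_image_classifier_list):
    try:
        (_, _), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        classifier_list, _ = get_image_classifier_list()
        for classifier in classifier_list:
            # every returned classifier should produce one prediction per test sample
            predictions = np.argmax(classifier.predict(x_test_mnist), axis=1)
            assert predictions.shape[0] == x_test_mnist.shape[0]
    except ARTTestException as e:
        art_warning(e)
```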
- `create_test_dir`: creates a temporary test directory
- `create_test_image`: provides a test image
- `store_expected_values`: stores any large values needed for a test in a JSON file. The `expected_values` fixture can be used thereafter to retrieve these values.
- `expected_values`: retrieves the values expected for a given test, previously stored using `store_expected_values`. This fixture identifies whether a value needed for the test should take into account which framework the test is being run with.
- `@pytest.mark.framework_agnostic`: indicates that, although the test can be run successfully with any framework, it does not depend upon any framework specific implementations. Hence there is no need to run the same test across all frameworks; a single random framework will suffice.
- `@pytest.mark.skipMlFramework("tensorflow", "scikitlearn")`: indicates that a test currently fails when run with specific frameworks and should be skipped for those mlFramework values. Valid values are "tensorflow", "keras", "pytorch" and "scikitlearn" (see the example after this list).
- DEPRECATED `@pytest.mark.only_with_platform("keras")`: this marker is deprecated and should only be used for legacy tests that are not yet framework agnostic.
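For example, a test that is known to fail with TensorFlow and scikit-learn could be marked as follows; the test body here is only a placeholder.

```python
# Placeholder test body; only the marker usage is the point of this sketch.
import pytest

from tests.utils import ARTTestException  # import path assumed


@pytest.mark.skipMlFramework("tensorflow", "scikitlearn")
def test_feature_not_supported_everywhere(art_warning, get_image_classifier_list):
    try:
        classifier, sess = get_image_classifier_list(one_classifier=True)
        assert classifier is not None
    except ARTTestException as e:
        art_warning(e)
```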
At times, tests need to assert that a given component produces an expected value. Such expected values can be numerous and consist of very large arrays, which make the test code unnecessarily convoluted and much harder to read. ART provides two helper fixtures which cache any expected values required, and thus make your test code smaller and more readable.
While writing your test, the first version using hardcoded expected values can use the `store_expected_values` fixture to cache those values as follows:
```python
import warnings

import numpy as np
import pytest


@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, store_expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
        classifier, sess = get_image_classifier_list(one_classifier=True)

        expected_value1 = np.asarray(
            [
                0.0000000e00, 0.0000000e00, 0.0000000e00, 2.3582461e-03, 4.8802234e-04, 1.6699843e-03,
                -6.4777887e-05, -1.4215634e-03, -1.3359448e-04, 2.0448549e-03, 2.8171093e-04, 1.9665064e-04,
                1.5335126e-03, 1.7000455e-03, -2.0136381e-04, 6.4588618e-04, 2.0524357e-03, 2.1990810e-03,
                8.3692279e-04, 0.0000000e00, 0.0000000e00, 0.0000000e00, 0.0000000e00, 0.0000000e00,
                0.0000000e00, 0.0000000e00, 0.0000000e00, 0.0000000e00,
            ]
        )
        # ... more expected value arrays

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value1

        store_expected_values(expected_value1, expected_value2, ...)
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))
```
Once the expected values have been cached, the final version of the test can be made simpler and more readable by using the `expected_values` fixture as follows:
```python
import warnings

import numpy as np
import pytest


@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset

        # create the classifier for the framework this test is being run with
        classifier, sess = get_image_classifier_list(one_classifier=True)

        # retrieve the cached expected values
        (expected_value1, expected_value2, ...) = expected_values

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value1
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))
```