ART Unit Testing

ART Testing Framework

The Adversarial Robustness Toolbox (ART) is a library that supports multiple frameworks simultaneously. For this reason, tests written for ART must be written with the understanding that they will be run across all frameworks supported by ART.

This page will clarify how tests should be written to achieve this end, presenting the conventions used as well as the various test helper tools available in ART to simplify this process.

ART makes heavy use of pytest features such as fixtures. General information about fixtures can be found here.

The following are good examples of ART tests that can be used as templates:

1. Running a test with a specific framework

When debugging tests, it can be useful to run a given test with a specific framework. To do so, specify the mlFramework command line argument with the relevant framework name.

pytest -q tests/estimators/classification/test_common_deeplearning.py --mlFramework=pytorch

The mlFramework argument accepts the following framework names: tensorflow, keras, pytorch and scikitlearn. If no framework is provided, ART will run the tests with a default framework of its choice.
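For reference, a custom command line option like this is typically registered in a conftest.py via pytest's pytest_addoption hook. The sketch below is illustrative only and does not reproduce ART's actual conftest.py; the default value shown is an assumption.

# conftest.py (illustrative sketch, not ART's actual implementation)
import pytest


def pytest_addoption(parser):
    # Register the --mlFramework command line option with pytest.
    parser.addoption(
        "--mlFramework",
        action="store",
        default="tensorflow",  # assumed default for this sketch only
        help="framework to run the tests with: tensorflow, keras, pytorch or scikitlearn",
    )


@pytest.fixture
def framework(request):
    # Expose the chosen framework name to tests and other fixtures.
    return request.config.getoption("--mlFramework")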

2. Writing Framework Agnostic Tests

In order to achieve framework agnosticism, ART provides a number of pytest fixtures which hide any framework-specific concerns from the test code within the pytest conftest.py files. This makes writing tests for ART much easier and cleaner. A list of the most relevant ART fixtures can be found below. As a general rule, tests should only implement the test logic, regardless of the framework being used. Any framework-specific code should be placed within the relevant pytest conftest.py files.

The following example presents a typical ART test.

import numpy as np
import pytest

from tests.utils import ARTTestException


@pytest.mark.framework_agnostic
def test_myTest(art_warning, get_default_mnist_subset, get_image_classifier_list):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset

        classifier, sess = get_image_classifier_list(one_classifier=True)

        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == 0.99
    except ARTTestException as e:
        art_warning(e)

  • get_default_mnist_subset: The test makes use of the get_default_mnist_subset fixture, which takes care of retrieving the MNIST dataset, shaped correctly for whatever framework the test will be run with. The PyTorch and TensorFlow frameworks, for example, expect different image channel orderings. This fixture takes care of providing the test with the channel ordering corresponding to the framework being used.

  • get_image_classifier_list: The get_image_classifier_list fixture is used quite extensively within the tests and creates an image classifier using the framework the test is being run with. If a framework-specific implementation of an ART component does not exist yet, the test will fail gracefully and simply output a warning to notify that the test could not be run with this framework due to a missing component.

  • @pytest.mark.framework_agnostic: The @pytest.mark.framework_agnostic pytest marker should be used in most cases. It indicates that, although the test can be run successfully in any framework, it does not depend upon any framework-specific implementations. Hence there is no need to run the same test across all frameworks; one randomly chosen framework will suffice, and ART will run the test with a random framework. While most tests fit this category, a few exceptions occur. Tests located in test_common_deeplearning.py, for example, must always be run with all the frameworks, since they check whether the framework-specific implementations of ART classifiers produce exactly the same outputs.

  • try/except and art_warning: In some cases, framework-specific implementations of classifiers or other required components will not have been implemented yet for a given framework. In order to gracefully move on to the next test, ART test code should be contained within a try/except clause and an art_warning should be raised with the caught ARTTestException. This produces a report after the testing completes, listing the component implementations currently missing for a given framework.

3. ART Test Conventions

Finally, in addition to using fixtures, the following conventions are used across ART tests:

  1. Test files and the tests themselves should not contain any reference to specific frameworks.
  2. Any framework-specific test code (e.g. if framework == "tensorflow": do this) should be placed in the relevant conftest.py file (see the sketch after this list).
  3. An ART-wide random generator master seed is already set within the project root conftest.py file. Hence there is no need to add master_seed(1234) calls within test code.
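
As an illustration of convention 2, here is a minimal sketch (not actual ART code) of how a framework-specific branch can be moved out of a test and into a conftest.py fixture. The image_shape fixture name is hypothetical, and the framework fixture it relies on is assumed to expose the --mlFramework value (see the sketch in section 1).

# conftest.py (hypothetical sketch of convention 2, not part of ART)
import pytest


@pytest.fixture
def image_shape(framework):
    # Framework-specific branching lives here rather than in the test itself.
    # "framework" is assumed to expose the --mlFramework value.
    if framework == "pytorch":
        return (1, 28, 28)  # channels-first ordering
    return (28, 28, 1)  # channels-last ordering


# test_my_feature.py
def test_my_feature(image_shape):
    # The test only sees the framework-appropriate shape and stays framework agnostic.
    assert 28 in image_shape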

4. Common useful ART fixtures and markers to be aware of

Here is a list of the most common ART fixtures available when writing tests. They can be found in the pytest conftest.py files within the project.

4.1 Dataset fixtures:

  • get_mnist_dataset: provides the MNIST dataset with the image channels ordered for the framework being used (a usage sketch follows this list)
  • get_iris_dataset: provides the Iris dataset, shaped for the framework being used
  • get_default_mnist_subset: provides a smaller subset of the MNIST dataset
  • image_data_generator: provides the MNIST dataset as a data generator
  • mnist_shape: provides the shape of the MNIST dataset, based on where the channel dimension is positioned for the framework being used
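
Below is a minimal usage sketch for these dataset fixtures, assuming get_mnist_dataset unpacks into the same ((x_train, y_train), (x_test, y_test)) structure that get_default_mnist_subset uses in the example above; the shape comparison against mnist_shape is also an assumption made for illustration.

def test_dataset_shapes(get_mnist_dataset, mnist_shape):
    # Assumed tuple structure, mirroring get_default_mnist_subset in the example above.
    (x_train, y_train), (x_test, y_test) = get_mnist_dataset

    # Assumption: mnist_shape matches the per-image shape for the current framework.
    assert x_train.shape[1:] == mnist_shape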

4.2 Component fixtures:

  • get_image_classifier_list: provides a list of image classifiers corresponding to the framework the test is being run with. In most cases only one classifier is returned; however, for frameworks such as scikitlearn multiple classifiers are returned
  • get_tabular_classifier_list: provides a list of tabular classifiers corresponding to the framework the test is being run with. In most cases only one classifier is returned; however, for frameworks such as scikitlearn multiple classifiers are returned
  • get_image_classifier_list_defended: provides a defended version of the classifiers returned by get_image_classifier_list (see the sketch after this list)
  • get_image_classifier_list_for_attack: provides a list of image classifiers, for the framework the test is being run with, that can be used to perform a specific attack
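
A minimal usage sketch for the defended variant, assuming it mirrors the call signature and (classifier, sess) return pattern that get_image_classifier_list uses in the examples on this page:

import numpy as np


def test_defended_prediction(get_default_mnist_subset, get_image_classifier_list_defended):
    (_, _), (x_test_mnist, y_test_mnist) = get_default_mnist_subset

    # Assumption: the defended fixture mirrors get_image_classifier_list's signature.
    classifier, sess = get_image_classifier_list_defended(one_classifier=True)

    predictions = np.argmax(classifier.predict(x_test_mnist), axis=1)
    assert predictions.shape[0] == x_test_mnist.shape[0]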

4.3 Test Util fixtures:

  • create_test_dir: creates a temporary test directory
  • create_test_image: provides a test image
  • store_expected_values: stores any large values needed for a test in a JSON file. The expected_values fixture can then be used to retrieve these values
  • expected_values: retrieves the values expected for a given test, previously stored using store_expected_values. This fixture identifies whether a value needed for the test should take into account the framework the test is being run with or not.

4.4 ART markers:

  • @pytest.mark.framework_agnostic: indicates that, although the test can be run successfully in any framework, it does not depend upon any framework-specific implementations. Hence there is no need to run the same test across all frameworks; one random framework will suffice.

  • @pytest.mark.skipMlFramework("tensorflow","scikitlearn"): indicates that the test should be skipped for the specified mlFramework values, for example because it currently fails with those frameworks. Valid values are "tensorflow", "keras", "pytorch" and "scikitlearn" (see the marker sketch after this list).

  • DEPRECATED @pytest.mark.only_with_platform("keras"): This marker is deprecated and should only be used for legacy tests that are not yet framework agnostic.
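
For illustration, here is a hedged sketch of applying the skipMlFramework marker to a test; the test body is a placeholder and follows the try/except art_warning pattern described above.

import pytest

from tests.utils import ARTTestException


@pytest.mark.skipMlFramework("tensorflow", "scikitlearn")
def test_non_tf_sklearn_behaviour(art_warning, get_image_classifier_list):
    # Skipped when the tests are run with the tensorflow or scikitlearn frameworks.
    try:
        classifier, sess = get_image_classifier_list(one_classifier=True)
        assert classifier is not None
    except ARTTestException as e:
        art_warning(e)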

5. Testing code with Expected Values

At times, tests need to assert that a given component produces an expected value. Such expected values can be numerous and consist of very large arrays, which makes the test code unnecessarily convoluted and much harder to read. ART provides two helper fixtures which cache any expected values required and thus keep your test code smaller and much more readable.

While writing your test, the first version of the test, using hardcoded expected values, can use the store_expected_values fixture to cache those values as follows:

@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, store_expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset

        classifier, sess = get_image_classifier_list(one_classifier=True)

        expected_value1 = np.asarray(
            [
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                2.3582461e-03,
                4.8802234e-04,
                1.6699843e-03,
                -6.4777887e-05,
                -1.4215634e-03,
                -1.3359448e-04,
                2.0448549e-03,
                2.8171093e-04,
                1.9665064e-04,
                1.5335126e-03,
                1.7000455e-03,
                -2.0136381e-04,
                6.4588618e-04,
                2.0524357e-03,
                2.1990810e-03,
                8.3692279e-04,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
                0.0000000e00,
            ]
        )
    
        # ... more expected value arrays 
    
        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value1
    
        store_expected_values(expected_value1, expected_value2, ...)
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))

Once the expected values have been cached, the final version of the test can be made simpler and more readable by using the expected_values fixture as follows:

@pytest.mark.framework_agnostic
def test_myTest(get_default_mnist_subset, get_image_classifier_list, expected_values):
    try:
        (x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = get_default_mnist_subset
    
        # get the image classifier for the framework this test is being run with
        classifier, sess = get_image_classifier_list(one_classifier=True)

        # retrieve the cached expected values
        (expected_value1, expected_value2, ...) = expected_values
    
        # example test code
        labels = np.argmax(y_test_mnist, axis=1)
        accuracy_2 = np.sum(np.argmax(classifier.predict(x_test_mnist), axis=1) == labels) / x_test_mnist.shape[0]
        assert accuracy_2 == expected_value1
    except NotImplementedError as e:
        warnings.warn(UserWarning(e))