
Add OnePixelShortcutAttack poisoning attack and its unit tests #2720


Open
wants to merge 4 commits into dev_1.21.0 from one-pixel-shortcut-attack

Conversation


@nicholasadriel nicholasadriel commented Aug 9, 2025

Description

This PR adds a new data poisoning attack class, OnePixelShortcutAttack, to the Adversarial Robustness Toolbox (ART). The class is implemented under the art.attacks.poisoning module and adds support for the One Pixel Shortcut (OPS) attack. A corresponding unit test suite (test_one_pixel_shortcut_attack.py) is included to validate the behavior of the implementation.

Motivation

The One Pixel Shortcut attack is a recently proposed poisoning technique that perturbs a single pixel in each training image (at a consistent location per class) to create "unlearnable" examples. This can dramatically degrade a model’s accuracy on clean data without altering any labels. Integrating OPS into IBM ART enables standardized evaluation of this attack with ART’s framework and estimators. The implementation has been tested on, and extends support to, popular image classification datasets such as CIFAR-10, CIFAR-100, UTKFace, and CelebA, demonstrating the attack’s broad applicability. Incorporating OPS aligns with ART’s benchmarking and reproducibility goals and expands the library’s coverage of state-of-the-art poisoning attacks.
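For illustration, here is a minimal usage sketch. It assumes the class follows ART’s usual poisoning-attack interface, i.e. a poison(x, y) method returning the poisoned data and labels, and that the no-argument constructor shown here is valid; the exact signature is defined in the PR.

    import numpy as np

    from art.attacks.poisoning import OnePixelShortcutAttack

    # Toy CIFAR-10-like batch: 100 images, 32x32 RGB, pixel values in [0, 1] (NHWC).
    x_train = np.random.rand(100, 32, 32, 3).astype(np.float32)
    y_train = np.random.randint(0, 10, size=100)

    # Apply the OPS attack: one pixel per class is perturbed at a class-consistent location.
    attack = OnePixelShortcutAttack()
    x_poisoned, y_poisoned = attack.poison(x_train, y_train)

    assert x_poisoned.shape == x_train.shape    # shape is preserved
    assert np.array_equal(y_poisoned, y_train)  # labels are unchanged (clean-label attack)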

Fixes

No open issue is associated with this PR (new feature contribution).

Type of change

New feature (non-breaking change which adds functionality)

Testing

Unit tests have been added in test_one_pixel_shortcut_attack.py to verify the implementation’s correctness:

  • Output shape: Ensures the poisoned data produced by OnePixelShortcutAttack has the same shape as the original input data (no unintended dimensionality changes).
  • Label preservation: Confirms that the attack does not alter the class labels of the dataset (the labels remain unchanged after poisoning).
  • Per-class pixel perturbation: Verifies that exactly one pixel per class is consistently perturbed across all images of that class, validating the intended one-pixel shortcut behavior.

All tests pass, confirming that the attack behaves as expected and integrates correctly with ART’s data and estimator APIs.
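
For reference, the checks above could look roughly like the following sketch (hedged: the actual tests in test_one_pixel_shortcut_attack.py may use different names, fixtures, and constructor arguments):

    import numpy as np

    from art.attacks.poisoning import OnePixelShortcutAttack


    def test_ops_shape_labels_and_single_pixel():
        rng = np.random.default_rng(0)
        x = rng.random((40, 8, 8, 3)).astype(np.float32)  # small NHWC batch
        y = rng.integers(0, 4, size=40)                    # 4 classes

        x_p, y_p = OnePixelShortcutAttack().poison(x, y)

        # Output shape: poisoned data keeps the original shape.
        assert x_p.shape == x.shape
        # Label preservation: the attack does not modify labels.
        assert np.array_equal(y_p, y)

        # Per-class pixel perturbation: within a class, images differ from the
        # originals in at most one spatial location, and that location is shared.
        for cls in np.unique(y):
            diff = np.any(x_p[y == cls] != x[y == cls], axis=-1)  # (n_cls, H, W) booleans
            assert all(d.sum() <= 1 for d in diff)                # at most one pixel per image
            coords = {tuple(np.argwhere(d)[0]) for d in diff if d.any()}
            assert len(coords) <= 1                               # same pixel across the class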

Test Configuration

No additional configuration or dependencies are required for this feature. The OnePixelShortcutAttack can be used out-of-the-box with ART’s existing classifiers and datasets, similar to other poisoning attacks in the library.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • My changes have been tested using both CPU and GPU devices

Reference

Shutong Wu, Sizhe Chen, Cihang Xie, and Xiaolin Huang. "One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks." In Proceedings of ICLR 2023.

@nicholasadriel nicholasadriel force-pushed the one-pixel-shortcut-attack branch from fbed197 to 17257dc on August 9, 2025 04:08
@beat-buesser beat-buesser changed the base branch from main to dev_1.21.0 August 11, 2025 12:22
@beat-buesser beat-buesser changed the base branch from dev_1.21.0 to main August 11, 2025 12:24
@beat-buesser beat-buesser changed the base branch from main to dev_1.21.0 August 11, 2025 12:25

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 98.64865% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.23%. Comparing base (cbc158b) to head (4be2af1).

Files with missing lines | Patch % | Lines
art/attacks/poisoning/one_pixel_shortcut_attack.py | 98.63% | 0 Missing and 1 partial ⚠️
Additional details and impacted files


@@              Coverage Diff               @@
##           dev_1.21.0    #2720      +/-   ##
==============================================
- Coverage       85.23%   85.23%   -0.01%     
==============================================
  Files             330      331       +1     
  Lines           29820    29894      +74     
  Branches         5007     5023      +16     
==============================================
+ Hits            25418    25480      +62     
- Misses           2970     2980      +10     
- Partials         1432     1434       +2     
Files with missing lines | Coverage Δ
art/attacks/poisoning/__init__.py | 100.00% <100.00%> (ø)
art/attacks/poisoning/one_pixel_shortcut_attack.py | 98.63% <98.63%> (ø)

... and 4 files with indirect coverage changes


@beat-buesser beat-buesser self-requested a review August 11, 2025 14:48
@Copilot Copilot AI review requested due to automatic review settings August 12, 2025 00:34

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a new data poisoning attack implementation called OnePixelShortcutAttack to the Adversarial Robustness Toolbox (ART). The attack perturbs a single pixel in each training image at a consistent location per class to create "unlearnable" examples that degrade model accuracy on clean data.

  • Implementation of the One Pixel Shortcut (OPS) attack as a new poisoning attack class
  • Comprehensive unit test suite validating attack behavior and integration with ART estimators
  • Updates to package dependencies and CI configurations

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File | Description
art/attacks/poisoning/one_pixel_shortcut_attack.py | Core implementation of the OnePixelShortcutAttack class with pixel perturbation logic
tests/attacks/poison/test_one_pixel_shortcut_attack.py | Comprehensive unit tests covering various scenarios and edge cases
art/attacks/poisoning/__init__.py | Adds import for the new attack class
requirements_test.txt | Updates dependency versions for testing infrastructure
.github/workflows/dockerhub.yml | Updates Docker action versions
.github/workflows/ci-huggingface.yml | Adds safetensors dependency and updates filtering logic

acc_poisoned = np.mean(preds_poisoned.argmax(axis=1) == y_poison)

# Adjusted assertions for robustness
assert acc_poisoned >= 1.0, f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"

Copilot AI Aug 12, 2025


The assertion checks for accuracy >= 1.0, but accuracy values lie in the range [0, 1], so this comparison can only succeed when the accuracy is exactly 1.0. If perfect poisoned accuracy is expected, an exact or tolerance-based equality check is clearer and more robust than >=.

Suggested change
assert acc_poisoned >= 1.0, f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"
assert np.isclose(acc_poisoned, 1.0), f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"


max_idx_flat = np.argmax(score_map)
max_score = score_map.ravel()[max_idx_flat]
if max_score > best_score:
    best_score = float(max_score)

Copilot AI Aug 12, 2025


The explicit conversion to float() is unnecessary since max_score is already a float from the numpy operations. This conversion can be removed for cleaner code.

Suggested change
best_score = float(max_score)
best_score = max_score


np.array([0.0], dtype=x_orig.dtype),
np.array([1.0], dtype=x_orig.dtype),
]
else:

Copilot AI Aug 12, 2025


For high-channel images (e.g., c=4), this generates 2^c combinations (16 for RGBA). This could be expensive for larger channel counts. Consider documenting this limitation or adding a reasonable upper bound on channel count.

Suggested change
else:
else:
if c > MAX_CHANNELS_FOR_COMBINATIONS:
raise ValueError(
f"Number of channels (c={c}) is too high for exhaustive search "
f"(max allowed: {MAX_CHANNELS_FOR_COMBINATIONS}). "
"Reduce the number of channels or modify the attack implementation."
)
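
To make the growth concrete, here is a hedged sketch of the kind of exhaustive per-channel enumeration the comment describes (this mirrors the pattern, not the exact PR code; candidate values are assumed to be the per-channel extremes 0 and 1):

    from itertools import product

    import numpy as np

    def candidate_pixel_values(c, dtype=np.float32):
        """Enumerate all 2**c pixel values built from the per-channel extremes {0, 1}."""
        return [np.array(v, dtype=dtype) for v in product((0.0, 1.0), repeat=c)]

    # c=1 -> 2 candidates, c=3 (RGB) -> 8, c=4 (RGBA) -> 16: growth is exponential in c.
    print([len(candidate_pixel_values(c)) for c in (1, 3, 4)])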


@nicholasadriel
Author

Update:

  1. Worked on the Codecov patch coverage issue (previously 80.82192% with 14 lines missing) by extending unit tests to hit the remaining branches (shape routing for NHWC/NCHW, one‑hot label handling, empty‑class skip, and best‑coord guard).
  2. Ran Black and fixed pycodestyle/mypy findings to address the Style Check.

Kindly re-run CI and re-review; I am happy to adjust further if needed. Thank you @beat-buesser
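
As a rough illustration only, a branch-coverage test for the one-hot label handling mentioned above might look like this (hypothetical sketch; it assumes poison() accepts one-hot labels and returns them in the same format, which the actual tests should confirm):

    import numpy as np

    from art.attacks.poisoning import OnePixelShortcutAttack


    def test_ops_accepts_one_hot_labels():
        x = np.random.rand(20, 8, 8, 3).astype(np.float32)
        y_int = np.random.randint(0, 4, size=20)
        y_one_hot = np.eye(4, dtype=np.float32)[y_int]  # one-hot encoded labels

        x_p, y_p = OnePixelShortcutAttack().poison(x, y_one_hot)

        assert x_p.shape == x.shape             # poisoned data keeps its shape
        assert y_p.shape == y_one_hot.shape     # assumed: label format is preserved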

@beat-buesser beat-buesser self-assigned this Aug 12, 2025
@nicholasadriel
Author

Hi @beat-buesser, may I know why there is still one pending check for PyTorch 2.6.0 (Python 3.10) ("Expected — Waiting for status to be reported")?

From the last run, the only remaining issue was the CI Style Check, which now passes. The Codecov check also passed this time, with a higher patch coverage percentage.

Please let me know if there is anything I need to adjust further, thank you!
