Add OnePixelShortcutAttack poisoning attack and its unit tests #2720
base: dev_1.21.0
Conversation
Signed-off-by: Nicholas Audric Adriel <[email protected]>
fbed197 to 17257dc
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           dev_1.21.0    #2720    +/-  ##
==============================================
- Coverage      85.23%    85.23%   -0.01%
==============================================
  Files            330       331       +1
  Lines          29820     29894      +74
  Branches        5007      5023      +16
==============================================
+ Hits           25418     25480      +62
- Misses          2970      2980      +10
- Partials        1432      1434       +2
Signed-off-by: Nicholas Audric Adriel <[email protected]>
Pull Request Overview
This PR adds a new data poisoning attack implementation called OnePixelShortcutAttack to the Adversarial Robustness Toolbox (ART). The attack perturbs a single pixel in each training image at a consistent location per class to create "unlearnable" examples that degrade model accuracy on clean data (a minimal sketch of this mechanic follows the summary list below).
- Implementation of the One Pixel Shortcut (OPS) attack as a new poisoning attack class
- Comprehensive unit test suite validating attack behavior and integration with ART estimators
- Updates to package dependencies and CI configurations
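For orientation, the core mechanic can be sketched in a few lines. This is an illustrative sketch only, not the PR's actual implementation; the helper name, channels-last array layout, and the fixed pixel values are assumptions.

```python
import numpy as np

def apply_one_pixel_shortcut(x, y, pixel_per_class):
    """Illustrative sketch: write one fixed pixel per class into every image of that class.

    x: images with shape (N, H, W, C); y: integer labels with shape (N,);
    pixel_per_class: dict mapping class label -> ((row, col), value).
    """
    x_poisoned = x.copy()
    for cls, ((row, col), value) in pixel_per_class.items():
        mask = y == cls
        # The same location and value for every image of a class creates the "shortcut".
        x_poisoned[mask, row, col, :] = value
    return x_poisoned

# Hypothetical example: class 0 gets pixel (0, 0) set to 1.0, class 1 gets pixel (5, 5) set to 0.0.
x = np.random.rand(10, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 2, size=10)
x_p = apply_one_pixel_shortcut(x, y, {0: ((0, 0), 1.0), 1: ((5, 5), 0.0)})
```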
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `art/attacks/poisoning/one_pixel_shortcut_attack.py` | Core implementation of the OnePixelShortcutAttack class with pixel perturbation logic |
| `tests/attacks/poison/test_one_pixel_shortcut_attack.py` | Comprehensive unit tests covering various scenarios and edge cases |
| `art/attacks/poisoning/__init__.py` | Adds import for the new attack class |
| `requirements_test.txt` | Updates dependency versions for testing infrastructure |
| `.github/workflows/dockerhub.yml` | Updates Docker action versions |
| `.github/workflows/ci-huggingface.yml` | Adds safetensors dependency and updates filtering logic |
acc_poisoned = np.mean(preds_poisoned.argmax(axis=1) == y_poison)

# Adjusted assertions for robustness
assert acc_poisoned >= 1.0, f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"
The assertion checks for accuracy >= 1.0, but accuracy values are in the range [0, 1]. If perfect accuracy is expected, the comparison should be == 1.0 (or a tolerance-based check) rather than >=, since accuracy cannot exceed 1.0 and the current comparison is misleading.
- assert acc_poisoned >= 1.0, f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"
+ assert np.isclose(acc_poisoned, 1.0), f"Expected 100% poisoned accuracy, got {acc_poisoned:.3f}"
max_idx_flat = np.argmax(score_map)
max_score = score_map.ravel()[max_idx_flat]
if max_score > best_score:
    best_score = float(max_score)
The explicit conversion to float() is unnecessary since max_score is already a float from the numpy operations. This conversion can be removed for cleaner code.
- best_score = float(max_score)
+ best_score = max_score
    np.array([0.0], dtype=x_orig.dtype),
    np.array([1.0], dtype=x_orig.dtype),
]
else:
For high-channel images (e.g., c=4), this generates 2^c combinations (16 for RGBA). This could be expensive for larger channel counts. Consider documenting this limitation or adding a reasonable upper bound on channel count.
- else:
+ else:
+     if c > MAX_CHANNELS_FOR_COMBINATIONS:
+         raise ValueError(
+             f"Number of channels (c={c}) is too high for exhaustive search "
+             f"(max allowed: {MAX_CHANNELS_FOR_COMBINATIONS}). "
+             "Reduce the number of channels or modify the attack implementation."
+         )
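For context on the cost noted above, here is a minimal sketch of how an exhaustive per-channel candidate search grows as 2^c. The function name is illustrative, and it assumes (as in the excerpt above) that candidates are built from the extreme values 0.0 and 1.0 per channel.

```python
import itertools
import numpy as np

def candidate_pixel_values(c, dtype=np.float32):
    """Enumerate all 2**c combinations of the per-channel extremes 0.0 and 1.0."""
    return [np.array(combo, dtype=dtype) for combo in itertools.product((0.0, 1.0), repeat=c)]

print(len(candidate_pixel_values(3)))  # 8 candidates for RGB
print(len(candidate_pixel_values(4)))  # 16 candidates for RGBA
```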
Update:
Kindly re-run CI and re-review; happy to adjust further if needed. Thank you @beat-buesser
Hi @beat-buesser, may I know why there is still 1 pending check for PyTorch 2.6.0 (Python 3.10) ("Expected — Waiting for status to be reported")? From the last check, the only failure was the CI Style Checks, and that is now successful. The Codecov check also passed this time with a higher patch coverage percentage. Please let me know if there is anything else I need to adjust, thank you!
Description
This PR adds a new data poisoning attack class, OnePixelShortcutAttack, to the Adversarial Robustness Toolbox. The class is implemented under the art.attacks.poisoning module, and it introduces support for the One Pixel Shortcut (OPS) attack in ART. A corresponding unit test suite (test_one_pixel_shortcut_attack.py) is also included to validate the correct behavior of the attack implementation.
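A minimal usage sketch, assuming the new class follows the poison(x, y) interface used by ART's other poisoning attacks; the constructor arguments are not shown in this PR description, so none are passed here.

```python
import numpy as np
from art.attacks.poisoning import OnePixelShortcutAttack

# Toy CIFAR-10-like batch: 100 RGB images of 32x32 in [0, 1], 10 classes.
x_train = np.random.rand(100, 32, 32, 3).astype(np.float32)
y_train = np.random.randint(0, 10, size=100)

attack = OnePixelShortcutAttack()  # assumed default constructor
x_poisoned, y_poisoned = attack.poison(x_train, y_train)  # OPS leaves the labels unchanged
```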
Motivation
The One Pixel Shortcut attack is a recently proposed poisoning technique that perturbs a single pixel in each training image (in a consistent location per class) to create "unlearnable" examples. This can dramatically degrade a model’s accuracy on clean data without altering the labels. By integrating OPS into IBM ART, we enable standardized evaluation of this attack using ART’s framework and estimators. The implementation has been tested on popular image classification datasets such as CIFAR-10, CIFAR-100, UTKFace, and CelebA, demonstrating the attack’s broad applicability. Incorporating OPS aligns with ART’s benchmarking and reproducibility goals, expanding the library’s coverage of state-of-the-art poisoning attacks.
Fixes
No open issue is associated with this PR (new feature contribution).
Type of change
New feature (non-breaking change which adds functionality)
Testing
Unit tests have been added in test_one_pixel_shortcut_attack.py to verify the implementation’s correctness. All tests pass, confirming that the attack behaves as expected and integrates correctly with ART’s data and estimator APIs.
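As an illustration of the kind of invariant such tests can check (a hypothetical check, not taken from the PR's test file, and assuming a channels-last array layout): each poisoned image should differ from its original in at most one pixel location.

```python
import numpy as np

def check_at_most_one_pixel_changed(x_orig, x_poisoned):
    """Hypothetical invariant: each image differs from its original at <= 1 spatial position."""
    # Collapse the channel axis so a change in any channel counts as one changed pixel.
    changed = np.any(x_orig != x_poisoned, axis=-1)  # shape (N, H, W)
    changed_per_image = changed.reshape(changed.shape[0], -1).sum(axis=1)
    assert np.all(changed_per_image <= 1)
```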
Test Configuration
No additional configuration or dependencies are required for this feature. The OnePixelShortcutAttack can be used out-of-the-box with ART’s existing classifiers and datasets, similar to other poisoning attacks in the library.
Checklist
Reference
Shutong Wu, Sizhe Chen, Cihang Xie, and Xiaolin Huang. "One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks." In Proceedings of the International Conference on Learning Representations (ICLR), 2023.