
Commit 4222e32

New short section on AI/ML (ossf#91)

* New short section on AI/ML. People will use AI & ML whether or not we say anything, so let's say something. There's a lot we can say; here is what I think are the basics.
* Include fixes for problems noted by Arnaud J Le Hors (@lehors)
* Credit Alstott for the simple ML issue description
* Minor punctuation fixes

Signed-off-by: David A. Wheeler <[email protected]>
1 parent a9f8584 commit 4222e32

File tree

1 file changed: +130 -2 lines changed


secure_software_development_fundamentals.md

Lines changed: 130 additions & 2 deletions
@@ -4976,6 +4976,112 @@ When disposing, make sure you fully destroy any data you are supposed to destroy

[ ] Fix any security issue rapidly, and then just move on to other problems. {{ selected: No, after you fix a security issue (incident), you should also try to find out *why* it happened (a “root cause analysis”) so you can fix the underlying cause. Otherwise, there is a good chance that similar problems will keep happening. }}

### Artificial Intelligence (AI), Machine Learning (ML), and Security

Artificial intelligence (AI) is intelligence demonstrated by machines
(the intelligence of humans and animals is sometimes called natural intelligence).
Machine learning (ML) is a field of inquiry devoted to
understanding and building methods that 'learn', that is,
methods that leverage data to improve performance on some set of tasks
(*Machine Learning*, Tom Mitchell).
ML is often considered a subset of AI.
A significant amount of AI security work today focuses on ML;
we will take the same focus here.

Building ML systems often involves several processes, namely
training, testing, and inference. Inference is when the ML system is being
used by its users.
Many ML projects have assumed a closed and trusted environment where
there are no security threats.
However, this assumption is often unrealistic.
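
As a concrete (if trivial) illustration of those three processes, here is a sketch; the scikit-learn model and the random data are only stand-ins for a real pipeline:

```python
# The three processes named above, in miniature (illustrative data and model only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)    # training
print("test accuracy:", model.score(X_test, y_test))  # testing
print("prediction:", model.predict(rng.normal(size=(1, 10))))  # inference (use by users)
```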

*Adversarial machine learning* is the set of efforts to
protect the ML pipeline to ensure its security during training,
testing, and inference.
This is an active area of study, and terminology varies.
That said, there are many kinds of potential attacks on ML systems, including:

* *Evasion* ("do/infer the wrong thing").
  In an evasion attack, the attacker provides a modified input to
  an ML system's classifier during inference so it is misclassified,
  while keeping the modification as small as possible
  (Nicolae et al., 2019).
  For example, an attacker might create subtle markings in a road to
  convince a self-driving car to unexpectedly swerve into oncoming traffic.
  Such modified inputs are sometimes called *adversarial inputs*.
  Depending on what the classifier controls, adversarial inputs can enable
  the attacker to control the system.
  Thus, this kind of attack may lead to a loss of integrity and/or availability.
  (A toy sketch of this idea is shown after this list.)
* *Poisoning* ("learn the wrong thing").
  In a poisoning attack, the attacker manipulates data that will be used as
  training data, e.g., to reduce performance, cause misclassification, and/or
  insert backdoors
  (Nicolae et al., 2019).
  ML systems typically need a large amount of training data;
  some attackers may even create or manipulate publicly-available
  data if it is likely to be eventually used for training.
  This kind of attack may lead to a loss of integrity and/or availability.
* *Loss of confidentiality* ("reveal the wrong thing").
  An attacker may be able to use query results to reveal hidden information.
  Thus, this kind of attack may lead to a loss of confidentiality.
  This kind of attack can be subdivided further, for example:
  * *Extraction*.
    In an extraction attack, the attacker extracts the parameters or
    structure of the model from observations of the model’s predictions
    (Tabassi 2019).
  * *(Membership) inference*.
    In a membership inference attack, the attacker
    uses target model query results to determine if specific
    data points belong to the same distribution as the training dataset
    (Tabassi 2019).
  * *(Model) inversion*.
    In an inversion attack, the attacker is able to
    reconstruct (some) data used to train the model, including
    private and/or secret data (Tabassi 2019).

(Credit: The simple descriptions shown above in parentheses and double-quotes
were coined by Dr. Jeff Alstott.)
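
To make the evasion idea concrete, here is a toy sketch (ours, not taken from the cited papers) of how a tiny input modification can flip the answer of a hypothetical *linear* classifier; the weights, input, and `eps` value are all made up for illustration:

```python
# Toy sketch of an evasion ("do/infer the wrong thing") attack on a
# hypothetical linear classifier. Everything here is illustrative; real
# attacks (e.g., FGSM or PGD) target real trained models.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came from a trained model: score = w.x + b, class = score > 0.
w = rng.normal(size=20)
b = 0.1

def classify(x):
    return int(x @ w + b > 0)

x = rng.normal(size=20)   # a legitimate input
score = x @ w + b

# For a linear model, shifting every feature by eps against sign(w) changes
# the score by eps * sum(|w|), so pick eps just large enough to flip the sign.
# The per-feature change stays small even though the classification changes.
eps = 1.01 * abs(score) / np.sum(np.abs(w))
x_adv = x - np.sign(score) * eps * np.sign(w)

print("original class:        ", classify(x))
print("adversarial class:     ", classify(x_adv))
print("max per-feature change:", float(np.max(np.abs(x_adv - x))))
```

Real ML models are not linear, but gradient-based attacks apply the same idea: follow the model's gradients to find a small input change that produces a large change in the output.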

Work has especially focused on countering evasion
(adversarial inputs) in ML systems.
Unfortunately, many approaches that *appear* to counter evasion fail to
counter non-naïve attackers.
Here are some example approaches that don't counter determined attackers
(a toy sketch of the first approach follows this list):

* *Adversarial training* creates adversarial inputs, then trains the
  model on those inputs. This can improve robustness, but an attacker can
  simply repeat this process more often than the defender.
* *Null labeling* attempts to train a model to recognize that certain inputs
  are likely adversarial (and should be classified as "null" results).
  Again, this appears to be weak against determined adversaries, as explained
  by Carlini and Wagner in “Adversarial Examples Are Not Easily Detected:
  Bypassing Ten Detection Methods” (2017).
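
Here is a toy sketch (ours, using a scikit-learn logistic regression and made-up data) of what an adversarial training loop looks like, and of why a determined attacker can still win by simply attacking harder than the defender trained for:

```python
# Toy sketch of adversarial training on a linear model with made-up data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Two Gaussian blobs as a stand-in for real training data.
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 10)),
               rng.normal(+1.0, 1.0, size=(200, 10))])
y = np.array([0] * 200 + [1] * 200)

def evade(model, X, y, eps):
    """FGSM-style perturbation for a linear model: push each sample
    against the direction of its own (correct) class."""
    w = model.coef_[0]
    signs = np.where(y == 1, -1.0, +1.0)
    return X + eps * signs[:, None] * np.sign(w)

model = LogisticRegression().fit(X, y)
train_eps = 0.5

# Adversarial training: repeatedly add adversarial examples to the training set.
for i in range(3):
    X_aug = np.vstack([X, evade(model, X, y, train_eps)])
    y_aug = np.concatenate([y, y])
    model = LogisticRegression().fit(X_aug, y_aug)
    print(f"round {i}: accuracy vs eps={train_eps} attack:",
          model.score(evade(model, X, y, train_eps), y))

# A determined attacker simply attacks harder than the defender trained for.
print("accuracy vs eps=3.0 attack:", model.score(evade(model, X, y, 3.0), y))
```

This mirrors the point above: adversarial training raises the bar for the attack it anticipated, not for a stronger one.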

One tool that may be helpful is the
Adversarial Robustness Toolbox (ART)
<https://github.com/Trusted-AI/adversarial-robustness-toolbox/wiki/>.
The post
[Integrate adversarial attacks in a model training pipeline](https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/),
by Singh et al.,
provides an example of how ART can be integrated into a larger pipeline.
However, before using any tool you need to determine if it's effective enough
for your circumstances.
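
For illustration, here is a rough sketch of driving an evasion attack with ART. The class and parameter names (`SklearnClassifier`, `FastGradientMethod`, `estimator`, `eps`) follow ART's documented Python API as we understand it, but they change between versions, so check the project's current documentation; the data and model are placeholders:

```python
# Rough sketch: use ART to generate adversarial test inputs for a simple model.
# Placeholder data and model; ART names may vary by version.
import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 20))
y_train = (X_train.sum(axis=1) > 0).astype(int)
X_test = rng.normal(size=(100, 20))
y_test = (X_test.sum(axis=1) > 0).astype(int)

# Train an ordinary scikit-learn model, then wrap it so ART attacks can drive it.
model = LogisticRegression().fit(X_train, y_train)
classifier = SklearnClassifier(model=model)

# Generate adversarial versions of the test set and compare accuracy.
attack = FastGradientMethod(estimator=classifier, eps=0.3)
X_test_adv = attack.generate(x=X_test)

print("clean accuracy:      ", model.score(X_test, y_test))
print("adversarial accuracy:", model.score(X_test_adv, y_test))
```

Measuring the accuracy drop on attack-generated inputs is a useful baseline check, but as noted above, passing such a check does not mean the model resists determined attackers.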

Adversarial ML is an active research area.
Before using countermeasures,
determine if the countermeasures will be adequate for your purposes.
Many countermeasures only work against naive attackers who do not
compensate for countermeasures.
Depending on your purposes,
there may not be *any* countermeasure that counters determined attackers
with adequate confidence.
*Many* countermeasures have been proposed and later found to be inadequate.
One paper that discusses how to evaluate countermeasures is
[Nicholas Carlini, Anish Athalye, Nicolas Papernot, et al., “On Evaluating Adversarial Robustness”, 2019-02-20](https://arxiv.org/pdf/1902.06705).
We hope that in the future there will be better countermeasures with
more industry-wide confidence.

### Formal Methods

Today most software needs to be developed to be “reasonably” or “adequately” secure. This course has focused on techniques to help you do that. However, if it is *extremely critical* that your software meet some criteria - such as some security criteria - there is an additional approach that you should be aware of: *formal methods*.
@@ -5798,6 +5904,11 @@ Butler, Ricky W., *What is Formal Methods?* ([https://shemesh.larc.nasa.gov/fm/f

C FAQ list ([http://c-faq.com/ansi/undef.html](http://c-faq.com/ansi/undef.html))

Carlini, Nicholas & David Wagner, “Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods”, 2017

Carlini, Nicholas, Anish Athalye, Nicolas Papernot, et al., “On Evaluating Adversarial Robustness”, 2019-02-20, <https://arxiv.org/pdf/1902.06705>.

Carnegie Mellon University: Software Engineering Institute, CERT Division ([https://sei.cmu.edu/about/divisions/cert/index.cfm](https://sei.cmu.edu/about/divisions/cert/index.cfm))

Chen, Raymond, *Undefined behavior can result in time travel (among other things, but time travel is the funkiest)*, 2014-06-27, ([https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633))
@@ -5905,6 +6016,8 @@ Microsoft, “3 Ways to Mitigate Risk When Using Private Package Feeds”, (<ht

Minocha, Shreyas, *Regular Expressions for Regular Folk* ([https://refrf.shreyasminocha.me/](https://refrf.shreyasminocha.me/))

Mitchell, Tom, 1997, *Machine Learning*. New York: McGraw Hill. ISBN 0-07-042807-7. OCLC 36417892.

MITRE ([https://www.mitre.org/](https://www.mitre.org/))

MITRE, Common Weakness Enumeration (CWE) ([https://cwe.mitre.org/](https://cwe.mitre.org/))
@@ -5921,12 +6034,17 @@ Mozilla, Rust vs. C++ in macOS Firefox Nightly ([https://docs.google.com/spreads

Mozilla, *Same-Origin Policy* ([https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy))

National Vulnerability Database (NVD), CVE-2021-44228, (<https://nvd.nist.gov/vuln/detail/CVE-2021-44228>)

Newcombe, Chris; Rath, Tim; Zhang, Fan; Munteanu, Bogdan; Brooker, Marc; Daerdeuff, Michael, *Use of Formal Methods at Amazon Web Services*, 2014-09-29 ([https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf](https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf))

Newcombe, Chris; Rath, Tim; Zhang, Fan; Munteanu, Bogdan; Brooker, Marc; Daerdeuff, Michael, *How Amazon Web Services Uses Formal Methods*, Communications of the ACM, Vol. 58 No. 4, Pages 66-73, 10.1145/2699417, 2015-04 ([https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext](https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext))

Nicolae, Maria-Irina, Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Martin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, Ian M. Molloy, Ben Edwards, *Adversarial Robustness Toolbox v1.0.0*, 2019-11-15 (<https://arxiv.org/abs/1807.01069>)

Official EU site for the GDPR text ([https://eur-lex.europa.eu/eli/reg/2016/679/oj](https://eur-lex.europa.eu/eli/reg/2016/679/oj))

Ohm, Marc; Plate, Henrik; Sykosch, Arnold; Meier, Michal, *Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks*, 2020-05-19 ([https://arxiv.org/abs/2005.09535](https://arxiv.org/abs/2005.09535))
@@ -6015,6 +6133,10 @@ Shu, Xiaokui; Ciambrone, Andrew; Yao, Danfeng, *Breaking the Target: An Analysis

Sim, Darren, *Security Vulnerability and Browser Performance Impact of Target="_blank”*, 2019-03-23 ([https://medium.com/@darrensimio/security-vulnerability-and-browser-performance-impact-of-target-blank-80e5e67db547](https://medium.com/@darrensimio/security-vulnerability-and-browser-performance-impact-of-target-blank-80e5e67db547))

Singh, Animesh, Anupama Murthy, and Christian Kadner, [Integrate adversarial attacks in a model training pipeline](https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/), 2018-06-25

SSD Disclosure, SSD Advisory – VestaCP LPE Vulnerabilities, 2021-03-20, (<https://ssd-disclosure.com/ssd-advisory-vestacp-lpe-vulnerabilities/>)

State of California, *California Online Privacy Protection Act (OPPA)*, 2003 ([https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=BPC&sectionNum=22575](https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=BPC&sectionNum=22575))
@@ -6027,6 +6149,12 @@ Stilgherrian, *Relying on bug bounties ‘not appropriate risk management’: Ka

Swift, *Optional Chaining* ([https://docs.swift.org/swift-book/LanguageGuide/OptionalChaining.html](https://docs.swift.org/swift-book/LanguageGuide/OptionalChaining.html))

Tabassi, Elham (NIST), Kevin Burns (MITRE), Michael Hadjimichael (MITRE), Andres Molina-Markham (MITRE), Julian Sexton (MITRE), *A Taxonomy and Terminology of Adversarial Machine Learning*, NISTIR 8269 (Draft), October 2019 (<https://csrc.nist.gov/publications/detail/nistir/8269/draft>)

The Fuzzing Project ([https://fuzzing-project.org/](https://fuzzing-project.org/))

The Linux Foundation, *Summary of GDPR Concepts For Free and Open Source Software Projects*, 2018 ([https://www.linuxfoundation.org/wp-content/uploads/2018/05/lf_gdpr_052418.pdf](https://www.linuxfoundation.org/wp-content/uploads/2018/05/lf_gdpr_052418.pdf))
