* New short section on AI/ML
People will use AI & ML whether or not we say anything, so let's
say something. There's a lot we could say; here are what I think are
the basics.
* Include fixes for problems noted by
Arnaud J Le Hors (@lehors)
* Credit Alstott for the simple ML issue description
* Minor punctuation fixes
Signed-off-by: David A. Wheeler <[email protected]>
secure_software_development_fundamentals.md (+130 −2)
@@ -4976,6 +4976,112 @@ When disposing, make sure you fully destroy any data you are supposed to destroy
[ ] Fix any security issue rapidly, and then just move on to other problems. {{ selected: No, after you fix a security issue (incident), you should also try to find out *why* it happened (a “root cause analysis”) so you can fix the underlying cause. Otherwise, there is a good chance that similar problems will keep happening. }}

### Artificial Intelligence (AI), Machine Learning (ML), and Security

Artificial intelligence (AI) is intelligence demonstrated by machines
(the intelligence of humans and animals is sometimes called natural intelligence).
Machine learning (ML) is a field of inquiry devoted to
understanding and building methods that “learn”, that is,
methods that leverage data to improve performance on some set of tasks
(*Machine Learning*, Tom Mitchell).
ML is often considered a subset of AI.
A significant amount of AI security work today focuses on ML;
we will take the same focus here.

Building ML systems often involves several processes, namely
training, testing, and inference. Inference is when the ML system is being
used by its users.
Many ML projects have assumed a closed and trusted environment where
there are no security threats.
However, this assumption is often unrealistic.
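
To make these processes concrete, here is a minimal sketch of all three, using scikit-learn and a synthetic dataset purely as an illustration (the specific library and dataset are incidental):

```python
# Minimal illustration of the three ML processes: training, testing, inference.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic, labeled dataset stands in for real training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # training
print("held-out accuracy:", model.score(X_test, y_test))         # testing

new_input = X_test[:1]         # in production, this input comes from users
print("prediction:", model.predict(new_input))                   # inference
```

Each of these processes can be attacked, as discussed next.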
4997
+
4998
+
*Adversarial machine learning* is the set of efforts to
4999
+
protect the ML pipeline to ensure its security during training,
5000
+
test, and inference.
5001
+
This is an active area of study, and terminology varies.
5002
+
That said, there are many kinds of potential attacks on ML systems, including:
5003
+
5004
+
* *Evasion* ("do/infer the wrong thing").
In an evasion attack, the attacker provides a modified input to
an ML system's classifier during inference so that it is misclassified,
while keeping the modification as small as possible
(Nicolae et al, 2019).
For example, an attacker might create subtle markings in a road to
convince a self-driving car to unexpectedly swerve into oncoming traffic.
Such modified inputs are sometimes called *adversarial inputs*.
Depending on what the classifier controls, adversarial inputs can enable
the attacker to control the system.
Thus, this kind of attack may lead to a loss of integrity and/or availability.
(A toy sketch of evasion and poisoning appears after this list.)
* *Poisoning* ("learn the wrong thing").
In a poisoning attack, the attacker manipulates data that will be used as
training data, e.g., to reduce performance, cause misclassification, and/or
insert backdoors
(Nicolae et al, 2019).
ML systems typically need a large amount of training data;
some attackers may even create or manipulate publicly-available
data if it is likely to be eventually used for training.
This kind of attack may lead to a loss of integrity and/or availability.
* *Loss of confidentiality* ("reveal the wrong thing").
An attacker may be able to use query results to reveal hidden information.
Thus, this kind of attack may lead to a loss of confidentiality.
This kind of attack can be subdivided further, for example:
    * *Extraction*.
    In an extraction attack, the attacker extracts the parameters or
    structure of the model from observations of the model’s predictions
    (Tabassi 2019).
    * *(Membership) inference*.
    In a membership inference attack, the attacker
    uses target model query results to determine if specific
    data points belong to the same distribution as the training dataset
    (Tabassi 2019).
    * *(Model) inversion*.
    In an inversion attack, the attacker is able to
    reconstruct (some) data used to train the model, including
    private and/or secret data (Tabassi 2019).

(Credit: The simple descriptions shown above in parentheses and double-quotes
were coined by Dr. Jeff Alstott.)

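Here is a toy sketch of the evasion and poisoning ideas above, using a simple logistic regression model; scikit-learn and NumPy are illustrative choices, and real attacks typically target far more complex models:

```python
# Purely illustrative sketch of evasion and poisoning on a toy classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Normal training and evaluation.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean test accuracy:", clf.score(X_test, y_test))

# Evasion ("do/infer the wrong thing"): for logistic regression, the loss
# gradient with respect to the input is (p - y) * w, so a small step in the
# sign of that gradient (an FGSM-style perturbation) pushes inputs across
# the decision boundary while keeping each change small.
eps = 0.5
w = clf.coef_[0]
p = clf.predict_proba(X_test)[:, 1]
grad = (p - y_test)[:, None] * w           # d(loss)/d(input), per sample
X_adv = X_test + eps * np.sign(grad)       # small adversarial perturbation
print("accuracy on adversarial inputs:", clf.score(X_adv, y_test))

# Poisoning ("learn the wrong thing"): flip the labels of a small fraction
# of the training data, as an attacker who controls part of the training
# set might, then retrain.
rng = np.random.default_rng(0)
flip = rng.choice(len(y_train), size=len(y_train) // 10, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]
clf_poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("test accuracy after poisoning:", clf_poisoned.score(X_test, y_test))
```
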
Work has especially focused on countering evasion
(adversarial inputs) in ML systems.
Unfortunately, many approaches that *appear* to counter evasion fail to
counter non-naïve attackers.
Here are some example approaches that don't counter determined attackers:

* *Adversarial training* creates adversarial inputs, then trains the
model on those inputs. This can improve robustness, but an attacker can
simply repeat this process more often than the defender
(a toy sketch of this follows the list).
* *Null labeling* attempts to train a model to recognize that certain inputs
are likely adversarial (and should be classified as "null" results).
Again, this appears to be weak against determined adversaries, as explained
by Carlini and Wagner
(“Adversarial Examples Are Not Easily Detected:
Bypassing Ten Detection Methods”, 2017).

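As an illustration of why adversarial training alone is not a complete defense, here is a toy sketch in the same style as the earlier example; again, the linear model and FGSM-style perturbation are illustrative simplifications:

```python
# Purely illustrative sketch of adversarial training and why an attacker
# can simply repeat the attack against the retrained model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def perturb(clf, X, y, eps=0.5):
    """FGSM-style step for logistic regression: the loss gradient with
    respect to the input is (p - y) * w, so step in its sign."""
    p = clf.predict_proba(X)[:, 1]
    grad = (p - y)[:, None] * clf.coef_[0]
    return X + eps * np.sign(grad)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Adversarial training: augment the training set with perturbed copies
# generated against the current model, then retrain.
X_aug = np.vstack([X_train, perturb(clf, X_train, y_train)])
y_aug = np.concatenate([y_train, y_train])
robust = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

# A determined attacker regenerates perturbations against the *retrained*
# model, repeating the process more often than the defender.
X_adv = perturb(robust, X_test, y_test)
print("accuracy on fresh adversarial inputs:", robust.score(X_adv, y_test))
```
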
[Integrate adversarial attacks in a model training pipeline](https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/),
by Singh et al,
provides an example of how ART can be integrated into a larger pipeline.
However, before using any tool you need to determine if it's effective enough
for your circumstances.

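For example, here is a minimal sketch of using ART to measure a model's susceptibility to a simple evasion attack. It assumes ART 1.x's Python API (`SklearnClassifier`, `FastGradientMethod`); check ART's current documentation before relying on it:

```python
# Minimal sketch: use ART to generate adversarial inputs and measure the
# drop in accuracy. Assumes ART 1.x (pip install adversarial-robustness-toolbox).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Wrap the trained model so ART's attacks can query it.
classifier = SklearnClassifier(model=model)

# Generate adversarial versions of the test inputs with the Fast Gradient Method.
attack = FastGradientMethod(estimator=classifier, eps=0.3)
X_adv = attack.generate(x=X_test)

print("clean accuracy:      ", model.score(X_test, y_test))
print("adversarial accuracy:", model.score(X_adv, y_test))
```

A large gap between the two numbers is a warning sign that the model is easy to evade.
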
Adversarial ML is an active research area.
Before using countermeasures,
determine if the countermeasures will be adequate for your purposes.
Many countermeasures only work against naïve attackers who do not
compensate for countermeasures.
Depending on your purposes,
there may not be *any* countermeasure that counters attackers
with adequate confidence.
*Many* countermeasures have been proposed and later found to be inadequate.
One paper that discusses how to evaluate countermeasures is
[Nicholas Carlini, Anish Athalye, Nicolas Papernot, et al., “On Evaluating Adversarial Robustness”, 2019-02-20](https://arxiv.org/pdf/1902.06705).
We hope that in the future there will be better countermeasures with
more industry-wide confidence.

### Formal Methods
Today most software needs to be developed to be “reasonably” or “adequately” secure. This course has focused on techniques to help you do that. However, if it is *extremely critical* that your software meet some criteria - such as some security criteria - there is an additional approach that you should be aware of: *formal methods*.
@@ -5798,6 +5904,11 @@ Butler, Ricky W., *What is Formal Methods?* ([https://shemesh.larc.nasa.gov/fm/f
C FAQ list ([http://c-faq.com/ansi/undef.html](http://c-faq.com/ansi/undef.html))

Carlini, Nicholas, & David Wagner, “Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods”, 2017

Carlini, Nicholas, Anish Athalye, Nicolas Papernot, et al., “On Evaluating Adversarial Robustness”, 2019-02-20, <https://arxiv.org/pdf/1902.06705>

Carnegie Mellon University: Software Engineering Institute, CERT Division ([https://sei.cmu.edu/about/divisions/cert/index.cfm](https://sei.cmu.edu/about/divisions/cert/index.cfm))
Chen, Raymond, *Undefined behavior can result in time travel (among other things, but time travel is the funkiest)*, 2014-06-27, ([https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633))
@@ -5905,6 +6016,8 @@ Microsoft, “3 Ways to Mitigate Risk When Using Private Package Feeds”, (<ht
Minocha, Shreyas, *Regular Expressions for Regular Folk* ([https://refrf.shreyasminocha.me/](https://refrf.shreyasminocha.me/))
Mitchell, Tom, 1997, *Machine Learning*. New York: McGraw Hill. ISBN 0-07-042807-7. OCLC 36417892.

Nicolae, Maria-Irina, Mathieu Sinn, Minh Ngoc Tran, Beat Buesser, Ambrish Rawat, Martin Wistuba, Valentina Zantedeschi, Nathalie Baracaldo, Bryant Chen, Heiko Ludwig, Ian M. Molloy, Ben Edwards, *Adversarial Robustness Toolbox v1.0.0*, 2019-11-15, <https://arxiv.org/abs/1807.01069>

Official EU site for the GDPR text ([https://eur-lex.europa.eu/eli/reg/2016/679/oj](https://eur-lex.europa.eu/eli/reg/2016/679/oj))
Ohm, Marc; Plate, Henrik; Sykosch, Arnold; Meier, Michal, *Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks*, 2020-05-19 ([https://arxiv.org/abs/2005.09535](https://arxiv.org/abs/2005.09535))
@@ -6015,6 +6133,10 @@ Shu, Xiaokui; Ciambrone, Andrew; Yao, Danfeng, *Breaking the Target: An Analysis
Sim, Darren, *Security Vulnerability and Browser Performance Impact of Target="_blank”*, 2019-03-23 ([https://medium.com/@darrensimio/security-vulnerability-and-browser-performance-impact-of-target-blank-80e5e67db547](https://medium.com/@darrensimio/security-vulnerability-and-browser-performance-impact-of-target-blank-80e5e67db547))

Singh, Animesh, Anupama Murthy, and Christian Kadner,
[Integrate adversarial attacks in a model training pipeline](https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/),

The Fuzzing Project ([https://fuzzing-project.org/](https://fuzzing-project.org/))
The Linux Foundation, *Summary of GDPR Concepts For Free and Open Source Software Projects*, 2018 ([https://www.linuxfoundation.org/wp-content/uploads/2018/05/lf_gdpr_052418.pdf](https://www.linuxfoundation.org/wp-content/uploads/2018/05/lf_gdpr_052418.pdf))