Skip to content

Commit 6ff0054

Browse files
authored
Merge pull request github#13431 from am0o0/amammad-ruby-YAMLunsafeLoad
Ruby: add seperate additional steps between `YAML.parse*` methods and `to_ruby`
2 parents f92c106 + 8212f5d commit 6ff0054

14 files changed

+590
-89
lines changed
Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
2+
<qhelp>
3+
4+
<overview>
5+
<p>
6+
Deserializing untrusted data using any method that allows the construction of
7+
arbitrary objects is easily exploitable and, in many cases, allows an attacker
8+
to execute arbitrary code.
9+
</p>
10+
</overview>
11+
12+
<recommendation>
13+
14+
<p>
15+
If deserializing an untrusted YAML document using the <code>psych</code> gem,
16+
prefer the <code>safe_load</code> and <code>safe_load_file</code> methods over
17+
<code>load</code> and <code>load_file</code>, as the former will safely
18+
handle untrusted data. Avoid passing untrusted data to the <code>load_stream</code>
19+
method. In <code>psych</code> version 4.0.0 and above, the <code>load</code> method can
20+
safely be used.
21+
</p>
22+
23+
</recommendation>
24+
25+
<example>
26+
<p>
27+
The following example calls the <code>Marshal.load</code>,
28+
<code>JSON.load</code>, <code>YAML.load</code>, and <code>Oj.load</code> methods
29+
on data from an HTTP request. Since these methods are capable of deserializing
30+
to arbitrary objects, this is inherently unsafe.
31+
</p>
32+
<sample src="examples/UnsafeDeserializationBad.rb"/>
33+
34+
<p>
35+
Using <code>JSON.parse</code> and <code>YAML.safe_load</code> instead, as in the
36+
following example, removes the vulnerability. Similarly, calling
37+
<code>Oj.load</code> with any mode other than <code>:object</code> is safe, as
38+
is calling <code>Oj.safe_load</code>. Note that there is no safe way to deserialize
39+
untrusted data using <code>Marshal</code>.
40+
</p>
41+
<sample src="examples/UnsafeDeserializationGood.rb"/>
42+
</example>
43+
44+
<references>
45+
46+
<li>
47+
OWASP vulnerability description:
48+
<a href="https://www.owasp.org/index.php/Deserialization_of_untrusted_data">deserialization of untrusted data</a>.
49+
</li>
50+
<li>
51+
Ruby documentation: <a href="https://docs.ruby-lang.org/en/3.0.0/doc/security_rdoc.html">guidance on deserializing objects safely</a>.
52+
</li>
53+
<li>
54+
Ruby documentation: <a href="https://ruby-doc.org/stdlib-3.0.2/libdoc/yaml/rdoc/YAML.html#module-YAML-label-Security">security guidance on the YAML library</a>.
55+
</li>
56+
<li>
57+
You can read that how unsafe yaml load methods can lead to code executions:
58+
<a href="https://devcraft.io/2021/01/07/universal-deserialisation-gadget-for-ruby-2-x-3-x.html">Universal Deserialisation Gadget for Ruby 2.x-3.x </a>.
59+
</li>
60+
</references>
61+
62+
</qhelp>
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
/**
2+
* @name Deserialization of user-controlled yaml data
3+
* @description Deserializing user-controlled yaml data may allow attackers to
4+
* execute arbitrary code.
5+
* @kind path-problem
6+
* @problem.severity warning
7+
* @security-severity 9.8
8+
* @precision high
9+
* @id rb/unsafe-unsafeyamldeserialization
10+
* @tags security
11+
* external/cwe/cwe-502
12+
*/
13+
14+
import ruby
15+
import codeql.ruby.security.UnsafeDeserializationQuery
16+
import UnsafeCodeConstructionFlow::PathGraph
17+
18+
from UnsafeCodeConstructionFlow::PathNode source, UnsafeCodeConstructionFlow::PathNode sink
19+
where UnsafeCodeConstructionFlow::flowPath(source, sink)
20+
select sink.getNode(), source, sink, "Unsafe deserialization depends on a $@.", source.getNode(),
21+
source.getNode().(UnsafeDeserialization::Source).describe()
Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
/**
2+
* Provides a taint-tracking configuration for reasoning about unsafe deserialization.
3+
*
4+
* Note, for performance reasons: only import this file if
5+
* `UnsafeYamlDeserializationFlow` is needed, otherwise
6+
* `UnsafeYamlDeserializationCustomizations` should be imported instead.
7+
*/
8+
9+
private import codeql.ruby.AST
10+
private import codeql.ruby.DataFlow
11+
private import codeql.ruby.TaintTracking
12+
private import codeql.ruby.ApiGraphs
13+
import UnsafeYamlDeserializationCustomizations::UnsafeYamlDeserialization
14+
import Yaml
15+
16+
private module UnsafeYamlDeserializationConfig implements DataFlow::StateConfigSig {
17+
class FlowState = FlowState::State;
18+
19+
predicate isSource(DataFlow::Node source, FlowState state) {
20+
source instanceof Source and
21+
(state instanceof FlowState::Parse or state instanceof FlowState::Load)
22+
}
23+
24+
predicate isSink(DataFlow::Node sink, FlowState state) {
25+
sink instanceof Sink and
26+
(state instanceof FlowState::Parse or state instanceof FlowState::Load)
27+
}
28+
29+
predicate isBarrier(DataFlow::Node node) { node instanceof Sanitizer }
30+
31+
/**
32+
* Holds if taint with state `stateFrom` can flow from `pred` to `succ` with state `stateTo`.
33+
*
34+
* This is a taint step related to the result of `YAML.parse` calls, or similar.
35+
* In the following example, this step will propagate taint from
36+
* `source` to `sink`:
37+
* this contains two separate steps:
38+
* ```rb
39+
* x = source
40+
* sink = YAML.parse(x)
41+
* ```
42+
* By second step
43+
* source is a Successor of `YAML.parse(x)`
44+
* which ends with `to_ruby` or an Element of `to_ruby`
45+
* ```ruby
46+
* sink source.to_ruby # Unsafe call
47+
* ```
48+
*/
49+
predicate isAdditionalFlowStep(
50+
DataFlow::Node pred, FlowState stateFrom, DataFlow::Node succ, FlowState stateTo
51+
) {
52+
(
53+
exists(API::Node parseSuccessors, API::Node parseMethod |
54+
parseMethod = yamlLibrary().getMethod(["parse", "parse_stream", "parse_file"]) and
55+
parseSuccessors = yamlParseNode(parseMethod)
56+
|
57+
succ = parseSuccessors.getMethod("to_ruby").getReturn().asSource() and
58+
pred = parseMethod.getArgument(0).asSink()
59+
)
60+
or
61+
exists(API::Node parseMethod |
62+
parseMethod = yamlLibrary().getMethod(["parse", "parse_stream", "parse_file"])
63+
|
64+
succ = parseMethod.getReturn().asSource() and
65+
pred = parseMethod.getArgument(0).asSink()
66+
)
67+
) and
68+
stateFrom instanceof FlowState::Parse and
69+
stateTo instanceof FlowState::Parse
70+
}
71+
}
72+
73+
predicate isAdditionalFlowStepTest(DataFlow::Node pred, DataFlow::Node succ) {
74+
exists(API::Node parseMethod |
75+
parseMethod = yamlLibrary().getMethod(["parse", "parse_stream", "parse_file"])
76+
|
77+
succ = parseMethod.getReturn().asSource() and
78+
pred = parseMethod.getArgument(0).asSink()
79+
)
80+
}
81+
82+
/**
83+
* Taint-tracking for reasoning about unsafe deserialization.
84+
*/
85+
module UnsafeCodeConstructionFlow = TaintTracking::GlobalWithState<UnsafeYamlDeserializationConfig>;
Lines changed: 136 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
/**
2+
* Provides default sources, sinks and sanitizers for reasoning about unsafe
3+
* deserialization, as well as extension points for adding your own.
4+
*/
5+
6+
private import codeql.ruby.AST
7+
private import codeql.ruby.ApiGraphs
8+
private import codeql.ruby.CFG
9+
private import codeql.ruby.DataFlow
10+
private import codeql.ruby.dataflow.RemoteFlowSources
11+
private import codeql.ruby.frameworks.ActiveJob
12+
private import codeql.ruby.frameworks.core.Module
13+
private import codeql.ruby.frameworks.core.Kernel
14+
private import Yaml
15+
16+
module UnsafeYamlDeserialization {
17+
/** Flow states used to distinguish whether we are using a yaml parse node or a yaml load node. */
18+
module FlowState {
19+
private newtype TState =
20+
TParse() or
21+
TLoad()
22+
23+
/** A flow state used to distinguish whether we have a middle node that use `YAML.load*` or `YAML.parse*` */
24+
class State extends TState {
25+
/**
26+
* Gets a string representation of this state.
27+
*/
28+
string toString() { result = this.getStringRepresentation() }
29+
30+
/**
31+
* Gets a canonical string representation of this state.
32+
*/
33+
string getStringRepresentation() {
34+
this = TParse() and result = "parse"
35+
or
36+
this = TLoad() and result = "load"
37+
}
38+
}
39+
40+
/**
41+
* A flow state used for `YAML.parse*` methods.
42+
*/
43+
class Parse extends State, TParse { }
44+
45+
/**
46+
* A flow state used for `YAML.load*` methods.
47+
*/
48+
class Load extends State, TLoad { }
49+
}
50+
51+
/**
52+
* A data flow source for unsafe deserialization vulnerabilities.
53+
*/
54+
abstract class Source extends DataFlow::Node {
55+
/** Gets a string that describes the source. */
56+
string describe() { result = "user-provided value" }
57+
}
58+
59+
/**
60+
* A data flow sink for unsafe deserialization vulnerabilities.
61+
*/
62+
abstract class Sink extends DataFlow::Node { }
63+
64+
/**
65+
* A sanitizer for unsafe deserialization vulnerabilities.
66+
*/
67+
abstract class Sanitizer extends DataFlow::Node { }
68+
69+
/** A source of remote user input, considered as a flow source for unsafe deserialization. */
70+
class RemoteFlowSourceAsSource extends Source instanceof RemoteFlowSource { }
71+
72+
/** A read of data from `STDIN`/`ARGV`, considered as a flow source for unsafe deserialization. */
73+
class StdInSource extends UnsafeYamlDeserialization::Source {
74+
boolean stdin;
75+
76+
StdInSource() {
77+
this = API::getTopLevelMember(["STDIN", "ARGF"]).getAMethodCall(["gets", "read"]) and
78+
stdin = true
79+
or
80+
// > $stdin == STDIN
81+
// => true
82+
// but $stdin is special in that it is a global variable and not a constant. `API::getTopLevelMember` only gets constants.
83+
exists(DataFlow::Node dollarStdin |
84+
dollarStdin.asExpr().getExpr().(GlobalVariableReadAccess).getVariable().getName() = "$stdin" and
85+
this = dollarStdin.getALocalSource().getAMethodCall(["gets", "read"])
86+
) and
87+
stdin = true
88+
or
89+
// ARGV.
90+
this.asExpr().getExpr().(GlobalVariableReadAccess).getVariable().getName() = "ARGV" and
91+
stdin = false
92+
or
93+
this.(Kernel::KernelMethodCall).getMethodName() = ["gets", "readline", "readlines"] and
94+
stdin = true
95+
}
96+
97+
override string describe() {
98+
if stdin = true then result = "value from stdin" else result = "value from ARGV"
99+
}
100+
}
101+
102+
/**
103+
* An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered a sink
104+
* for unsafe deserialization. The `YAML` module is an alias of `Psych` in
105+
* recent versions of Ruby.
106+
*/
107+
class YamlLoadArgument extends Sink {
108+
YamlLoadArgument() {
109+
// Note: this is safe in psych/yaml >= 4.0.0.
110+
this = yamlLibrary().getAMethodCall("load").getArgument(0)
111+
or
112+
this =
113+
yamlLibrary()
114+
.getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"])
115+
.getArgument(0)
116+
or
117+
this = yamlLibrary().getAMethodCall(["unsafe_load", "load_stream"]).getKeywordArgument("yaml")
118+
or
119+
this = yamlLibrary().getAMethodCall("unsafe_load_file").getKeywordArgument("filename")
120+
}
121+
}
122+
123+
/**
124+
* An argument in a call to `YAML.parse*`, considered a sink for unsafe deserialization
125+
* if there is a call to `to_ruby` on the returned value of any Successor.
126+
*/
127+
class YamlParseArgument extends Sink {
128+
YamlParseArgument() {
129+
this =
130+
yamlParseNode(yamlLibrary().getMethod(["parse", "parse_stream", "parse_file"]))
131+
.getMethod(["to_ruby", "transform"])
132+
.getReturn()
133+
.asSource()
134+
}
135+
}
136+
}
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
/**
2+
* Provides modeling for the `YAML` and `Psych` libraries.
3+
*/
4+
5+
private import codeql.ruby.dataflow.FlowSteps
6+
private import codeql.ruby.DataFlow
7+
private import codeql.ruby.ApiGraphs
8+
9+
/**
10+
* Gets A Node ends with YAML parse, parse_stream, parse_file methods
11+
*/
12+
API::Node yamlParseNode(API::Node yamlParseInstance) {
13+
result = yamlParseInstance
14+
or
15+
result = yamlParseNode(yamlParseInstance).getReturn()
16+
or
17+
result = yamlParseNode(yamlParseInstance).getBlock()
18+
or
19+
result = yamlParseNode(yamlParseInstance).getAnElement()
20+
or
21+
result = yamlParseNode(yamlParseInstance).getParameter(_)
22+
or
23+
result = yamlParseNode(yamlParseInstance).getMethod(_)
24+
or
25+
result = yamlParseNode(yamlParseInstance).getMember(_)
26+
or
27+
result = yamlParseNode(yamlParseInstance).getArgument(_)
28+
}
29+
30+
/**
31+
* Gets A YAML module instance
32+
*/
33+
API::Node yamlLibrary() { result = API::getTopLevelMember(["YAML", "Psych"]) }
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
require 'yaml'
2+
3+
class UserController < ActionController::Base
4+
def yaml_example
5+
object = YAML.unsafe_load params[:yaml]
6+
object = YAML.load_stream params[:yaml]
7+
parsed_yaml = Psych.parse_stream(params[:yaml])
8+
9+
# to_ruby is unsafe
10+
parsed_yaml.children.each do |child|
11+
object = child.to_ruby
12+
end
13+
object = Psych.parse(params[:yaml]).to_ruby
14+
# ...
15+
end
16+
end
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
require 'yaml'
2+
3+
class UserController < ActionController::Base
4+
def safe_yaml_example
5+
object = YAML.load params[:yaml]
6+
object = Psych.load_file params[:yaml]
7+
object = YAML.safe_load params[:yaml]
8+
# ...
9+
end
10+
end

0 commit comments

Comments
 (0)