Skip to content

Commit 626770a

Browse files
authored
Merge pull request github#8004 from ahmed-farid-dev/ZipSlip
Add query to detect ZipSlip
2 parents 4cfe045 + 3d14c5f commit 626770a

File tree

12 files changed

+341
-0
lines changed

12 files changed

+341
-0
lines changed
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>Extracting files from a malicious zip archive without validating that the destination file path
8+
is within the destination directory can cause files outside the destination directory to be
9+
overwritten, due to the possible presence of directory traversal elements (<code>..</code>) in
10+
archive paths.</p>
11+
12+
<p>Zip archives contain archive entries representing each file in the archive. These entries
13+
include a file path for the entry, but these file paths are not restricted and may contain
14+
unexpected special elements such as the directory traversal element (<code>..</code>). If these
15+
file paths are used to determine an output file to write the contents of the archive item to, then
16+
the file may be written to an unexpected location. This can result in sensitive information being
17+
revealed or deleted, or an attacker being able to influence behavior by modifying unexpected
18+
files.</p>
19+
20+
<p>For example, if a Zip archive contains a file entry <code>..\sneaky-file</code>, and the Zip archive
21+
is extracted to the directory <code>c:\output</code>, then naively combining the paths would result
22+
in an output file path of <code>c:\output\..\sneaky-file</code>, which would cause the file to be
23+
written to <code>c:\sneaky-file</code>.</p>
24+
25+
</overview>
26+
<recommendation>
27+
28+
<p>Ensure that output paths constructed from Zip archive entries are validated
29+
to prevent writing files to unexpected locations.</p>
30+
31+
<p>The recommended way of writing an output file from a Zip archive entry is to call <code>extract()</code> or <code>extractall()</code>.
32+
</p>
33+
34+
</recommendation>
35+
36+
<example>
37+
<p>
38+
In this example an archive is extracted without validating file paths.
39+
</p>
40+
41+
<sample src="zipslip_bad.py" />
42+
43+
<p>To fix this vulnerability, we need to call the function <code>extractall()</code>.
44+
</p>
45+
46+
<sample src="zipslip_good.py" />
47+
48+
</example>
49+
<references>
50+
<li>
51+
Snyk:
52+
<a href="https://snyk.io/research/zip-slip-vulnerability">Zip Slip Vulnerability</a>.
53+
</li>
54+
55+
</references>
56+
</qhelp>
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
/**
2+
* @name Arbitrary file write during archive extraction ("Zip Slip")
3+
* @description Extracting files from a malicious archive without validating that the
4+
* destination file path is within the destination directory can cause files outside
5+
* the destination directory to be overwritten.
6+
* @kind path-problem
7+
* @id py/zipslip
8+
* @problem.severity error
9+
* @security-severity 7.5
10+
* @precision high
11+
* @tags security
12+
* external/cwe/cwe-022
13+
*/
14+
15+
import python
16+
import experimental.semmle.python.security.ZipSlip
17+
import DataFlow::PathGraph
18+
19+
from ZipSlipConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
20+
where config.hasFlowPath(source, sink)
21+
select sink.getNode(), source, sink, "Extraction of zipfile from $@", source.getNode(),
22+
"a potentially untrusted source"
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
import zipfile
2+
import shutil
3+
4+
def unzip(filename):
5+
with tarfile.open(filename) as zipf:
6+
#BAD : This could write any file on the filesystem.
7+
for entry in zipf:
8+
shutil.copyfile(entry, "/tmp/unpack/")
9+
10+
def unzip4(filename):
11+
zf = zipfile.ZipFile(filename)
12+
filelist = zf.namelist()
13+
for x in filelist:
14+
with zf.open(x) as srcf:
15+
shutil.copyfileobj(srcf, dstfile)
16+
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
import zipfile
2+
3+
def unzip(filename, dir):
4+
zf = zipfile.ZipFile(filename)
5+
zf.extractall(dir)
6+
7+
8+
def unzip1(filename, dir):
9+
zf = zipfile.ZipFile(filename)
10+
zf.extract(dir)

python/ql/src/experimental/semmle/python/Concepts.qll

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,73 @@ private import semmle.python.dataflow.new.RemoteFlowSources
1414
private import semmle.python.dataflow.new.TaintTracking
1515
private import experimental.semmle.python.Frameworks
1616

17+
/** Provides classes for modeling copying file related APIs. */
18+
module CopyFile {
19+
/**
20+
* A data flow node for copying file.
21+
*
22+
* Extend this class to model new APIs. If you want to refine existing API models,
23+
* extend `CopyFile` instead.
24+
*/
25+
abstract class Range extends DataFlow::Node {
26+
/**
27+
* Gets the argument containing the path.
28+
*/
29+
abstract DataFlow::Node getAPathArgument();
30+
31+
/**
32+
* Gets fsrc argument.
33+
*/
34+
abstract DataFlow::Node getfsrcArgument();
35+
}
36+
}
37+
38+
/**
39+
* A data flow node for copying file.
40+
*
41+
* Extend this class to refine existing API models. If you want to model new APIs,
42+
* extend `CopyFile::Range` instead.
43+
*/
44+
class CopyFile extends DataFlow::Node {
45+
CopyFile::Range range;
46+
47+
CopyFile() { this = range }
48+
49+
DataFlow::Node getAPathArgument() { result = range.getAPathArgument() }
50+
51+
DataFlow::Node getfsrcArgument() { result = range.getfsrcArgument() }
52+
}
53+
54+
/** Provides classes for modeling log related APIs. */
55+
module LogOutput {
56+
/**
57+
* A data flow node for log output.
58+
*
59+
* Extend this class to model new APIs. If you want to refine existing API models,
60+
* extend `LogOutput` instead.
61+
*/
62+
abstract class Range extends DataFlow::Node {
63+
/**
64+
* Get the parameter value of the log output function.
65+
*/
66+
abstract DataFlow::Node getAnInput();
67+
}
68+
}
69+
70+
/**
71+
* A data flow node for log output.
72+
*
73+
* Extend this class to refine existing API models. If you want to model new APIs,
74+
* extend `LogOutput::Range` instead.
75+
*/
76+
class LogOutput extends DataFlow::Node {
77+
LogOutput::Range range;
78+
79+
LogOutput() { this = range }
80+
81+
DataFlow::Node getAnInput() { result = range.getAnInput() }
82+
}
83+
1784
/**
1885
* Since there is both XML module in normal and experimental Concepts,
1986
* we have to rename the experimental module as this.

python/ql/src/experimental/semmle/python/Frameworks.qll

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,3 +14,4 @@ private import experimental.semmle.python.libraries.PyJWT
1414
private import experimental.semmle.python.libraries.Python_JWT
1515
private import experimental.semmle.python.libraries.Authlib
1616
private import experimental.semmle.python.libraries.PythonJose
17+
private import experimental.semmle.python.frameworks.CopyFile
Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
private import python
2+
private import experimental.semmle.python.Concepts
3+
private import semmle.python.dataflow.new.DataFlow
4+
private import semmle.python.ApiGraphs
5+
6+
private module CopyFileImpl {
7+
/**
8+
* The `shutil` module provides methods to copy or move files.
9+
* See:
10+
* - https://docs.python.org/3/library/shutil.html#shutil.copyfile
11+
* - https://docs.python.org/3/library/shutil.html#shutil.copy
12+
* - https://docs.python.org/3/library/shutil.html#shutil.copy2
13+
* - https://docs.python.org/3/library/shutil.html#shutil.copytree
14+
* - https://docs.python.org/3/library/shutil.html#shutil.move
15+
*/
16+
private class CopyFiles extends DataFlow::CallCfgNode, CopyFile::Range {
17+
CopyFiles() {
18+
this =
19+
API::moduleImport("shutil")
20+
.getMember(["copyfile", "copy", "copy2", "copytree", "move"])
21+
.getACall()
22+
}
23+
24+
override DataFlow::Node getAPathArgument() {
25+
result in [this.getArg(0), this.getArgByName("src")]
26+
}
27+
28+
override DataFlow::Node getfsrcArgument() { none() }
29+
}
30+
31+
// TODO: once we have flow summaries, model `shutil.copyfileobj` which copies the content between its' file-like arguments.
32+
// See https://docs.python.org/3/library/shutil.html#shutil.copyfileobj
33+
private class CopyFileobj extends DataFlow::CallCfgNode, CopyFile::Range {
34+
CopyFileobj() { this = API::moduleImport("shutil").getMember("copyfileobj").getACall() }
35+
36+
override DataFlow::Node getfsrcArgument() {
37+
result in [this.getArg(0), this.getArgByName("fsrc")]
38+
}
39+
40+
override DataFlow::Node getAPathArgument() { none() }
41+
}
42+
}
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
import python
2+
import experimental.semmle.python.Concepts
3+
import semmle.python.dataflow.new.DataFlow
4+
import semmle.python.ApiGraphs
5+
import semmle.python.dataflow.new.TaintTracking
6+
7+
class ZipSlipConfig extends TaintTracking::Configuration {
8+
ZipSlipConfig() { this = "ZipSlipConfig" }
9+
10+
override predicate isSource(DataFlow::Node source) {
11+
(
12+
source =
13+
API::moduleImport("zipfile").getMember("ZipFile").getReturn().getMember("open").getACall() or
14+
source =
15+
API::moduleImport("zipfile")
16+
.getMember("ZipFile")
17+
.getReturn()
18+
.getMember("namelist")
19+
.getACall() or
20+
source = API::moduleImport("tarfile").getMember("open").getACall() or
21+
source = API::moduleImport("tarfile").getMember("TarFile").getACall() or
22+
source = API::moduleImport("bz2").getMember("open").getACall() or
23+
source = API::moduleImport("bz2").getMember("BZ2File").getACall() or
24+
source = API::moduleImport("gzip").getMember("GzipFile").getACall() or
25+
source = API::moduleImport("gzip").getMember("open").getACall() or
26+
source = API::moduleImport("lzma").getMember("open").getACall() or
27+
source = API::moduleImport("lzma").getMember("LZMAFile").getACall()
28+
) and
29+
not source.getScope().getLocation().getFile().inStdlib()
30+
}
31+
32+
override predicate isSink(DataFlow::Node sink) {
33+
(
34+
sink = any(CopyFile copyfile).getAPathArgument() or
35+
sink = any(CopyFile copyfile).getfsrcArgument()
36+
) and
37+
not sink.getScope().getLocation().getFile().inStdlib()
38+
}
39+
}
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
edges
2+
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:10:13:10:17 | SSA variable entry |
3+
| zipslip_bad.py:10:13:10:17 | SSA variable entry | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry |
4+
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:16:13:16:17 | SSA variable entry |
5+
| zipslip_bad.py:16:13:16:17 | SSA variable entry | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry |
6+
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:22:13:22:17 | SSA variable entry |
7+
| zipslip_bad.py:22:13:22:17 | SSA variable entry | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry |
8+
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:29:13:29:13 | SSA variable x |
9+
| zipslip_bad.py:29:13:29:13 | SSA variable x | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x |
10+
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:35:9:35:9 | SSA variable x |
11+
| zipslip_bad.py:35:9:35:9 | SSA variable x | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x |
12+
nodes
13+
| zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
14+
| zipslip_bad.py:10:13:10:17 | SSA variable entry | semmle.label | SSA variable entry |
15+
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
16+
| zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
17+
| zipslip_bad.py:16:13:16:17 | SSA variable entry | semmle.label | SSA variable entry |
18+
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
19+
| zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
20+
| zipslip_bad.py:22:13:22:17 | SSA variable entry | semmle.label | SSA variable entry |
21+
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | semmle.label | ControlFlowNode for entry |
22+
| zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
23+
| zipslip_bad.py:29:13:29:13 | SSA variable x | semmle.label | SSA variable x |
24+
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
25+
| zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
26+
| zipslip_bad.py:35:9:35:9 | SSA variable x | semmle.label | SSA variable x |
27+
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | semmle.label | ControlFlowNode for x |
28+
subpaths
29+
#select
30+
| zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | zipslip_bad.py:11:25:11:29 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:8:10:8:31 | ControlFlowNode for Attribute() | a potentially untrusted source |
31+
| zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | zipslip_bad.py:17:26:17:30 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:14:10:14:28 | ControlFlowNode for Attribute() | a potentially untrusted source |
32+
| zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | zipslip_bad.py:23:29:23:33 | ControlFlowNode for entry | Extraction of zipfile from $@ | zipslip_bad.py:20:10:20:27 | ControlFlowNode for Attribute() | a potentially untrusted source |
33+
| zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | zipslip_bad.py:30:25:30:25 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:27:10:27:22 | ControlFlowNode for Attribute() | a potentially untrusted source |
34+
| zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | zipslip_bad.py:37:32:37:32 | ControlFlowNode for x | Extraction of zipfile from $@ | zipslip_bad.py:34:16:34:28 | ControlFlowNode for Attribute() | a potentially untrusted source |
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
experimental/Security/CWE-022/ZipSlip.ql

0 commit comments

Comments
 (0)