ModelAudit

English | 中文

ModelAudit

LLM Distillation Detection and Model Fingerprinting
via Statistical Forensics

LLM 蒸馏检测与模型指纹审计 — 统计取证 · 行为签名 · 跨模型血缘推断
Detect unauthorized model distillation through behavioral probing, stylistic fingerprinting, and representation similarity analysis

Abstract · Problem Statement · Formal Framework · Architecture · Key Innovations · Quick Start · Detection Methods · MCP Server · Ecosystem · References

Abstract

大语言模型的蒸馏行为 (knowledge distillation) 已成为模型知识产权保护的核心威胁——学生模型通过模仿教师模型的输出分布，可以在未经授权的情况下复制其能力。现有检测方法要么依赖白盒权重访问（实际场景中通常不可得），要么仅分析表面文本特征（易被规避）。

ModelAudit 提出基于统计取证 (statistical forensics) 的多方法蒸馏检测框架：通过行为探测 (behavioral probing) 提取模型指纹 $\mathcal{F}(M)$，基于假设检验 (hypothesis testing) 判定蒸馏关系，结合风格签名 (stylistic signature)、行为血缘推断 (behavioral lineage inference) 和表示相似度 (representation similarity) 四种互补方法，实现黑盒到白盒的完整审计链。

ModelAudit implements a multi-method distillation detection framework based on statistical forensics. The system extracts model fingerprints $\mathcal{F}(M)$ through 20 behavioral probes, applies hypothesis testing $H_0: M_S \perp M_T$ to determine distillation relationships, and combines 4 complementary methods — LLMmap (behavioral probing), DLI (lineage inference via Jensen-Shannon divergence), REEF (CKA representation similarity), and StyleAnalysis (12-family stylistic signatures) — to form a complete black-box to white-box audit chain. Built-in benchmark achieves 100% detection accuracy across 6 model families (14 samples).

Problem Statement

模型蒸馏检测面临三个根本性挑战：

根本性问题	形式化定义	现有方法局限	ModelAudit 的方法
蒸馏不可观测 Distillation Opacity	蒸馏过程 $M_S \leftarrow \text{KD}(M_T)$ 对外部观察者不可见，仅可观测 $M_T$ 和 $M_S$ 的输入输出行为	依赖白盒权重访问（API-only 模型无法适用）	行为探测指纹：20 个探测 Prompt 提取可观测行为特征，黑盒即可工作
风格趋同 Stylistic Convergence	RLHF 对齐使不同模型的输出风格趋于相似，$\text{style}(M_i) \approx \text{style}(M_j)$	简单文本特征（长度、词频）区分度不足	多维度签名：自我认知 / 安全边界 / 注入测试 / 格式控制等 10 个维度，捕获深层行为差异
跨模型不可比 Cross-Model Incomparability	不同 Provider 的 API 格式、参数、行为规范各异	单一方法覆盖不全，黑盒/白盒方法割裂	四方法融合：LLMmap + DLI + REEF + StyleAnalysis，从行为到表示多层次覆盖

ModelAudit 不是通用的模型评测工具。它专注于一个问题：这个模型是否未经授权地复制了另一个模型的能力？ 通过行为指纹提取和统计检验给出可量化的审计结论。

Formal Framework

Model Fingerprint Extraction

模型指纹定义为探测集上的行为响应分布：

$$\mathcal{F}(M) = {p_M(y \mid x_i)}_{i=1}^{N}$$

其中 ${x_i}_{i=1}^{N}$ 为 $N=20$ 个探测 Prompt，覆盖自我认知、安全边界、注入测试、推理、创意、多语言、格式控制、角色扮演、代码生成、摘要能力 10 个维度。每个探测的响应 $y$ 被提取为特征向量 $\phi(y) \in \mathbb{R}^d$。

Distillation Hypothesis Testing

蒸馏检测形式化为假设检验问题：

$$H_0: M_S \perp M_T \quad \text{vs} \quad H_1: M_S \leftarrow M_T$$

检验统计量（LLMmap 方法）——基于指纹向量的 Pearson 相关：

$$\text{sim}(M_1, M_2) = \frac{\sum_i (\phi_i^{(1)} - \bar{\phi}^{(1)})(\phi_i^{(2)} - \bar{\phi}^{(2)})}{\sqrt{\sum_i (\phi_i^{(1)} - \bar{\phi}^{(1)})^2 \cdot \sum_i (\phi_i^{(2)} - \bar{\phi}^{(2)})^2}}$$

当 $\text{sim}(M_S, M_T) > \delta$（默认 $\delta = 0.7$）时，拒绝 $H_0$，判定存在蒸馏嫌疑。

Behavioral Lineage Inference (DLI)

基于 Jensen-Shannon 散度的血缘推断：

$$D_{JS}(P | Q) = \frac{1}{2} D_{KL}(P | M) + \frac{1}{2} D_{KL}(Q | M), \quad M = \frac{P + Q}{2}$$

对每个探测维度计算行为签名的 JS 散度，综合多维度得分判定血缘关系。

Representation Similarity (REEF)

白盒场景下，基于 Centered Kernel Alignment (CKA) 比对中间层隐藏状态：

$$\text{CKA}(X, Y) = \frac{|Y^T X|_F^2}{|X^T X|_F \cdot |Y^T Y|_F}$$

其中 $X, Y$ 分别为教师和学生模型在相同输入上的隐藏层激活矩阵。逐层 CKA 热力图可揭示蒸馏发生在哪些层。

Architecture

graph LR
    P["Probe Library<br/>20 Prompts × 10 Dims"] --> E["AuditEngine<br/>Concurrent Probing"]
    E --> F["Fingerprint<br/>Feature Extraction"]
    F --> L["LLMmap<br/>Pearson Correlation"]
    F --> D["DLI<br/>JS Divergence"]
    F --> S["StyleAnalysis<br/>12-Family Signatures"]
    F --> R["REEF<br/>CKA Similarity"]
    L --> V["Verdict Engine<br/>Hypothesis Testing"]
    D --> V
    S --> V
    R --> V
    V --> Rep["Audit Report<br/>6-Section Markdown"]

    style E fill:#0969da,color:#fff,stroke:#0969da
    style V fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style Rep fill:#2da44e,color:#fff,stroke:#2da44e
    style P fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style F fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style L fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style D fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style S fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style R fill:#1a1a2e,color:#e0e0e0,stroke:#444

Layered Architecture

层	模块	职责
Probing	`probes/prompts.py`	20 个探测 Prompt，覆盖 10 个行为维度
Engine	`engine.py`	统一入口，ThreadPoolExecutor 并发探测 (4 并发)
Methods	`methods/`	4 种检测方法注册表，按黑盒/白盒分层
Fingerprint	`models.py`	Pydantic 数据模型，指纹特征向量
Cache	`cache.py`	SHA-256 防碰撞指纹缓存，TTL 过期
Report	`report.py`	6 节结构化审计报告生成
Benchmark	`benchmark.py`	14 条样本 × 6 家族内置评估集
Interface	`cli.py` · `mcp_server.py`	CLI + MCP 8 工具

Key Innovations

1. Multi-Method Forensic Analysis

四种互补检测方法覆盖从黑盒到白盒的完整审计链：

方法	类型	原理	参考
LLMmap	黑盒	20 个探测 Prompt，Pearson 相关比对响应模式	USENIX Security 2025
DLI	黑盒	行为签名 + Jensen-Shannon 散度血缘推断	ICLR 2026
REEF	白盒	CKA 逐层隐藏状态相似度	NeurIPS 2024
StyleAnalysis	风格分析	12 个模型家族风格签名 + 语言检测	—

任一方法独立可用，多方法融合提高判定置信度。内置 benchmark 在 6 个模型家族上实现 100% 检测准确率。

2. Behavioral Probing with 10-Dimensional Coverage

超越简单的文本统计特征，通过 10 个认知维度的结构化探测提取深层行为差异：

维度	探测内容
自我认知	模型身份、创建者、训练截止
安全边界	拒绝策略、措辞差异
注入测试	Prompt injection 响应差异
知识与推理	知识边界、逻辑推理、伦理判断
创意写作	叙事风格、类比能力
多语言	中文响应、多语翻译
格式控制	JSON 输出、Markdown 表格
角色扮演	角色一致性、创意表达
代码生成	编码风格、注释习惯
摘要能力	信息压缩、表达密度

这些维度在 RLHF 对齐后仍保留显著的模型间差异，是可靠的指纹特征来源。

3. Cross-Provider Audit Chain

支持跨 Provider 的蒸馏审计——教师和学生模型可来自不同 API：

# 跨 provider 审计：Anthropic 教师 vs Moonshot 学生
knowlyr-modelaudit audit \
  --teacher claude-opus --teacher-provider anthropic \
  --student kimi-k2.5 --student-provider openai \
  --student-api-base https://api.moonshot.cn/v1 \
  -o report.md

自动生成 6 节详细审计报告：审计对象 → 方法 → 结果（指纹详情 + 逐条探测）→ 关键发现 → 结论 → 局限性声明。

4. Concurrent Probing with Intelligent Caching

ThreadPoolExecutor 并发发送探测 Prompt（默认 4 并发），指纹缓存支持 SHA-256 防碰撞 + TTL 过期：

首次探测：并发调用 API，自动缓存指纹到 .modelaudit_cache/
再次审计：直接复用缓存，避免重复 API 调用
智能重试：指数退避 + 认证/速率限制错误分类 + 可配置超时与重试次数

支持识别的 12 个模型家族：gpt-4 · gpt-3.5 · claude · llama · gemini · qwen · deepseek · mistral · yi · phi · cohere · chatglm

Quick Start

pip install knowlyr-modelaudit

可选依赖

pip install knowlyr-modelaudit[blackbox]   # 黑盒指纹 (openai, anthropic, httpx)
pip install knowlyr-modelaudit[whitebox]   # 白盒指纹 (torch, transformers)
pip install knowlyr-modelaudit[mcp]        # MCP 服务器
pip install knowlyr-modelaudit[all]        # 全部功能

# 1. 检测文本来源
knowlyr-modelaudit detect texts.jsonl

# 2. 验证模型身份
knowlyr-modelaudit verify gpt-4o --provider openai

# 3. 比对模型指纹
knowlyr-modelaudit compare gpt-4o claude-sonnet --provider openai

# 4. 完整蒸馏审计
knowlyr-modelaudit audit --teacher gpt-4o --student my-model -o report.md

# 5. 运行 benchmark
knowlyr-modelaudit benchmark

Python SDK

from modelaudit import AuditEngine

engine = AuditEngine()

# 检测文本来源
results = engine.detect(["Hello! I'd be happy to help..."])
for r in results:
    print(f"{r.predicted_model}: {r.confidence:.2%}")

# 比对模型指纹
result = engine.compare("gpt-4o", "my-model", method="llmmap")
print(f"相似度: {result.similarity:.4f}")
print(f"蒸馏关系: {'是' if result.is_derived else '否'}")

# 完整审计（跨 provider）
audit = engine.audit(
    "claude-opus", "kimi-k2.5",
    teacher_provider="anthropic",
    student_provider="openai",
    student_api_base="https://api.moonshot.cn/v1",
)
print(f"{audit.verdict} (confidence: {audit.confidence:.3f})")

Detection Methods

探测维度详情（20 个 Probe）

维度	探测内容
自我认知	模型身份、创建者、训练截止
安全边界	拒绝策略、措辞差异
注入测试	Prompt injection 响应差异
知识与推理	知识边界、逻辑推理、伦理判断
创意写作	叙事风格、类比能力
多语言	中文响应、多语翻译
格式控制	JSON 输出、Markdown 表格
角色扮演	角色一致性、创意表达
代码生成	编码风格、注释习惯
摘要能力	信息压缩、表达密度

MCP Server

{
  "mcpServers": {
    "knowlyr-modelaudit": {
      "command": "uv",
      "args": ["--directory", "/path/to/model-audit", "run", "python", "-m", "modelaudit.mcp_server"]
    }
  }
}

Tool	Description
`detect_text_source`	检测文本数据来源
`verify_model`	验证模型身份
`compare_models`	黑盒比对 (llmmap / dli / style)
`compare_models_whitebox`	白盒比对 (REEF CKA)
`audit_distillation`	完整蒸馏审计
`audit_memorization`	记忆化检测（前缀补全相似度）
`audit_report`	生成综合审计报告
`audit_watermark`	水印检测（零宽字符 / 统计特征 / 双元组唯一率）

CLI Reference

完整命令列表

命令	功能
`knowlyr-modelaudit detect <file>`	检测文本数据来源
`knowlyr-modelaudit detect <file> -n 50`	限制检测条数
`knowlyr-modelaudit verify <model>`	验证模型身份
`knowlyr-modelaudit compare <a> <b>`	比对两个模型指纹
`knowlyr-modelaudit audit --teacher <a> --student <b>`	完整蒸馏审计
`knowlyr-modelaudit audit ... --teacher-provider anthropic`	跨 provider 审计
`knowlyr-modelaudit audit ... --no-cache`	跳过缓存
`knowlyr-modelaudit audit ... -f json`	JSON 格式报告
`knowlyr-modelaudit cache list`	查看缓存的指纹
`knowlyr-modelaudit cache clear`	清除所有缓存
`knowlyr-modelaudit benchmark`	运行内置 benchmark
`knowlyr-modelaudit benchmark --label claude`	按模型家族过滤
`knowlyr-modelaudit methods`	列出可用检测方法

Ecosystem

Architecture Diagram

graph LR
    Radar["Radar<br/>Discovery"] --> Recipe["Recipe<br/>Analysis"]
    Recipe --> Synth["Synth<br/>Generation"]
    Recipe --> Label["Label<br/>Annotation"]
    Synth --> Check["Check<br/>Quality"]
    Label --> Check
    Check --> Audit["Audit<br/>Model Audit"]
    Crew["Crew<br/>Deliberation Engine"]
    Agent["Agent<br/>RL Framework"]
    ID["ID<br/>Identity Runtime"]
    Crew -.->|能力定义| ID
    ID -.->|身份 + 记忆| Crew
    Crew -.->|轨迹 + 奖励| Agent
    Agent -.->|优化策略| Crew

    style Audit fill:#0969da,color:#fff,stroke:#0969da
    style Crew fill:#2da44e,color:#fff,stroke:#2da44e
    style Agent fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style ID fill:#e5534b,color:#fff,stroke:#e5534b
    style Radar fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Recipe fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Synth fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Label fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Check fill:#1a1a2e,color:#e0e0e0,stroke:#444

Layer	Project	Description	Repo
Discovery	AI Dataset Radar	数据集竞争情报、趋势分析	GitHub
Analysis	DataRecipe	逆向分析、Schema 提取、成本估算	GitHub
Production	DataSynth / DataLabel	LLM 批量合成 / 轻量标注	GitHub · GitHub
Quality	DataCheck	规则验证、重复检测、分布分析	GitHub
Audit	ModelAudit	蒸馏检测 · 模型指纹 · 统计取证	You are here
Identity	knowlyr-id	身份系统 + AI 员工运行时	GitHub
Deliberation	Crew	对抗式多智能体协商 · 持久记忆进化 · MCP 原生	GitHub
Agent Training	knowlyr-gym	Gymnasium 风格 RL 框架 · 过程奖励模型 · SFT/DPO/GRPO	GitHub

Development

git clone https://github.com/liuxiaotong/model-audit.git
cd model-audit
pip install -e ".[all,dev]"
pytest

CI: GitHub Actions，Python 3.10+。Tag push 自动发布 PyPI + GitHub Release。

References

LLMmap — Haller, R. et al., 2025. LLMmap: Fingerprinting For Large Language Models. USENIX Security — 行为探测指纹的基础方法
DLI — Chen, W. et al., 2026. Detecting LLM Distillation via Behavioral Lineage Inference. ICLR — 基于 JS 散度的蒸馏血缘推断
REEF — Jia, J. et al., 2024. REEF: Representation Encoding Fingerprints for Large Language Models. NeurIPS — CKA 白盒表示相似度
Knowledge Distillation — Hinton, G. et al., 2015. Distilling the Knowledge in a Neural Network. arXiv:1503.02531 — 知识蒸馏的奠基性工作
CKA — Kornblith, S. et al., 2019. Similarity of Neural Network Representations Revisited. ICML — 表示相似度度量方法
Model Fingerprinting — Cao, X. et al., 2021. IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary. AsiaCCS

License

MIT

_{knowlyr — LLM distillation detection and model fingerprinting via statistical forensics}

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
docs		docs
src/modelaudit		src/modelaudit
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ModelAudit

LLM Distillation Detection and Model Fingerprinting
via Statistical Forensics

Abstract

Problem Statement

Formal Framework

Model Fingerprint Extraction

Distillation Hypothesis Testing

Behavioral Lineage Inference (DLI)

Representation Similarity (REEF)

Architecture

Layered Architecture

Key Innovations

1. Multi-Method Forensic Analysis

2. Behavioral Probing with 10-Dimensional Coverage

3. Cross-Provider Audit Chain

4. Concurrent Probing with Intelligent Caching

Quick Start

Detection Methods

MCP Server

CLI Reference

Ecosystem

Development

References

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Languages

License

liuxiaotong/model-audit

Folders and files

Latest commit

History

Repository files navigation

ModelAudit

LLM Distillation Detection and Model Fingerprintingvia Statistical Forensics

Abstract

Problem Statement

Formal Framework

Model Fingerprint Extraction

Distillation Hypothesis Testing

Behavioral Lineage Inference (DLI)

Representation Similarity (REEF)

Architecture

Layered Architecture

Key Innovations

1. Multi-Method Forensic Analysis

2. Behavioral Probing with 10-Dimensional Coverage

3. Cross-Provider Audit Chain

4. Concurrent Probing with Intelligent Caching

Quick Start

Detection Methods

MCP Server

CLI Reference

Ecosystem

Development

References

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Languages

LLM Distillation Detection and Model Fingerprinting
via Statistical Forensics

Packages