Commit c1e2e63

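feat: add sensitive information filtering (#16)
* feat: add sensitive information filtering
  - Add a sanitizer module that implements the sensitive information filtering mechanism
  - Add configuration options for custom filter rules
  - Automatically filter sensitive information before generating commit messages and branch names
  - Add the CLI flag --no-sanitize to temporarily disable filtering
  - Add the document sanitizer.md describing usage
  Signed-off-by: jinlong <jinlong@tencent.com>
* chore: bump version to 0.3.0
  Update the version number in Cargo.toml, Cargo.lock, and the README documents from 0.2.3 to 0.3.0.
  Signed-off-by: jinlong <jinlong@tencent.com>
---------
Signed-off-by: jinlong <jinlong@tencent.com>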
1 parent f0a9332 commit c1e2e63

10 files changed: +323 -14 lines changed

Cargo.lock

Lines changed: 2 additions & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "fastcommit"
-version = "0.2.3"
+version = "0.3.0"
 description = "AI-based command line tool to quickly generate standardized commit messages."
 edition = "2021"
 authors = ["longjin <fslongjin@vip.qq.com>"]
@@ -17,6 +17,7 @@ env_logger = "0.11.6"
 lazy_static = "1.5.0"
 log = "0.4.26"
 openai_api_rust = { git = "https://github.com/fslongjin/openai-api", rev = "e2a3f6f" }
+regex = "1.11.0"
 reqwest = { version = "0.12.9", features = ["json"] }
 serde = { version = "1.0.218", features = ["derive"] }
 serde_json = "1.0.134"

README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ You can install `fastcommit` using the following method:
 
 ```bash
 # Install using cargo
-cargo install --git https://github.com/fslongjin/fastcommit --tag v0.2.3
+cargo install --git https://github.com/fslongjin/fastcommit --tag v0.3.0
 ```
 
 ## Usage

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 ```bash
 # 使用 cargo 安装
-cargo install --git https://github.com/fslongjin/fastcommit --tag v0.2.3
+cargo install --git https://github.com/fslongjin/fastcommit --tag v0.3.0
 ```
 
 ## 使用

docs/sanitizer.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+# Sanitizer Configuration Guide
+
+To prevent leaking sensitive information when sending diffs/user descriptions to model providers, fastcommit includes a built-in secret sanitization mechanism. This mechanism replaces matched sensitive content with placeholders before generating commit messages or branch names, for example:
+
+```
+AKIAIOSFODNN7EXAMPLE -> [REDACTED:AWS_ACCESS_KEY_ID#1]
+-----BEGIN PRIVATE KEY----- ... -> [REDACTED:PRIVATE_KEY_BLOCK#2]
+Bearer abcdef123456 .... -> [REDACTED:BEARER_TOKEN#3]
+```
+
+## 1. Basic Toggle
+
+Configuration file: `~/.fastcommit/config.toml`
+
+Field:
+```
+sanitize_secrets = true
+```
+Set to `false` to completely disable sanitization.
+
+## 2. Built-in Matching Rules
+Current built-in rules (name -> regex description):
+
+| Name | Description |
+|------|-------------|
+| PRIVATE_KEY_BLOCK | Matches private key blocks from `-----BEGIN ... PRIVATE KEY-----` to `-----END ... PRIVATE KEY-----` |
+| GITHUB_TOKEN | Matches tokens with prefixes like `ghp_` / `ghs_` / `gho_` / `ghr_` / `ghu_` + 36 alphanumeric characters |
+| AWS_ACCESS_KEY_ID | Starts with `AKIA` + 16 uppercase alphanumeric characters |
+| JWT | Typical 3-segment Base64URL JWT structure |
+| BEARER_TOKEN | Bearer token headers (`Bearer xxx`) |
+| GENERIC_API_KEY | Common field names: `api_key` / `apikey` / `apiKey` / `secret` / `token` / `authorization` followed by separator and value |
+
+Matched content will be replaced with `[REDACTED:<name>#sequence_number]`.
+
+## 3. Custom Rules
+You can add custom rules in the configuration file to capture team-specific sensitive string formats.
+
+Example:
+```
+[[custom_sanitize_patterns]]
+name = "INTERNAL_URL"
+regex = "https://internal\\.corp\\.example\\.com/[A-Za-z0-9/_-]+"
+
+[[custom_sanitize_patterns]]
+name = "UUID_TOKEN"
+regex = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
+```
+
+Notes:
+- `name`: Identifier in the placeholder; recommended to use uppercase underscore style.
+- `regex`: Rust regex (ECMAScript-like, but without backtracking support); please escape backslashes appropriately.
+- All custom rules are executed after built-in rules.
+- If a regex is invalid, it will be skipped and a warning will be output in the logs.
+
+## 4. Viewing Sanitization Statistics
+The current version outputs the following when running with `RUST_LOG=debug`:
+```
+Sanitized N potential secrets from diff/prompt
+```
+In the future, `--show-redactions` can be added to display more detailed tables (planned feature).
+
+## 5. Performance and Notes
+- There may be minor performance overhead for very large diffs (multiple find-replace passes). If performance is sensitive, reduce the number of custom rules.
+- Custom regex should not be overly broad, otherwise it may falsely match normal code context, affecting model understanding.
+- The model cannot see the original replaced content. If context hints are needed, design semantically expressive tags with `name`, for example: `DB_PASSWORD`/`INTERNAL_ENDPOINT`.
+
+## 6. Common Custom Pattern Examples
+```
+[[custom_sanitize_patterns]]
+name = "SLACK_WEBHOOK"
+regex = "https://hooks\\.slack\\.com/services/[A-Za-z0-9/_-]+"
+
+[[custom_sanitize_patterns]]
+name = "DISCORD_WEBHOOK"
+regex = "https://discord(?:app)?\\.com/api/webhooks/[0-9]+/[A-Za-z0-9_-]+"
+
+[[custom_sanitize_patterns]]
+name = "GCP_SERVICE_ACCOUNT"
+regex = "[0-9]{12}-compute@developer\\.gserviceaccount\\.com"
+
+[[custom_sanitize_patterns]]
+name = "STRIPE_KEY"
+regex = "sk_(live|test)_[A-Za-z0-9]{10,}"
+```
+
+## 7. Complete Example Configuration Snippet
+```
+sanitize_secrets = true
+
+[[custom_sanitize_patterns]]
+name = "INTERNAL_URL"
+regex = "https://internal\\.corp\\.example\\.com/[A-Za-z0-9/_-]+"
+
+[[custom_sanitize_patterns]]
+name = "STRIPE_KEY"
+regex = "sk_(live|test)_[A-Za-z0-9]{10,}"
+```
+
+## 8. Future Plans
+- Report mode: Output table statistics of match categories and counts
+- Allow listing redacted placeholder hints at the end of commit messages (configurable)
+
+For adding new default built-in rules or improvements, welcome to submit Issues / PRs.
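
The placeholder format described in section 2 of docs/sanitizer.md can be illustrated with a minimal, std-only Rust sketch. `src/sanitizer.rs` is not shown in this excerpt, and since the commit adds the `regex` crate the real implementation presumably uses compiled regexes. The function name `redact_aws_access_key_ids`, the hand-written scanner, and the numbering that restarts at 1 on every call are all assumptions made for illustration. Only the `AWS_ACCESS_KEY_ID` rule is covered.

```rust
/// Hypothetical helper (not from the commit): replaces every `AKIA` + 16
/// uppercase-alphanumeric run with `[REDACTED:AWS_ACCESS_KEY_ID#n]`.
/// Returns the sanitized text and the number of redactions made.
fn redact_aws_access_key_ids(input: &str) -> (String, usize) {
    const PREFIX: &str = "AKIA";
    const SUFFIX_LEN: usize = 16;

    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut count = 0;
    let mut i = 0;

    while i < bytes.len() {
        let end = i + PREFIX.len() + SUFFIX_LEN;
        // A candidate matches when the prefix is present and the next
        // 16 bytes are all ASCII uppercase letters or digits.
        let is_key = end <= bytes.len()
            && bytes[i..].starts_with(PREFIX.as_bytes())
            && bytes[i + PREFIX.len()..end]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit());

        if is_key {
            count += 1;
            out.push_str(&format!("[REDACTED:AWS_ACCESS_KEY_ID#{}]", count));
            i = end; // skip the 20 ASCII bytes that were redacted
        } else {
            // Copy one full character so multi-byte UTF-8 stays intact.
            let ch = input[i..].chars().next().expect("i is on a char boundary");
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    (out, count)
}
```

Under these assumptions, `redact_aws_access_key_ids("id=AKIAIOSFODNN7EXAMPLE")` returns `("id=[REDACTED:AWS_ACCESS_KEY_ID#1]", 1)`. The example in the guide numbers placeholders across different rules (`#1`, `#2`, `#3`), which suggests the real sanitizer shares one counter between rules; this sketch does not model that.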

src/cli.rs

Lines changed: 6 additions & 0 deletions
@@ -53,4 +53,10 @@ pub struct Args {
         help = "Generate commit message (use with -b to output both)"
     )]
     pub generate_message: bool,
+
+    #[clap(
+        long = "no-sanitize",
+        help = "Temporarily disable sensitive info sanitizer for this run"
+    )]
+    pub no_sanitize: bool,
 }

src/config.rs

Lines changed: 20 additions & 0 deletions
@@ -3,6 +3,18 @@ use std::{fmt::Display, fs};
 
 use crate::constants::{DEFAULT_MAX_TOKENS, DEFAULT_OPENAI_API_BASE, DEFAULT_OPENAI_MODEL};
 
+fn default_true() -> bool {
+    true
+}
+
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct CustomSanitizePattern {
+    /// A short name/identifier for the pattern. e.g. "INTERNAL_URL"
+    pub name: String,
+    /// The regex pattern string. It should be a valid Rust regex.
+    pub regex: String,
+}
+
 #[derive(Debug, Serialize, Deserialize)]
 pub struct Config {
     api_base: Option<String>,
@@ -16,6 +28,12 @@ pub struct Config {
     pub verbosity: Verbosity,
     /// Prefix for generated branch names (e.g. username in monorepo)
     pub branch_prefix: Option<String>,
+    /// Enable sanitizing sensitive information (API keys, tokens, secrets) before sending diff to AI provider.
+    #[serde(default = "default_true")]
+    pub sanitize_secrets: bool,
+    /// User defined extra regex patterns for sanitizer.
+    #[serde(default)]
+    pub custom_sanitize_patterns: Vec<CustomSanitizePattern>,
 }
 
 impl Config {
@@ -104,6 +122,8 @@ impl Default for Config {
             language: CommitLanguage::default(),
             verbosity: Verbosity::default(),
             branch_prefix: None,
+            sanitize_secrets: true,
+            custom_sanitize_patterns: Vec::new(),
         }
     }
 }

src/generate.rs

Lines changed: 26 additions & 10 deletions
@@ -7,21 +7,31 @@ use crate::config::{self, Config};
 
 use crate::constants::BRANCH_NAME_PROMPT;
 use crate::constants::{DEFAULT_MAX_TOKENS, DEFAULT_OPENAI_MODEL, DEFAULT_PROMPT_TEMPLATE};
+use crate::sanitizer::sanitize_with_config;
 use crate::template_engine::{render_template, TemplateContext};
 
 async fn generate_commit_message(
     diff: &str,
     config: &config::Config,
     user_description: Option<&str>,
 ) -> anyhow::Result<String> {
-    let auth = Auth::new(config.api_key.as_str());
+    // sanitize diff & user description first
+    let (sanitized_diff, sanitized_user_desc_opt, redactions) =
+        sanitize_with_config(diff, user_description, config);
+    if !redactions.is_empty() {
+        log::debug!(
+            "Sanitized {} potential secrets from diff/prompt",
+            redactions.len()
+        );
+    }
 
+    let auth = Auth::new(config.api_key.as_str());
     let openai = OpenAI::new(auth, &config.api_base());
 
-    // Add "commit message: " prefix to user description if provided
-    let prefixed_user_description = user_description.map(|desc| {
+    // Add "commit message: " prefix to user description if provided (after sanitization)
+    let prefixed_user_description = sanitized_user_desc_opt.map(|desc| {
         if desc.trim().is_empty() {
-            desc.to_string()
+            desc
         } else {
             format!("commit message: {}", desc)
         }
@@ -31,7 +41,7 @@ async fn generate_commit_message(
         config.conventional,
         config.language,
         config.verbosity,
-        diff,
+        &sanitized_diff,
         prefixed_user_description.as_deref(),
     );
 
@@ -72,7 +82,6 @@ async fn generate_commit_message(
         .as_ref()
         .ok_or(anyhow::anyhow!("No message in response"))?
         .content;
-    // Extract content between <aicommit> tags
     let commit_message = extract_aicommit_message(msg)?;
     Ok(commit_message)
 }
@@ -164,11 +173,19 @@ async fn generate_branch_name_with_ai(
     prefix: Option<&str>,
     config: &Config,
 ) -> anyhow::Result<String> {
-    let auth = Auth::new(config.api_key.as_str());
+    // sanitize diff only (branch name uses only diff)
+    let (sanitized_diff, _, redactions) = sanitize_with_config(diff, None, config);
+    if !redactions.is_empty() {
+        log::debug!(
+            "Sanitized {} potential secrets from diff before branch generation",
+            redactions.len()
+        );
+    }
 
+    let auth = Auth::new(config.api_key.as_str());
     let openai = OpenAI::new(auth, &config.api_base());
 
-    let prompt = BRANCH_NAME_PROMPT.replace("{{diff}}", diff);
+    let prompt = BRANCH_NAME_PROMPT.replace("{{diff}}", &sanitized_diff);
     let messages = vec![
         Message {
             role: Role::System,
@@ -191,7 +208,7 @@ async fn generate_branch_name_with_ai(
         top_p: None,
         n: None,
         stream: Some(false),
-        stop: None, // 移除 stop words 以避免思考过程中的干扰
+        stop: None,
         max_tokens: Some(DEFAULT_MAX_TOKENS as i32),
         presence_penalty: None,
         frequency_penalty: None,
@@ -215,7 +232,6 @@ async fn generate_branch_name_with_ai(
 
     let branch_name = extract_aicommit_message(&msg)?;
 
-    // Clean up the branch name
     let branch_name = if let Some(prefix) = prefix {
         format!("{}{}", prefix.trim(), branch_name.trim())
     } else {
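
The change from `desc.to_string()` to `desc` in `generate_commit_message` follows from the types. Both branches of the `if` must have the same type and the `else` branch is a `format!` result, so `sanitized_user_desc_opt` must be an owned `Option<String>` rather than the original `Option<&str>`. The sketch below isolates that closure as a standalone function; the name `prefix_user_description` is hypothetical, and the actual signature of `sanitize_with_config` is not visible in this excerpt.

```rust
/// Hypothetical extraction of the closure used in `generate_commit_message`:
/// blank descriptions pass through unchanged, all others get a
/// "commit message: " prefix.
fn prefix_user_description(sanitized: Option<String>) -> Option<String> {
    sanitized.map(|desc| {
        if desc.trim().is_empty() {
            // Already an owned String, so no `.to_string()` is needed.
            desc
        } else {
            format!("commit message: {}", desc)
        }
    })
}
```

Calling `.as_deref()` on the result yields the `Option<&str>` that the diff passes on as `prefixed_user_description.as_deref()`.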

src/main.rs

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,7 @@ mod cli;
 mod config;
 mod constants;
 mod generate;
+mod sanitizer;
 mod template_engine;
 mod update_checker;
 
@@ -24,6 +25,10 @@ async fn main() -> anyhow::Result<()> {
     if let Some(v) = args.verbosity {
         config.verbosity = v;
     }
+    if args.no_sanitize {
+        // CLI override to disable sanitizer
+        config.sanitize_secrets = false;
+    }
 
     run_update_checker().await;
 
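
Taken together, the `#[serde(default = "default_true")]` attribute in src/config.rs and the override in src/main.rs imply a simple precedence: a config file that omits `sanitize_secrets` behaves as if it were `true`, and `--no-sanitize` can only switch the feature off for the current run. A minimal sketch of that rule follows. The helper `effective_sanitize` and its `Option<bool>` parameter, which models a field that may be absent from the config file, are hypothetical and do not appear in the commit.

```rust
/// Hypothetical helper: resolves whether sanitization runs for this invocation.
fn effective_sanitize(config_value: Option<bool>, no_sanitize: bool) -> bool {
    // A missing `sanitize_secrets` field falls back to true,
    // mirroring `default_true` in src/config.rs.
    let from_config = config_value.unwrap_or(true);
    // `--no-sanitize` only ever disables; it cannot re-enable a config `false`.
    from_config && !no_sanitize
}
```

One consequence of this design is that there is no flag to force sanitization on when the config file sets `sanitize_secrets = false`.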
