Commit 12ac553

Merge pull request #32 from IBM/add-precommit-llm-cleanup
Add pre-commit hooks for smartquote and llm cleanup
2 parents 9f86e46 + 507dc3b commit 12ac553

31 files changed: +328 -148 lines changed

.env.ce.example

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
# Region where your Code Engine project lives (e.g. us-south, eu-de, au-syd)
1111
IBMCLOUD_REGION=us-south
1212

13-
# Resource group that owns the project (often “default”)
13+
# Resource group that owns the project (often "default")
1414
IBMCLOUD_RESOURCE_GROUP=default
1515

1616
# Code Engine project name

.github/workflows/bandit.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ jobs:
2727
bandit:
2828
permissions:
2929
contents: read # required by actions/checkout
30-
security-events: write # upload SARIF to “Code scanning”
30+
security-events: write # upload SARIF to "Code scanning"
3131
actions: read # needed only for private repos
3232

3333
runs-on: ubuntu-latest

.github/workflows/dependency-review.yml

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@
77
# • **Fails** when a change introduces either of the following:
88
# ↳ A vulnerability of severity ≥ MODERATE
99
# ↳ A dependency under a "strong-copyleft" license incompatible
10-
# with this project’s Apache-2.0 license (see deny-list below)
10+
# with this project's Apache-2.0 license (see deny-list below)
1111
# • Uploads a SARIF report to "Security → Dependency review"
1212
# • Adds (or overwrites) a comment on the PR **only on failure**
1313
#
@@ -66,7 +66,7 @@ jobs:
6666
# ───────── License policy ─────────
6767
# Hard-deny strong- or service-copyleft licenses that would
6868
# "infect" an Apache-2.0 project. (LGPL/MPL/EPL are *not*
69-
# listed — they’re weak/file-level copyleft. Add them here
69+
# listed — they're weak/file-level copyleft. Add them here
7070
# if your org chooses to forbid them outright.)
7171
deny-licenses: >
7272
GPL-1.0, GPL-2.0, GPL-3.0,

.github/workflows/docker-image.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ on:
3434
permissions:
3535
contents: read
3636
packages: write # push to ghcr.io via GITHUB_TOKEN
37-
security-events: write # upload SARIF to “Code scanning”
37+
security-events: write # upload SARIF to "Code scanning"
3838
actions: read # needed by upload-sarif in private repos
3939
id-token: write # required for OIDC token generation
4040

.github/workflows/osv-scanner.yml.inactive

Lines changed: 2 additions & 2 deletions
@@ -7,15 +7,15 @@
77
# and fails only if the PR introduces *new* vulns.
88
# • **scan-scheduled** ─ Runs a full scan of the default branch
99
# on pushes & weekly cron to catch newly-published CVEs.
10-
# • Uploads SARIF results to “Security → Code scanning”.
10+
# • Uploads SARIF results to "Security → Code scanning".
1111
#
1212
# Action reference:
1313
# • Docs: https://google.github.io/osv-scanner/github-action/
1414
# • Repo: https://github.com/google/osv-scanner-action (Apache-2.0)
1515
#
1616
# Tips:
1717
# • Ignore a CVE by creating .osv-scanner.toml or using --ignore-vuln.
18-
# • Add “--skip-git” so the scan isn’t cluttered with .git metadata.
18+
# • Add "--skip-git" so the scan isn't cluttered with .git metadata.
1919
# ===============================================================
2020

2121
name: OSV-Scanner

.github/workflows/python-package.yml

Lines changed: 2 additions & 2 deletions
@@ -35,9 +35,9 @@ jobs:
3535
python -m pip install --upgrade pip
3636
python -m pip install build # PyPA-endorsed PEP 517 builder
3737
38-
# 4️⃣ Invoke the Makefile ‘dist’ target (creates ./dist/*.whl & *.tar.gz)
38+
# 4️⃣ Invoke the Makefile 'dist' target (creates ./dist/*.whl & *.tar.gz)
3939
- name: Build distributions
40-
run: make dist # Uses the Makefile’s `dist` rule
40+
run: make dist # Uses the Makefile's `dist` rule
4141

4242
# 5️⃣ Upload built artifacts so they can be downloaded from the run page
4343
- name: Upload distributions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -165,7 +165,7 @@ venv.bak/
165165
.dmypy.json
166166
dmypy.json
167167

168-
# others
168+
# others
169169
.pdm-build
170170
.vscode
171171

.pre-commit-config.yaml

Lines changed: 92 additions & 23 deletions
@@ -30,23 +30,92 @@ repos:
3030
- repo: https://github.com/pre-commit/pre-commit-hooks
3131
rev: v5.0.0
3232
hooks:
33-
# - id: detect-aws-credentials
34-
# name: 🔐 Detect AWS Credentials
35-
# description: Detects *your* aws credentials from the aws cli credentials file.
36-
# types: [text]
33+
# - id: detect-aws-credentials
34+
# name: 🔐 Detect AWS Credentials
35+
# description: Detects *your* aws credentials from the aws cli credentials file.
36+
# types: [text]
3737

3838
- id: detect-private-key
3939
name: 🔐 Detect Private Key
4040
description: Detects the presence of private keys.
4141
types: [text]
4242

43-
# - repo: https://github.com/Yelp/detect-secrets
44-
# rev: v1.5.0
45-
# hooks:
46-
# - id: detect-secrets
47-
# name: 🔐 Detect Secrets
48-
# description: Detects secrets within a repository.
49-
# args: ['--baseline', '.secrets.baseline']
43+
# - repo: https://github.com/Yelp/detect-secrets
44+
# rev: v1.5.0
45+
# hooks:
46+
# - id: detect-secrets
47+
# name: 🔐 Detect Secrets
48+
# description: Detects secrets within a repository.
49+
# args: ['--baseline', '.secrets.baseline']
50+
51+
# -----------------------------------------------------------------------------
52+
# ❌ Forbid Specific AI / LLM Patterns
53+
# -----------------------------------------------------------------------------
54+
# This local hook checks for patterns that should not be committed.
55+
# It aims to detect and prevent the inclusion of AI-generated content by
56+
# identifying common artifacts associated with large language models (LLMs).
57+
#
58+
# Patterns checked include:
59+
# - `:contentReference`
60+
# - `[oaicite:??<digits>]` (e.g., `[oaicite:??12345]`)
61+
# - Common AI-generated phrases (e.g., "As an AI language model")
62+
# - Placeholder citations (e.g., "(Author, 2023)")
63+
# - Repetitive or generic phrases often produced by LLMs
64+
# -----------------------------------------------------------------------------
65+
- repo: local
66+
hooks:
67+
- id: forbid-specific-patterns
68+
name: ❌ Forbid Specific AI / LLM Patterns
69+
entry: >
70+
bash -c '
71+
# Succeed immediately if no files are passed
72+
[ "$#" -eq 0 ] && exit 0
73+
74+
# Invert grep exit-code:
75+
! grep -rnE "(:contentReference|\[oaicite:\?\?\d*\]|As an AI language model|I am an AI developed by|This response was generated by|\(Author, [0-9]{4}\)|\(Source: [^)]+\)|In conclusion,|To summarize,|It is important to note that|Remember that|Keep in mind that)" \
76+
--exclude-dir=.git \
77+
--exclude-dir=node_modules \
78+
--exclude-dir=.venv \
79+
--exclude-dir=dist \
80+
--exclude-dir=build \
81+
--exclude-dir=__pycache__ \
82+
--exclude=.pre-commit-config.yaml \
83+
"$@"
84+
'
85+
language: system
86+
pass_filenames: true
87+
types: [text]
88+
description: Prevents committing LLM artefacts like :contentReference, [oaicite], and common AI-generated phrases.
89+
90+
# -----------------------------------------------------------------------------
91+
# 🔤 Unicode Text Normalization (via texthooks)
92+
# -----------------------------------------------------------------------------
93+
# A collection of hooks to clean up problematic Unicode characters:
94+
#
95+
# 📝 fix-smartquotes: Converts curly quotes (" " ' ') to standard ASCII quotes.
96+
# 🔡 fix-ligatures: Replaces typographic ligatures (fi, ff) with ASCII equivalents.
97+
# ␣ fix-spaces: Normalizes non-breaking and exotic spaces to regular spaces.
98+
# 🚫 forbid-bidi-controls: Prevents Unicode BiDi control characters used to
99+
# obscure code logic or directionality.
100+
#
101+
# These prevent copy-paste artifacts, invisible formatting errors, and
102+
# encoding bugs from creeping into the codebase.
103+
# -----------------------------------------------------------------------------
104+
- repo: https://github.com/sirosen/texthooks
105+
rev: 0.6.8
106+
hooks:
107+
- id: fix-smartquotes
108+
name: 📝 Normalize Smart Quotes
109+
description: Replaces smart/curly quotes with standard ASCII quotes.
110+
- id: fix-ligatures
111+
name: 🔡 Normalize Ligatures
112+
description: Replaces typographic ligatures with standard characters.
113+
- id: fix-spaces
114+
name: ␣ Normalize Unicode Spaces
115+
description: Replaces non-breaking or exotic space characters with regular spaces.
116+
- id: forbid-bidi-controls
117+
name: 🚫 Forbid BiDi Unicode Controls
118+
description: Prevents bidirectional control characters that can obscure code meaning.
50119

51120
# -----------------------------------------------------------------------------
52121
# 🧹 Formatting Hooks (MODIFIES FILES)
@@ -73,7 +142,7 @@ repos:
73142

74143
- id: fix-encoding-pragma
75144
name: 🧹 Fix Python Encoding Pragma
76-
description: 'Adds # -*- coding: utf-8 -*- to the top of python files.'
145+
description: "Adds # -*- coding: utf-8 -*- to the top of python files."
77146
types: [python]
78147

79148
- id: mixed-line-ending
@@ -91,12 +160,12 @@ repos:
91160
name: 🧹 File Contents Sorter
92161
description: Sorts the lines in specified files (defaults to alphabetical).
93162
language: python
94-
files: '^$'
163+
files: "^$"
95164

96165
- id: sort-simple-yaml
97166
name: 🧹 Sort Simple YAML Files
98167
description: Sorts simple YAML files which consist only of top-level keys.
99-
files: '^$'
168+
files: "^$"
100169

101170
# Optional: Uncomment to enable Prettier formatting
102171
# - repo: https://github.com/pre-commit/mirrors-prettier
@@ -158,7 +227,7 @@ repos:
158227
name: ✅ Forbid Submodules
159228
description: Forbids any submodules in the repository.
160229
language: fail
161-
entry: 'submodules are not allowed in this repository:'
230+
entry: "submodules are not allowed in this repository:"
162231
types: [directory]
163232

164233
- id: check-vcs-permalinks
@@ -199,14 +268,14 @@ repos:
199268
- id: yamllint
200269
name: ✅ YAMLlint - YAML Linter
201270
description: A linter for YAML files.
202-
args: [ -c, .yamllint ]
271+
args: [-c, .yamllint]
203272

204-
# - repo: https://github.com/igorshubovych/markdownlint-cli
205-
# rev: v0.45.0
206-
# hooks:
207-
# - id: markdownlint
208-
# name: ✅ Markdownlint - Markdown Linter
209-
# description: A tool to check markdown files and flag style issues.
273+
# - repo: https://github.com/igorshubovych/markdownlint-cli
274+
# rev: v0.45.0
275+
# hooks:
276+
# - id: markdownlint
277+
# name: ✅ Markdownlint - Markdown Linter
278+
# description: A tool to check markdown files and flag style issues.
210279

211280
# -----------------------------------------------------------------------------
212281
# 🐍 Python Code Quality Hooks (LINTING ONLY)
@@ -246,7 +315,7 @@ repos:
246315
description: Verifies test files in tests/ directories start with `test_`.
247316
language: python
248317
files: (^|/)tests/.+\.py$
249-
args: [--pytest-test-first] # `test_.*\.py`
318+
args: [--pytest-test-first] # `test_.*\.py`
250319

251320
# - repo: https://github.com/pycqa/flake8
252321
# rev: 7.2.0
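
The `forbid-specific-patterns` hook added above inverts grep's exit code, so any match fails the commit. Below is a minimal Python sketch of the same check, handy for trying the pattern against sample text before committing. The names `FORBIDDEN`, `find_forbidden` and `exit_code` are hypothetical and not part of the commit. One caveat: the hook's pattern uses `\d`, a Perl-style class that POSIX extended regular expressions do not define, so `grep -E` may not treat it as a digit matcher. The sketch writes `[0-9]` instead, which assumes digits were the intent.

```python
import re

# Same alternatives as the hook's grep -E pattern, with \d written as [0-9]
# (an assumption about the intended behaviour, see the note above).
FORBIDDEN = re.compile(
    r"(:contentReference"
    r"|\[oaicite:\?\?[0-9]*\]"
    r"|As an AI language model"
    r"|I am an AI developed by"
    r"|This response was generated by"
    r"|\(Author, [0-9]{4}\)"
    r"|\(Source: [^)]+\)"
    r"|In conclusion,"
    r"|To summarize,"
    r"|It is important to note that"
    r"|Remember that"
    r"|Keep in mind that)"
)


def find_forbidden(text):
    """Return (line_number, matched_text) for each line with a forbidden pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = FORBIDDEN.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits


def exit_code(text):
    """Mirror the hook's inverted grep: 1 (fail) on any match, 0 (pass) otherwise."""
    return 1 if find_forbidden(text) else 0
```

Because generic phrases such as "Remember that" are on the list, ordinary comments can trip the check. The `.pylintrc` rewording in this commit appears consistent with that, though the commit itself does not state the reason.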

.pylintrc

Lines changed: 2 additions & 2 deletions
@@ -619,8 +619,8 @@ signature-mutators=
619619

620620
[VARIABLES]
621621

622-
# List of additional names supposed to be defined in builtins. Remember that
623-
# you should avoid defining new builtins when possible.
622+
# List of additional names supposed to be defined in builtins.
623+
# You should avoid defining new builtins when possible.
624624
additional-builtins=
625625

626626
# Tells whether unused global variables should be treated as a violation.

README.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ MCP Gateway builds on the MCP spec by sitting **in front of** MCP Server or REST
1313

1414
* **Act as a true gateway**, centralizing tool, resource and prompt registries while preserving the official MCP 2025-03-26 protocol
1515
* **Federate** multiple MCP servers into one unified endpoint—auto-discover peers (mDNS or explicit), health-check them, and merge their capabilities
16-
* **Virtualize** non-MCP services as “virtual servers” so you can register any REST API or function endpoint and expose it under MCP semantics
16+
* **Virtualize** non-MCP services as "virtual servers" so you can register any REST API or function endpoint and expose it under MCP semantics
1717
* **Adapt** arbitrary REST/HTTP APIs into MCP tools with JSON-Schema input validation, retry/rate-limit policies and transparent JSON-RPC invocation
1818
* **Simplify** deployments with a full admin UI, rich transports, pre-built DX pipelines and production-grade observability
1919

@@ -960,7 +960,7 @@ make lint # Run lint tools
960960
## API Documentation
961961
962962
* **Swagger UI** → [http://localhost:4444/docs](http://localhost:4444/docs)
963-
* **ReDoc**    → [http://localhost:4444/redoc](http://localhost:4444/redoc)
963+
* **ReDoc** → [http://localhost:4444/redoc](http://localhost:4444/redoc)
964964
* **Admin Panel** → [http://localhost:4444/admin](http://localhost:4444/admin)
965965
966966
---
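
Most of the one-line changes in this commit look like the output of the texthooks hooks configured in `.pre-commit-config.yaml` (smart quotes, ligatures, exotic spaces, BiDi controls). The sketch below is a rough illustration of what that kind of normalization amounts to. The mapping table and the names `normalize_text` and `has_bidi_controls` are assumptions for illustration, not texthooks' actual implementation or character tables.

```python
# Illustrative mapping only; the real hooks may cover more characters.
REPLACEMENTS = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\ufb01": "fi",  # "fi" ligature
    "\ufb00": "ff",  # "ff" ligature
    "\u00a0": " ",   # no-break space
}

# Bidirectional control characters: embeddings, overrides and isolates.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def normalize_text(text):
    """Replace each mapped character with its ASCII equivalent."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


def has_bidi_controls(text):
    """Report whether the text contains any BiDi control character."""
    return any(ch in BIDI_CONTROLS for ch in text)
```

The first three operations rewrite text, while the BiDi check only detects, mirroring the split between the `fix-*` hooks and `forbid-bidi-controls`.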
