Commit fbae978
first shot at adding hook!
It seems to be required that a hook is a release, so I am developing this in a separate repo. That makes sense, because I would not want to change the default urlchecker client just for this, but we can maintain it modularly here!

Signed-off-by: vsoch <[email protected]>

File tree

6 files changed, +381 −0 lines

.pre-commit-hooks.yaml

Lines changed: 7 additions & 0 deletions
```yaml
- id: urlchecker-check
  name: urlchecker
  description: Look for broken URLs in your static files
  entry: urlchecker-check
  language: python
  language_version: python3
  files: '\.(rst|md|markdown|py|tex)$'
```
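As a quick sanity check, the `files:` expression only matches the static file extensions the hook should receive; a minimal sketch (the candidate filenames below are hypothetical):

```python
import re

# The files: pattern from the hook definition above.
FILES_RE = re.compile(r"\.(rst|md|markdown|py|tex)$")

candidates = ["README.md", "paper.tex", "notes.txt", "script.py", "index.rst"]
matched = [name for name in candidates if FILES_RE.search(name)]
print(matched)  # notes.txt is filtered out
```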

LICENSE

Lines changed: 20 additions & 0 deletions
Copyright (c) 2022 Vanessa Sochat and Ayoub Malek

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

README.md

Lines changed: 52 additions & 0 deletions
<div style="text-align:center"><img src="https://raw.githubusercontent.com/urlstechie/urlchecker-python/master/docs/urlstechie.png"/></div>

# urlchecker pre-commit

You can use urlchecker-python with [pre-commit](https://pre-commit.com/)!

## Setup

Add the following entry to your `.pre-commit-config.yaml` in the root of
your repository:

```yaml
repos:
  - repo: https://github.com/urlstechie/pre-commit
    rev: 0.0.0
    hooks:
      - id: urlchecker-check
        additional_dependencies: [urlchecker>=0.0.29]
```

You can add additional args (those you would add to the check command) to
further customize the run:

```yaml
repos:
  - repo: https://github.com/urlstechie/pre-commit
    rev: 0.0.0
    hooks:
      - id: urlchecker-check
        additional_dependencies: [urlchecker>=0.0.29]
        args: [--retry-count, "3", --timeout, "10"]
```
Note that the `--files` argument, which accepts patterns for urlchecker, is
instead named `--patterns` for this module. The reason is that pre-commit
already provides the verbatim list of filenames in the commit, and your
additional specification of `--patterns` serves primarily to further filter
that list.
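For example, the staged files could be narrowed to documentation with a `--patterns` argument; a sketch, where the `docs/.*` pattern is purely illustrative:

```yaml
repos:
  - repo: https://github.com/urlstechie/pre-commit
    rev: 0.0.0
    hooks:
      - id: urlchecker-check
        additional_dependencies: [urlchecker>=0.0.29]
        args: [--patterns, "docs/.*"]
```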
## Run
And then you can run:

```bash
$ pre-commit run
```

**under development**

## Support

If you need help, or want to suggest a project for the organization,
please [open an issue](https://github.com/urlstechie/pre-commit).

setup.cfg

Lines changed: 34 additions & 0 deletions
```ini
[metadata]
name = urlchecker_check
version = 0.0.0
description = Run urlchecker to check urls in your static files
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/urlstechie/pre-commit
author = Vanessa Sochat
author_email = [email protected]
license = MIT
license_file = LICENSE
classifiers =
    License :: OSI Approved :: MIT License
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3 :: Only
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
    Programming Language :: Python :: Implementation :: CPython
    Programming Language :: Python :: Implementation :: PyPy

[options]
py_modules = urlchecker_check
install_requires =
    urlchecker>=0.0.29
python_requires = >=3.7

[options.entry_points]
console_scripts =
    urlchecker-check=urlchecker_check:main

[bdist_wheel]
universal = True
```

setup.py

Lines changed: 5 additions & 0 deletions
```python
from __future__ import annotations

from setuptools import setup

setup()
```

urlchecker_check.py

Lines changed: 263 additions & 0 deletions
```python
from __future__ import annotations

# Copyright (c) 2022 Vanessa Sochat and Ayoub Malek
# This source code is licensed under the terms of the MIT license.
# For a copy, see <https://opensource.org/licenses/MIT>.

import argparse
import logging
import os
import re
from typing import Sequence

from urlchecker.core.check import UrlChecker
from urlchecker.core.fileproc import remove_empty
from urlchecker.logger import print_failure

logger = logging.getLogger("urlchecker")


def get_parser():
    # Flatten parser to just be the check command
    parser = argparse.ArgumentParser(description="urlchecker python pre-commit")
    parser.add_argument(
        "path",
        help="the local path or GitHub repository to clone and check",
    )

    parser.add_argument(
        "-b",
        "--branch",
        help="if cloning, specify a branch to use (defaults to main)",
        default="main",
    )

    parser.add_argument(
        "--subfolder",
        help="relative subfolder path within path (if not specified, we use root)",
    )

    parser.add_argument(
        "--cleanup",
        help="remove root folder after checking (defaults to False, no cleanup)",
        default=False,
        action="store_true",
    )

    parser.add_argument(
        "--force-pass",
        help="force successful pass (return code 0) regardless of result",
        default=False,
        action="store_true",
    )

    parser.add_argument(
        "--no-print",
        help="Skip printing results to the screen (defaults to printing to console).",
        default=False,
        action="store_true",
    )

    parser.add_argument(
        "--verbose",
        help="Print file names for failed urls in addition to the urls.",
        default=False,
        action="store_true",
    )

    parser.add_argument(
        "--file-types",
        dest="file_types",
        help="comma separated list of file extensions to check (defaults to .md,.py)",
        default=".md,.py",
    )

    # Here we separate out filenames (provided by pre-commit) and extra patterns
    # to filter over (--patterns), which is --files in the urlchecker
    parser.add_argument("filenames", nargs="*")
    parser.add_argument(
        "--patterns",
        dest="patterns",
        help="comma separated list of patterns to check.",
        default="",
    )

    parser.add_argument(
        "--exclude-urls",
        help="comma separated links to exclude (no spaces)",
        default="",
    )

    parser.add_argument(
        "--exclude-patterns",
        help="comma separated list of patterns to exclude (no spaces)",
        default="",
    )

    parser.add_argument(
        "--exclude-files",
        help="comma separated list of files and patterns to exclude (no spaces)",
        default="",
    )

    # Saving
    parser.add_argument(
        "--save",
        help="Path to a csv file to save results to.",
        default=None,
    )

    # Timeouts
    parser.add_argument(
        "--retry-count",
        help="retry count upon failure (defaults to 2, one retry).",
        type=int,
        default=2,
    )

    parser.add_argument(
        "--timeout",
        help="timeout (seconds) to provide to the requests library (defaults to 5)",
        type=int,
        default=5,
    )
    return parser


def check(args):
    """
    Main entrypoint for running a check. We expect an args object with
    arguments from the main client. From here we determine the path
    to parse and call the main check function under main/check.py

    Args:
      - args : the argparse namespace with parsed args
    """
    path = args.path

    # Case 1: specify present working directory
    if not path or path == ".":
        path = os.getcwd()

    # Case 2: git clone isn't supported for a pre-commit hook
    elif re.search("^(git@|http)", path):
        logger.error(
            "Repository url %s detected, not supported for pre-commit hook." % path
        )
        return 1

    # Add subfolder to path
    if args.subfolder:
        path = os.path.join(path, args.subfolder)

    # By the time we get here, a path must exist
    if not os.path.exists(path):
        logger.error("Error %s does not exist." % path)
        return 1

    logger.debug("Path to check is %s" % path)

    # Parse file types, and excluded urls and files (includes absolute and patterns)
    file_types = args.file_types.split(",")
    exclude_urls = remove_empty(args.exclude_urls.split(","))
    exclude_patterns = remove_empty(args.exclude_patterns.split(","))
    exclude_files = remove_empty(args.exclude_files.split(","))

    # Do we have any patterns to filter (regular expressions)?
    patterns = None
    if args.patterns:
        logger.debug("Found patterns of files to filter to.")
        # args.patterns is a comma separated string, so join the pieces
        # into a single alternation regex
        patterns = "(%s)" % "|".join(remove_empty(args.patterns.split(",")))

    # Process the files
    files = []
    for filename in args.filenames:
        if not filename or not os.path.exists(filename):
            logger.error("%s does not exist, skipping." % filename)
            continue
        if patterns and not re.search(patterns, filename):
            continue
        files.append(filename)

    # Alert user about settings
    print("          original path: %s" % args.path)
    print("             final path: %s" % path)
    print("              subfolder: %s" % args.subfolder)
    print("                 branch: %s" % args.branch)
    print("                cleanup: %s" % args.cleanup)
    print("             file types: %s" % file_types)
    print("                  files: %s" % files)
    print("              print all: %s" % (not args.no_print))
    print("                verbose: %s" % args.verbose)
    print("          urls excluded: %s" % exclude_urls)
    print("  url patterns excluded: %s" % exclude_patterns)
    print(" file patterns excluded: %s" % exclude_files)
    print("             force pass: %s" % args.force_pass)
    print("            retry count: %s" % args.retry_count)
    print("                   save: %s" % args.save)
    print("                timeout: %s" % args.timeout)

    # Instantiate a new checker with provided arguments
    checker = UrlChecker(
        path=path,
        file_types=file_types,
        include_patterns=files,
        exclude_files=exclude_files,
        print_all=not args.no_print,
    )
    check_results = checker.run(
        exclude_urls=exclude_urls,
        exclude_patterns=exclude_patterns,
        retry_count=args.retry_count,
        timeout=args.timeout,
    )

    # Save results to file, if save indicated
    if args.save:
        checker.save_results(args.save)

    # Case 1: We didn't find any urls to check
    if not check_results["failed"] and not check_results["passed"]:
        print("\n\n\U0001F937 No urls were collected.")
        return 0

    # Case 2: We had errors, print them for the user
    if check_results["failed"]:
        if args.verbose:
            print("\n\U0001F914 Uh oh... The following urls did not pass:")
            for file_name, result in checker.checks.items():
                if result["failed"]:
                    print_failure(file_name + ":")
                    for url in result["failed"]:
                        print_failure("  " + url)
        else:
            print("\n\U0001F914 Uh oh... The following urls did not pass:")
            for failed_url in check_results["failed"]:
                print_failure(failed_url)

    # If we have failures and it's not a force pass, exit with 1
    if not args.force_pass and check_results["failed"]:
        return 1

    # Finally, alert user if we are passing conditionally
    if check_results["failed"]:
        print("\n\U0001F928 Conditional pass, force pass is True.")
    else:
        print("\n\n\U0001F389 All URLs passed!")
    return 0


def main(argv: Sequence[str] | None = None) -> int:
    parser = get_parser()
    args = parser.parse_args(argv)

    # Return value is the exit code handed back to pre-commit
    return check(args)


if __name__ == "__main__":
    raise SystemExit(main())
```
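The filename filtering step in `check()` can be sketched in isolation; a minimal sketch with a hypothetical file list, where comma separated patterns are joined into one alternation regex just as the hook does:

```python
import re

def filter_filenames(filenames, patterns_csv=""):
    """Keep only filenames matching any of the comma separated patterns.

    Mirrors the hook's behavior: patterns are joined into a single
    alternation regex, and an empty pattern string keeps everything.
    """
    regex = None
    if patterns_csv:
        parts = [p for p in patterns_csv.split(",") if p]
        regex = "(%s)" % "|".join(parts)
    return [f for f in filenames if regex is None or re.search(regex, f)]

files = ["docs/index.md", "src/main.py", "README.md"]
print(filter_filenames(files, r"\.md$"))  # only the markdown files remain
print(filter_filenames(files))            # no patterns: everything is kept
```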
