Ignore a specific directory with HTML files to check #1909
Hi, my site contains generated content that I cannot really influence. It's different every day, and I don't want to check the links there because I can't even fix them.

My specific use case: I automatically read a feed of job postings from several job boards and display them in one section of my website. There are a ton of outbound links, many of them sit behind aggressive anti-scraping (and thus also anti-lychee) protections, the jobs expire throughout the day, etc.

I thought I could ignore those parts by [...]

My site is a static site, plain HTML, rendered in a [...]

My TOML looks like this, but I still get the output above:

exclude_path = [
    "/jobs/index\\.html$",
    "/jobs/[^/]+/index\\.html$",
]

I'd be grateful for any guidance.
Replies: 1 comment 10 replies
You're very close. Excluded paths need to be specified relative to the directory from which you called lychee:

exclude_path = [
    "public/jobs/index\\.html$",
    "public/jobs/[^/]+/index\\.html$",
]

That might be a little surprising. After all, you provided [...]

exclude_path = [
    ".*/jobs/index\\.html$",
    ".*/jobs/[^/]+/index\\.html$",
]

Thanks for using lychee.
Ah that explains the issue 👍
I can recommend that you try the dedicated GitHub action instead of setting up lychee manually, though your approach of course is also fine.
So in summary, updating to lychee 0.21.0 fixes the problem and makes exclude_path work as expected. Also, the regular expressions for exclude_path do not have to match the full path; they just have to produce a match. So the docs are accurate and up to date. It probably was different in the past, e.g. with 0.18.1.
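
To illustrate what "just have to produce a match" means, here is a minimal sketch in Python. It is only an illustration: lychee is written in Rust and uses its own regex engine, so Python's `re` module stands in for it here. The helper `is_excluded` and the sample paths are hypothetical. The patterns are the ones from the original question, written as the regex engine sees them once TOML has unescaped `\\.` to `\.`.

```python
import re

# Patterns from the question, after TOML unescaping ("\\." becomes \.).
PATTERNS = [
    r"/jobs/index\.html$",
    r"/jobs/[^/]+/index\.html$",
]


def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if any pattern matches anywhere in the path."""
    # re.search scans the whole string, so a pattern does not need to
    # describe the path from its very first character.
    return any(re.search(pattern, path) for pattern in PATTERNS)


# Hypothetical paths, relative to the directory the checker is run from.
print(is_excluded("public/jobs/index.html"))           # True
print(is_excluded("public/jobs/some-job/index.html"))  # True
print(is_excluded("public/blog/index.html"))           # False

# For contrast: re.match only matches at the start of the string, so the
# same pattern fails there unless it is prefixed with "public" or ".*".
print(re.match(PATTERNS[0], "public/jobs/index.html") is not None)  # False
```

Under search-style matching, the unprefixed patterns already cover the generated pages, which is consistent with the summary above. The `public/` and `.*` prefixes would only matter if the pattern had to match from the beginning of the path.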