diff --git a/astro.config.mjs b/astro.config.mjs
index 544ca53..6e5c7e5 100644
--- a/astro.config.mjs
+++ b/astro.config.mjs
@@ -65,6 +65,7 @@ export default defineConfig({
     "recipes/caching",
     "recipes/excluding-links",
     "recipes/excluding-paths",
+    "recipes/local-folder",
     "recipes/migration",
     "recipes/base-url",
     "recipes/root-dir",
diff --git a/src/content/docs/recipes/local-folder.mdx b/src/content/docs/recipes/local-folder.mdx
new file mode 100644
index 0000000..cdbdf64
--- /dev/null
+++ b/src/content/docs/recipes/local-folder.mdx
@@ -0,0 +1,185 @@
+---
+title: Checking a Local Folder with URL Remapping
+description: Checking a local folder of HTML files which will be uploaded to a particular URL.
+---
+{/* vim: set syntax=markdown: */}
+
+import { Code } from "@astrojs/starlight/components";
+
+Often, you will want to check a local folder of HTML files before the folder
+gets uploaded to a website (as part of a static site workflow, for example).
+Sometimes, this can be complicated {/* verb */} if your local files use fully-qualified
+URLs which point to _future_ online locations of the local files.
+
+For instance, suppose you write a new blog post which will be uploaded to
+`https://example.com/docs/2025-01-01-post.html`. You might use that URL in
+certain places (like permalinks and canonical links), even though the URL
+doesn't exist _yet_.
+
+<Code
+  code={`<link rel="canonical" href="https://example.com/docs/2025-01-01-post.html" />
+<h1>My blog post</h1>
+<a href="https://example.com/docs/2025-01-01-post.html">Permalink</a>`}
+  lang="html"
+  title="docs/2025-01-01-post.html"
+/>
+
+This can cause problems for link checking, because lychee would check these
+links against the currently-online version of the site, which could be outdated or
+missing newly-added files. To solve this problem, we can tell lychee that
+certain online URLs should be *mapped* to local folder paths.
+
+This works by mapping the content's future URLs to local files on your
+computer, using lychee's URL remapping feature.
+For links to these URLs, lychee
+will check that the corresponding files exist inside the local directory,
+rather than checking the online website.
+
+:::tip[Please give feedback!]
+This page covers a fairly complicated topic, so feedback is appreciated! If
+something is unclear or not working as you expect, please let us know. You
+can open an issue or discussion for [this docs
+website](https://github.com/lycheeverse/lycheeverse.github.io) or [lychee
+itself](https://github.com/lycheeverse/lychee).
+:::
+
+:::note[Limitations]
+This guide uses lychee's URL remapping feature. This is based on regular
+expressions and has certain limitations; see [Limitations](#limitations).
+:::
+
+## Do You Need URL Remapping?
+
+In simple cases, you don't!
+
+By default, lychee can already resolve relative links to adjacent local files.
+By adding [`--root-dir`][root-dir], lychee can also resolve root-relative links
+(beginning with `/`) to the given root directory. In simple cases, this is all
+you need.
+
+Continue reading if:
+- you have fully-qualified links to files which exist locally but aren't online yet, or
+- your local folder will be uploaded to a _subdirectory_ of the website domain.
+
+[root-dir]: /recipes/root-dir/
+
+## Mapping Remote Domain to a Local Folder
+
+Suppose you have a local directory `out` which will be uploaded to
+the domain *root* at `https://docs.example.com`.
+
+You can map URLs beginning with this domain into the local directory:
+```bash
+lychee ./out --root-dir ./out --remap "^https://docs\.example\.com file://$(pwd)/out"
+```
+This will remap URLs so `https://docs.example.com/page.html` becomes
+`./out/page.html`, for example.
+
+
+## Mapping a Remote Subfolder to a Local Folder
+
+If, instead, your local folder will be uploaded to a _subdirectory_ of the website
+(rather than the domain root), you will need some more setup.
+
+Suppose that the local directory `out` will be uploaded to a subfolder at
+`https://example.com/docs/`.
+
+:::tip
+Try the "Simple Case" first, even if you're not sure which case to use. If all
+links check successfully, then it's all good! Otherwise, if you see "not found"
+errors or "root dir" errors, move on to [More Complex
+Cases](#more-complex-cases).
+:::
+
+### Simple Case ("Portable" Websites)
+
+If your website files are _portable_, then you can use a simple setup
+akin to the whole-domain mapping case above.
+
+Portable means that the local folder could be uploaded to any path on any
+domain and all its pages would work correctly. This is common for HTML files
+generated by a documentation generator such as Doxygen or Javadoc.
+
+As a guide, a local folder is likely to be portable if:
+- the local folder contains all needed resources (e.g., CSS, JS, images), and
+- the local HTML files _do not_ use root-relative links (beginning with `/`).
+
+In this simple case, you can use:
+```bash
+lychee ./out --remap "^https://example\.com/docs file://$(pwd)/out"
+```
+This remaps remote URLs within the `/docs` subpath into the local folder.
+`--root-dir` is intentionally omitted because root-relative links cannot work
+correctly without more setup. See below if you need root directory support.
+
+### More Complex Cases
+
+To make `--root-dir` work in this context, your folder structure has to mimic
+the structure of the remote website. We can make a "temporary root dir" which
+has the right structure and sits next to the original local folder. In this
+way, we avoid needing to change our existing local folder structure, and we can
+use symbolic links to point to the existing files.
+```
+├── out
+│   └── page.html
+└── temp-root-dir
+    └── docs -> ../out
+```
+```bash
+mkdir temp-root-dir
+ln -s ../out temp-root-dir/docs
+```
+
+Additionally, since the local folder is only a subset of the website, certain
+relative links should be treated as links to the online website (for example,
+the root link `/`).
+In effect, this means that paths inside `./temp-root-dir`
+but outside of `./temp-root-dir/docs` must be redirected to the online website.
+
+Putting it all together, the lychee command looks like this:
+```bash
+lychee ./out \
+  --root-dir ./temp-root-dir \
+  --remap "^https://example\.com/docs file://$(pwd)/out" \
+  --remap "file://$(pwd)/temp-root-dir/docs file://$(pwd)/temp-root-dir/docs" \
+  --remap "file://$(pwd)/temp-root-dir https://example.com"
+```
+Note that the order of remaps is significant: earlier remaps are tried
+first and have priority over later ones.
+
+## Limitations
+
+- Remaps are applied textually. As an example, the remap
+  ```
+  --remap "^https://example\.com/docs file://$(pwd)/out"
+  ```
+  applies to any URL _beginning_ with that string even if it's inside a different subfolder.
+  For instance, it would also apply to a URL of `https://example.com/docs-2/page`.
+
+  If you need to guard against this, you can change the regex to end with
+  `([?#/]|$)` and add `$1` to the replacement, like so:
+  ```
+  --remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out\$1"
+  ```
+  Note that the `$1` must be escaped to avoid being treated as a shell
+  variable.
+
+- Remap patterns are regular expressions, so many common URL symbols
+  should be escaped to avoid being treated as regex metacharacters
+  (including `.?$+` and brackets). For example, the remaps in this
+  page use `\.` in domain names.
+
+- If you are using remaps for multiple purposes, be aware of potential
+  conflicts between them. For each URL, remaps are tried in order and the
+  *first* matching remap will be applied.
+
+## See Also
+
+If your URLs make use of automatic index files or automatic file extensions, see
+[Pretty URLs](/recipes/pretty-urls/) to enable the same features for local
+files.
+
+This documentation page was motivated by certain issue reports
+([#1918](https://github.com/lycheeverse/lychee/issues/1918),
+[#1594](https://github.com/lycheeverse/lychee/issues/1594)).
+In particular, the UX/documentation issue was discussed in
+[#1718](https://github.com/lycheeverse/lychee/issues/1718).
+These links are for historical background and might not reflect
+the current version of lychee.
+