1 change: 1 addition & 0 deletions astro.config.mjs
@@ -65,6 +65,7 @@ export default defineConfig({
"recipes/caching",
"recipes/excluding-links",
"recipes/excluding-paths",
"recipes/local-folder",
"recipes/migration",
"recipes/base-url",
"recipes/root-dir",
185 changes: 185 additions & 0 deletions src/content/docs/recipes/local-folder.mdx
@@ -0,0 +1,185 @@
---
title: Checking a Local Folder with URL Remapping
description: Checking a local folder of HTML files which will be uploaded to a particular URL.
---
{/* vim: set syntax=markdown: */}

import { Code } from "@astrojs/starlight/components";

Often, you will want to check a local folder of HTML files before the folder
gets uploaded to a website (as part of a static site workflow, for example).
Sometimes, this is complicated by local files that use fully-qualified
URLs pointing to the _future_ online locations of those files.

For instance, suppose you write a new blog post which will be uploaded to
`https://example.com/docs/2025-01-01-post.html`. You might use that URL in
certain places (like permalinks and canonical links), even though the URL
doesn't exist _yet_.
<Code
code={`<h1>My blog post</h1>
<a href="https://example.com/docs/2025-01-01-post.html">Permalink</a>`}
lang="html"
title="docs/2025-01-01-post.html"
/>

This can cause problems for link checking, because lychee would check these
links against the currently-online version of the site, which could be outdated or
missing newly-added files. To solve this problem, we can tell lychee that
certain online URLs should be *mapped* to local folder paths.

This works by mapping the content's future URLs to local files on your
computer, using lychee's URL remapping feature. For links to these URLs, lychee
will check that the corresponding files exist inside the local directory,
rather than checking the online website.

:::tip[Please give feedback!]
This page covers a fairly complicated topic, so feedback is appreciated! If
something is unclear or not working as you expect, please let us know. You
can open an issue or discussion for [this docs
website](https://github.com/lycheeverse/lycheeverse.github.io) or [lychee
itself](https://github.com/lycheeverse/lychee).
:::

:::note[Limitations]
This guide uses lychee's URL remapping feature, which is based on regular
expressions and has certain limitations; see [Limitations](#limitations).
:::

## Do You Need URL Remapping?

In simple cases, you don't!

By default, lychee can already resolve relative links to adjacent local files.
By adding [`--root-dir`][root-dir], lychee can also resolve root-relative links
(beginning with `/`) to the given root directory. In simple cases, this is all
you need.

Continue reading if:
- you have fully-qualified links to files which exist locally but aren't online yet, or
- your local folder will be uploaded to a _subdirectory_ of the website domain.

[root-dir]: /recipes/root-dir/

## Mapping a Remote Domain to a Local Folder

Suppose you have a local directory `out` and this will be uploaded to
the domain *root* at `https://docs.example.com`.

You can map URLs beginning with this domain into the local directory:
```bash
lychee ./out --root-dir ./out --remap "^https://docs\.example\.com file://$(pwd)/out"
```
This will remap URLs so that `https://docs.example.com/page.html` is checked
as the local file `./out/page.html`, for example.
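
To preview what a remap does to a URL before running lychee, you can approximate it with `sed`. This is only a rough sketch: `preview_remap` is a hypothetical helper, `/tmp/site/out` stands in for `$(pwd)/out`, and `sed` is not the regex engine lychee uses, so treat the result as an illustration rather than lychee's exact behaviour.

```bash
# Hypothetical helper: approximates a single lychee remap with sed.
# Arguments: pattern, replacement, URL.
# '|' is used as the sed delimiter, so the arguments must not contain '|'.
preview_remap() {
    printf '%s\n' "$3" | sed -E "s|$1|$2|"
}

out_dir="/tmp/site/out"   # assumed stand-in for "$(pwd)/out"

remapped="$(preview_remap '^https://docs\.example\.com' "file://${out_dir}" \
    'https://docs.example.com/page.html')"
echo "$remapped"   # → file:///tmp/site/out/page.html
```

The result is a `file://` URL inside the local folder, which can be checked without going online.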


## Mapping a Remote Subfolder to a Local Folder

If, instead, your local folder will be uploaded to a _subdirectory_ of the website
(rather than the domain root), you will need some more setup.

Suppose that the local directory `out` will be uploaded to a subfolder at
`https://example.com/docs/`.

:::tip
Try the "Simple Case" first, even if you're not sure which case to use. If all
links check successfully, then it's all good! Otherwise, if you see "not found"
errors or "root dir" errors, move on to [More Complex
Cases](#more-complex-cases).
:::

### Simple Case ("Portable" Websites)

If your website files are _portable_, then you can use a simple setup
akin to the whole-domain mapping above.

Portable means that the local folder could be uploaded to any path on any
domain and all its pages would work correctly. This is common for HTML files
generated by a documentation generator such as Doxygen or Javadoc.

As a guide, a local folder is likely to be portable if:
- the local folder contains all needed resources (e.g., CSS, JS, images), and
- the local HTML files _do not_ use root-relative links (beginning with `/`).
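
As a rough way to test the second criterion, you could search the folder for root-relative links. The sketch below is a heuristic, not a lychee feature: `find_root_relative` is a hypothetical helper, it only looks at double-quoted `href` and `src` attributes in `.html` files, and the demo folder is a throwaway created just for illustration.

```bash
# Hypothetical helper: list HTML files under a folder that contain
# root-relative links (href or src beginning with a single "/").
find_root_relative() {
    # "/[^/]" skips protocol-relative URLs such as "//cdn.example.com".
    grep -rlE --include='*.html' '(href|src)="/[^/]' "$1"
}

# Demo on a throwaway folder with one portable and one non-portable file.
demo="$(mktemp -d)"
printf '<a href="page2.html">ok</a>\n' > "$demo/portable.html"
printf '<a href="/index.html">home</a>\n' > "$demo/rooted.html"

found="$(find_root_relative "$demo")"
echo "$found"   # prints only the path of rooted.html

rm -rf "$demo"
```

If this prints nothing for your own folder, the folder is more likely to be portable.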

In this simple case, you can use:
```bash
lychee ./out --remap "^https://example\.com/docs file://$(pwd)/out"
```
This remaps remote URLs within the `/docs` subpath into the local folder.
`--root-dir` is intentionally omitted because root-relative links cannot work
correctly without more setup&mdash;see below if you need root directory support.

### More Complex Cases

To make `--root-dir` work in this context, your folder structure has to mimic
the structure of the remote website. We can make a "temporary root dir" which
has the right structure and sits next to the original local folder. In this
way, we avoid needing to change our existing local folder structure, and we can
use symbolic links to point to the existing files.
```
├── out
│   └── page.html
└── temp-root-dir
    └── docs -> ../out
```
```bash
mkdir temp-root-dir
ln -s ../out temp-root-dir/docs
```
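
To convince yourself that the symbolic link gives the layout lychee needs, you can rebuild the same structure in a scratch folder and check that a page is reachable through `temp-root-dir/docs`. This is only a sketch: the scratch folder and the `page.html` contents are made up for the demonstration.

```bash
# Rebuild the layout from above inside a throwaway folder.
work="$(mktemp -d)"
mkdir -p "$work/out"
printf '<h1>Page</h1>\n' > "$work/out/page.html"

mkdir "$work/temp-root-dir"
ln -s ../out "$work/temp-root-dir/docs"

# A root-relative link such as /docs/page.html is looked up under the
# root dir, so this path has to exist via the symlink.
if [ -f "$work/temp-root-dir/docs/page.html" ]; then
    resolved="yes"
else
    resolved="no"
fi
echo "resolved: $resolved"   # → resolved: yes

rm -rf "$work"
```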

Additionally, since the local folder is only a subset of the website, certain
relative links should be treated as links to the online website (for example,
the root link `/`). In effect, this means that paths inside `./temp-root-dir`
but outside of `./temp-root-dir/docs` must be redirected to the online website.

Putting it all together, the lychee command looks like this:
```bash
lychee ./out \
    --root-dir ./temp-root-dir \
    --remap "^https://example\.com/docs file://$(pwd)/out" \
    --remap "file://$(pwd)/temp-root-dir/docs file://$(pwd)/temp-root-dir/docs" \
    --remap "file://$(pwd)/temp-root-dir https://example.com"
```
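Note that the order of remaps is significant: earlier remaps are tried
first and have priority over later ones. The second remap maps the `docs`
folder to itself so that the third remap, which sends every other path under
`temp-root-dir` to the online website, does not apply to it.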
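
The ordering can be illustrated with a small simulation. This is a sketch under stated assumptions, not lychee's implementation: `apply_first_remap` is a hypothetical helper that uses `grep` and `sed` in place of lychee's regex engine, and `/tmp/site` stands in for `$(pwd)`.

```bash
root="/tmp/site"   # assumed stand-in for "$(pwd)"

# One "pattern replacement" pair per line, in the same order as the command.
remaps="^https://example\.com/docs file://${root}/out
file://${root}/temp-root-dir/docs file://${root}/temp-root-dir/docs
file://${root}/temp-root-dir https://example.com"

# Hypothetical helper: apply only the first remap whose pattern matches.
apply_first_remap() {
    url="$1"
    while read -r pattern replacement; do
        if printf '%s\n' "$url" | grep -qE -- "$pattern"; then
            printf '%s\n' "$url" | sed -E "s|${pattern}|${replacement}|"
            return 0
        fi
    done <<EOF
$remaps
EOF
    printf '%s\n' "$url"   # no remap matched: the URL is left unchanged
}

a="$(apply_first_remap 'https://example.com/docs/page.html')"
b="$(apply_first_remap "file://${root}/temp-root-dir/docs/page.html")"
c="$(apply_first_remap "file://${root}/temp-root-dir/index.html")"
echo "$a"   # → file:///tmp/site/out/page.html
echo "$b"   # → file:///tmp/site/temp-root-dir/docs/page.html
echo "$c"   # → https://example.com/index.html
```

If the last two pairs were swapped, the `docs` path in the second example would be sent to the online website instead of staying local.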

## Limitations

- Remaps are applied textually. As an example, the remap
  ```
  --remap "^https://example\.com/docs file://$(pwd)/out"
  ```
  applies to any URL _beginning_ with that string even if it's inside a different subfolder.
  For instance, it would also apply to a URL of `https://example.com/docs-2/page`.

  If you need to guard against this, you can change the regex to end with
  `([?#/]|$)` and add `$1` to the replacement, like so:
  ```
  --remap "^https://example\.com/docs([?#/]|$) file://$(pwd)/out\$1"
  ```
  Note that the `$1` must be escaped to avoid being treated as a shell
  variable.

- Remap patterns are regular expressions, so many common URL symbols
should be escaped to avoid being treated as regex metacharacters
(including `.?$+` and brackets). For example, the remaps in this
page use `\.` in domain names.

- If you are using remaps for multiple purposes, be aware of potential
conflicts between them. For each URL, remaps are tried in order and the
*first* matching remap will be applied.
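
The prefix-matching limitation in the first point can be reproduced with `sed`. As before, this is only an approximation of what lychee does: `guard` is a hypothetical helper, `/tmp/site/out` stands in for `$(pwd)/out`, and `sed` writes the capture group as `\1` where lychee's replacement uses `$1`.

```bash
# Unguarded: the prefix also matches the sibling folder /docs-2.
unguarded="$(printf '%s\n' 'https://example.com/docs-2/page' \
    | sed -E 's@^https://example\.com/docs@file:///tmp/site/out@')"

# Guarded: the prefix must be followed by ?, #, / or the end of the URL.
guard() {
    printf '%s\n' "$1" \
        | sed -E 's@^https://example\.com/docs([?#/]|$)@file:///tmp/site/out\1@'
}

sibling="$(guard 'https://example.com/docs-2/page')"
inside="$(guard 'https://example.com/docs/page')"

echo "$unguarded"   # → file:///tmp/site/out-2/page
echo "$sibling"     # → https://example.com/docs-2/page
echo "$inside"      # → file:///tmp/site/out/page
```

Only the guarded pattern leaves the `docs-2` URL alone while still remapping URLs inside `/docs/`.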

## See Also

If your URLs make use of automatic index files or automatic file extensions, see
[Pretty URLs](/recipes/pretty-urls/) to enable the same features for local
files.

This documentation page was motivated by certain issue reports
([#1918](https://github.com/lycheeverse/lychee/issues/1918),
[#1594](https://github.com/lycheeverse/lychee/issues/1594)).
In particular, the UX/documentation issue was discussed in
[#1718](https://github.com/lycheeverse/lychee/issues/1718).
These links are for historical background and might not reflect
the current version of lychee.