Fix asset resolution when baseUrl points to a different origin

When `baseUrl` is set, the intention is to:

* keep crawling and asset loading tied to the origin of `initialDocURLs` (e.g. a local/static build or staging site), while
* rewriting only the **hyperlinks** in the final PDF so they point at the canonical production URL (`baseUrl`).

This is not what currently happens.

When `baseUrl` is provided and its origin differs from the origin of `initialDocURLs`, the final PDF generation attempts to load images and other assets from `baseUrl` instead of from the crawl source.

This is because [`concatHtml()`](https://github.com/jean-humann/docs-to-pdf/blob/65d47bf240cca5c0b92c018038129d3bd4a1ab85/src/utils.ts#L365) inserts:

```html
<base href="https://example.com" />
```

The browser applies the `<base>` tag to *all* relative URLs, not just hyperlinks. As a result, during the [final render pass](https://github.com/jean-humann/docs-to-pdf/blob/65d47bf240cca5c0b92c018038129d3bd4a1ab85/src/core.ts#L256) Puppeteer resolves every relative asset URL (images, stylesheets, scripts, etc.) against `baseUrl`.
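
To illustrate the resolution rule, here is a minimal sketch using the standard `URL` constructor, which follows the same WHATWG resolution behaviour a browser applies once a `<base href>` is present. The helper name `resolveUnderBase` is hypothetical and not part of docs-to-pdf:

```typescript
// Hypothetical helper (not part of docs-to-pdf): resolve a relative URL
// the way a browser does when the document contains <base href="...">.
function resolveUnderBase(relative: string, baseHref: string): string {
  return new URL(relative, baseHref).href;
}

const baseHref = "https://example.com";

// Asset paths and hyperlinks are treated identically.
console.log(resolveUnderBase("/img/logo.png", baseHref)); // https://example.com/img/logo.png
console.log(resolveUnderBase("/guide", baseHref));        // https://example.com/guide
```

The point is that `<base>` has no notion of "link" versus "asset": both end up on the `baseUrl` origin.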


## Example

Suppose:

* You are crawling docs from a local Docusaurus build:
  `initialDocURLs = ["http://localhost:3000/docs/intro"]`
* Your canonical production site is:
  `baseUrl = "https://docs.example.com"`

Your HTML contains relative asset paths:

```html
<img src="/img/logo.png" />
<link rel="stylesheet" href="/assets/styles.css" />
<a href="/guide">Read more</a>
```

Because `concatHtml()` adds:

```html
<base href="https://docs.example.com" />
```

the browser now resolves every relative URL against `baseUrl`:

| Original             | Resolved under `<base>`                      | Intended                                  |
| -------------------- | -------------------------------------------- | ----------------------------------------- |
| `/img/logo.png`      | `https://docs.example.com/img/logo.png`      | `http://localhost:3000/img/logo.png`      |
| `/assets/styles.css` | `https://docs.example.com/assets/styles.css` | `http://localhost:3000/assets/styles.css` |
| link `/guide`        | `https://docs.example.com/guide`             | `https://docs.example.com/guide` (links *should* use `baseUrl`) |

This causes asset failures whenever:

* content must be loaded from a local or static build,
* the preview/staging environment differs from the canonical URL, or
* the canonical hostname is not resolvable from the execution environment.


## Expected behaviour

* **Assets** (images, CSS, JS) should load from the crawl origin:
  `http://localhost:3000/...`
* **Hyperlinks** inside the PDF should still use the canonical `baseUrl`:
  `https://docs.example.com/...`


## Proposed fix

In `generatePDF()`, extend the existing request interception logic:

1. Let Puppeteer build the request URL (which may incorrectly resolve under `baseUrl`).
2. Check whether the origin of that resolved URL equals the `baseUrl` origin.
3. If so, rewrite the origin to that of `initialDocURLs[0]`, keeping the path and query string unchanged.

Example rewrite:

```
From: https://docs.example.com/img/logo.png  
To:   http://localhost:3000/img/logo.png
```

This keeps asset loading tied to the crawl source while preserving canonical hyperlink rewriting.
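
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: `rewriteToCrawlOrigin`, `handleRequest`, and the `InterceptableRequest` interface are hypothetical names, and the interface only mirrors the two methods of Puppeteer's `HTTPRequest` that the sketch relies on (`url()` and `continue()` with a `url` override).

```typescript
// Assumption: a minimal structural stand-in for the parts of Puppeteer's
// HTTPRequest used here, so the sketch stays self-contained.
interface InterceptableRequest {
  url(): string;
  continue(overrides?: { url?: string }): Promise<void>;
}

// Hypothetical helper: map a URL that was resolved under baseUrl
// back onto the origin of the crawl source.
function rewriteToCrawlOrigin(
  requestUrl: string,
  baseUrl: string,
  crawlUrl: string,
): string {
  let parsed: URL;
  try {
    parsed = new URL(requestUrl);
  } catch {
    return requestUrl; // not an absolute URL: leave untouched
  }
  const baseOrigin = new URL(baseUrl).origin;
  const crawlOrigin = new URL(crawlUrl).origin;
  // Compare whole origins rather than string prefixes, so that
  // "https://docs.example.com.other.test" is not matched by accident.
  if (parsed.origin !== baseOrigin || baseOrigin === crawlOrigin) {
    return requestUrl;
  }
  return crawlOrigin + parsed.pathname + parsed.search + parsed.hash;
}

// Hypothetical request handler: forward the request, overriding the URL
// only when it was resolved against the baseUrl origin.
async function handleRequest(
  request: InterceptableRequest,
  baseUrl: string,
  crawlUrl: string,
): Promise<void> {
  const original = request.url();
  const target = rewriteToCrawlOrigin(original, baseUrl, crawlUrl);
  await request.continue(target === original ? {} : { url: target });
}
```

In `generatePDF()` this would presumably be wired into the existing `page.on('request', ...)` handler, passing `initialDocURLs[0]` as `crawlUrl`. Hyperlinks are unaffected because the browser issues no request for an `<a href>` during rendering, so they keep the `baseUrl` resolution in the PDF.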


## Backward compatibility

The change proposed is backward compatible for all cases where:

* `baseUrl` is not provided, or
* `baseUrl` shares the same origin as `initialDocURLs`.
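
As a rough illustration of why these two cases are unaffected, the rewrite could be guarded by a check like the one below. The function name `needsOriginRewrite` is hypothetical; the parameter names follow the option names used in this issue.

```typescript
// Hypothetical guard: the rewrite only applies when baseUrl is set
// and its origin differs from the origin of the first crawl URL.
function needsOriginRewrite(initialDocURLs: string[], baseUrl?: string): boolean {
  const first = initialDocURLs[0];
  if (!baseUrl || !first) {
    return false; // no baseUrl (or nothing to crawl): behaviour unchanged
  }
  return new URL(baseUrl).origin !== new URL(first).origin;
}

const crawlUrls = ["http://localhost:3000/docs/intro"];

console.log(needsOriginRewrite(crawlUrls));                             // false
console.log(needsOriginRewrite(crawlUrls, "http://localhost:3000"));    // false
console.log(needsOriginRewrite(crawlUrls, "https://docs.example.com")); // true
```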
