Allow consumer control of crawled paths #141

@jahredhope

Background

Currently there is a feature that crawls rendered HTML for additional links to render.
It is enabled by passing in a boolean.
Issue #127 asked for a way to prevent links to certain domains from being rendered.

Change

We could change the crawl option from a boolean to an optional function that the consumer passes in to decide whether a link should be rendered.

Option 1: On all HTML
The crawl function could be called once a render is complete.
It would be responsible for finding all links on the page and returning an array of new pages to render.

Optionally, we could add a getHrefsFromHtml convenience function to save each consumer from writing this parser.

import StaticSiteGeneratorPlugin, { getHrefsFromHtml } from 'static-site-generator-webpack-plugin';

new StaticSiteGeneratorPlugin({
  crawl: ({ html }) => getHrefsFromHtml(html)
    .filter(href => !href.includes('bad.com')),
});
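
To make the proposed helper concrete, here is a minimal sketch of what getHrefsFromHtml might do. The helper does not exist yet, so its behaviour here is an assumption: it uses a regular expression over quoted href attributes on anchor tags and de-duplicates the results, whereas a real implementation would more likely use an HTML parser.

```javascript
// Hypothetical sketch of the proposed getHrefsFromHtml helper (not an existing API).
// Assumption: only quoted href attributes on <a> tags are of interest.
function getHrefsFromHtml(html) {
  const hrefPattern = /<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  const hrefs = new Set(); // a Set drops duplicate links on the same page
  for (const match of html.matchAll(hrefPattern)) {
    // Group 1 holds double-quoted values, group 2 single-quoted values
    hrefs.add(match[1] !== undefined ? match[1] : match[2]);
  }
  return [...hrefs];
}

// Usage mirroring the Option 1 crawl function above
const html =
  '<a href="/about">About</a> ' +
  "<a class=\"x\" href='https://bad.com/x'>Bad</a> " +
  '<a href="/about">Again</a>';
const toRender = getHrefsFromHtml(html).filter(href => !href.includes('bad.com'));
console.log(toRender); // [ '/about' ]
```

With a helper like this, the consumer's crawl function only has to express the filtering rule.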

Option 2: On each link
The crawl function could be called for each link after we've parsed the rendered HTML for links, returning whether that link should be rendered.

import StaticSiteGeneratorPlugin from 'static-site-generator-webpack-plugin';

new StaticSiteGeneratorPlugin({
  crawl: ({ href, html, index }) => !href.includes('bad.com'),
});
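
For illustration, the plugin side of Option 2 could look roughly like the sketch below. The selectLinksToRender function and the alreadyRendered set are hypothetical names, and it assumes index is the link's position among the links found on the page, which the proposal does not specify.

```javascript
// Hypothetical sketch: how the plugin might apply a per-link crawl predicate.
// Assumption: hrefs have already been parsed from the rendered HTML, and
// index is the position of the link within that list.
function selectLinksToRender({ hrefs, html, crawl, alreadyRendered = new Set() }) {
  return hrefs.filter(
    (href, index) => !alreadyRendered.has(href) && crawl({ href, html, index })
  );
}

// Consumer-supplied predicate, as in the Option 2 example above
const crawl = ({ href }) => !href.includes('bad.com');

const next = selectLinksToRender({
  hrefs: ['/about', 'https://bad.com/x', '/contact'],
  html: '<html>...</html>',
  crawl,
  alreadyRendered: new Set(['/contact']),
});
console.log(next); // [ '/about' ]
```

Compared with Option 1, the consumer never touches link parsing here; the trade-off is that they can only accept or reject links, not add new ones.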

Feedback

Feedback is welcome. Please comment below with your thoughts.
