App bundle size inflated by matchPath data on 25k page site #32742
Replies: 34 comments
-
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. Thanks for being a part of the Gatsby community! 💪💜 |
-
Hey again! It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it. Thanks again for being part of the Gatsby community! 💪💜 |
-
Hey sorry this never got responded to. Could you create a small reproduction of the problem? That'd be the best next step towards investigating the bug. |
-
Sorry for the late reply on this, @KyleAMathews. I have created a codesandbox with my approach here. The app.js file is here. As you can see, there are unnecessary
-
Hi @wardpeet, sorry for mentioning you directly but I think the bit of code which is causing me an issue here is from one of your PRs - #17412. If I comment out the following code, the
I understand that code is there for a reason as it was part of the fix for #16097, but I just want to get some understanding of whether the solution should be on my end or yours. Our app has a requirement to have static pages which can have client-only sub-routes, which is why we have pages created through Correct me if I'm wrong - I think that code checks all pages to see if their URL matches any of the Any help would be greatly appreciated. |
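For readers following along, the behaviour described in this comment can be modelled roughly as below. This is a simplified sketch, not the code from #17412: the function names `matches` and `buildMatchPathEntries` are made up for illustration, and only exact paths and a trailing `/*` wildcard are handled.

```javascript
// Hypothetical, simplified model of the build step discussed above.
// Real matchPath patterns are richer than a trailing "/*".
function matches(pattern, path) {
  if (pattern.endsWith("/*")) {
    const prefix = pattern.slice(0, -1); // "/app/*" -> "/app/"
    return path.startsWith(prefix) || path + "/" === prefix;
  }
  return pattern === path;
}

function buildMatchPathEntries(pages) {
  const patterns = pages.filter((p) => p.matchPath).map((p) => p.matchPath);
  const entries = [];
  for (const page of pages) {
    if (page.matchPath) {
      // Pages that declare a matchPath always get an entry.
      entries.push({ path: page.path, matchPath: page.matchPath });
    } else if (patterns.some((pattern) => matches(pattern, page.path))) {
      // A static page covered by someone else's pattern gets an explicit
      // entry so that it can still be resolved to itself.
      entries.push({ path: page.path, matchPath: page.path });
    }
  }
  return entries;
}

// One "/*" pattern is enough to pull every static page into the output.
const pages = [
  { path: "/", matchPath: "/*" },
  { path: "/shiny-new-toy" },
  { path: "/buy-buy-buy" },
];
console.log(buildMatchPathEntries(pages).length); // 3
```

Under this model a narrower pattern such as `/app/*` would only pull in the static pages below `/app/`, which is why a root-level or per-locale wildcard is the expensive case.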
-
@KyleAMathews @wardpeet - as far as I can tell, we should be able to trust that Gatsby has generated all the required assets for statically generated pages (their index.html, page-data.json, etc) so the fix (#17412) for #16097 shouldn't need to include static pages in |
-
Let me sketch the problem here a bit. Once Gatsby has loaded a page, the next navigation will be a client-side navigation, so there is no page refresh. We try to load a page-data.json file for each URL. If it doesn't exist, the request returns a 404. When people use matchPaths, this fails and we need to know what the rootPath is. That's why matchPaths exist. If we kept doing a page-data fetch, that could result in many 404s, which wouldn't be ideal. We had this logic for a while, but we got a lot of complaints, so that's why we reverted to this pattern. We're probably able to improve the algorithm a bit.
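A minimal sketch of the lookup described above, assuming the client walks an ordered list of `{ path, matchPath }` entries before fetching any page-data.json. The helper names `globMatches` and `resolvePagePath` and the sample paths are illustrative, not Gatsby's actual loader code, and only exact paths plus a trailing `/*` are supported.

```javascript
// Simplified, hypothetical matcher: exact paths and a trailing "/*" only.
function globMatches(pattern, pathname) {
  if (pattern.endsWith("/*")) {
    const prefix = pattern.slice(0, -1); // "/app/*" -> "/app/"
    return pathname.startsWith(prefix) || pathname + "/" === prefix;
  }
  return pattern === pathname;
}

// Return the page path whose page-data.json should be fetched for a URL.
// Entries are assumed to be ordered from most to least specific.
function resolvePagePath(matchPaths, pathname) {
  for (const entry of matchPaths) {
    if (globMatches(entry.matchPath, pathname)) return entry.path;
  }
  return pathname; // no match: fall back to the URL itself
}

const withStaticEntry = [
  { path: "/buy-buy-buy", matchPath: "/buy-buy-buy" },
  { path: "/", matchPath: "/*" },
];
const wildcardOnly = [{ path: "/", matchPath: "/*" }];

console.log(resolvePagePath(withStaticEntry, "/buy-buy-buy")); // "/buy-buy-buy"
console.log(resolvePagePath(withStaticEntry, "/pay")); // "/"
console.log(resolvePagePath(wildcardOnly, "/buy-buy-buy")); // "/" (static page is shadowed)
```

The last call shows the trade-off: if the list is consulted first and a static page has no explicit entry, a wildcard claims its URL, so avoiding the speculative 404 fetch means listing every static page the wildcard covers.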
-
Hi @wardpeet, also suffering from this issue. I would like to put a few select client-only routes on the index (e.g. /pay), but also have plenty of static pages (e.g. /shiny-new-toy, /buy-buy-buy). Gatsby understands those static pages exist and can be routed to, but the number of static pages has brought back the "page manifest problem":
[
  {
    "path": "/buy-buy-buy",
    "matchPath": "/buy-buy-buy"
  }, // x tens of thousands of these static page mappings
  {
    "path": "/",
    "matchPath": "/*"
  }
]
With a finite number of client-only pages, could the problem not be avoided by permitting a simple mapping alongside the globs that are currently supported, i.e.
// gatsby-node.js
exports.onCreatePage = async ({ page, actions }) => {
  const { createPage } = actions
  if (page.path === "/") {
    page.matchPaths = ["/pay", "myotherclientsideroute"]
    createPage(page)
  }
}
Which then produces
[
  {
    "path": "/",
    "matchPaths": ["/pay", "myotherclientsideroute"]
  }
]
And then changing
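As a rough illustration of how a client could consume the mapping proposed in this comment: the `matchPaths` array is the commenter's suggested field, not an existing Gatsby option, and `buildExactRouteIndex` and `resolveClientRoute` are hypothetical helper names.

```javascript
// Build a lookup from each explicitly listed client-only route to the page
// that should render it. `matchPaths` is the proposed field from the comment
// above, not something Gatsby currently reads.
function buildExactRouteIndex(manifest) {
  const index = new Map();
  for (const { path, matchPaths = [] } of manifest) {
    for (const route of matchPaths) {
      index.set(route, path);
    }
  }
  return index;
}

// Exact match only: anything not listed is treated as a normal static page.
function resolveClientRoute(index, pathname) {
  return index.has(pathname) ? index.get(pathname) : null;
}

const manifest = [{ path: "/", matchPaths: ["/pay", "myotherclientsideroute"] }];
const index = buildExactRouteIndex(manifest);

console.log(resolveClientRoute(index, "/pay")); // "/"
console.log(resolveClientRoute(index, "/buy-buy-buy")); // null
```

With exact routes the manifest grows with the number of client-only routes rather than with the number of static pages, which is the property the proposal is after.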
-
I also bumped into this problem with 25k+ pages. As @ConorLindsay94 suggested, commenting out the mentioned block of code solved it for me too. Update [2020-10-06]: No, this move fixed my issues only partially. In the end, I rolled back to 2.15.22 (the Sep. 2019 version) to make localizations work without a huge match-paths.json.
-
@KyleAMathews @wardpeet is this something that you would be happy having opensource contributions for? Or is it something you're already working on? If we were to contribute, are there any specific areas to look out for, or has our (@ConorLindsay94) initial work highlighted the main area? |
-
@pieh @wardpeet we have noticed that as a result of changes in #25057 that the |
-
Hmmm, this PR shouldn't have such impact - what this PR should have done is just change from using |
-
Using default it seems like I wonder - this might be result of changes to webpack chunking setup maybe that we done some time ago? If that would be the case then Other option is that there is regression, but it's conditional - there might be some specific setup that cause this - removing |
-
@pieh apologies, it was a false positive (true negative for this issue, I suppose). I have run various tests and it does seem that webpack is now putting I've run tests using different
When the array gets to a certain size, webpack is creating an independent chunk for the It seems the change in webpack behavior is between 2.23.20 and 2.23.21, which is where In fact, now having multiple large |
-
Ok, so this seems like combination of things here. Introduction of virtual modules (at least the way they were implemented) cause |
-
@KyleAMathews I basically create 65k pages through "createPage" method inside gatsby-node.js file. If I pass a matchPath to at least one of those pages during their creation, my app.js balloons up to 8MB from 300kb because it generates the matchPath array that contains every single page and places it inside the app-hash.js. In general I have only 8 matchPaths set - for some specific pages that use reach/router for dynamic views. |
-
Wait... the array includes every page not just the few matchPath records? What does the data look like? |
-
@KyleAMathews Yes, every page is in there. The array looks like this:
and it goes on to include every single page. So, basically each "path" gets assigned its own "matchPath" with the same value as "path". The few matchPaths that were set manually are in there too of course. |
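To get a feel for how this scales, here is a small sketch that estimates the serialized size of such an array when every page is listed with a matchPath equal to its path. The path shape `/products/item-N` and the helper name `estimateManifestBytes` are made up for the example; real slugs are often longer, so treat the result as a rough lower bound.

```javascript
// Estimate the UTF-8 size of a matchPath array in which every page is
// listed with matchPath === path, as described in this comment.
function estimateManifestBytes(paths) {
  const entries = paths.map((path) => ({ path, matchPath: path }));
  return new TextEncoder().encode(JSON.stringify(entries)).length;
}

// Hypothetical site with 65,000 short product URLs.
const samplePaths = Array.from({ length: 65000 }, (_, i) => `/products/item-${i}`);
const bytes = estimateManifestBytes(samplePaths);

console.log(`${(bytes / 1024 / 1024).toFixed(2)} MiB before minification or compression`);
```

Each path is stored twice, so every entry costs roughly twice the path length plus about 25 bytes of JSON keys and punctuation, and the total grows linearly with the page count.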
-
I see this too. For added context, I see this when using
languages.forEach(function (language) {
  var localePage = generatePage(true, language);
  var regexp = new RegExp("/404/?$");
  if (regexp.test(localePage.path)) {
    localePage.matchPath = "/" + language + "/*";
  }
  createPage(localePage);
});
This is adding matchPath to 404 pages, but results in Gatsby adding every single route to
This creates a single page for each language:
-
looks like it's intended:
Is this really intended or a bug? |
-
Any updates on this? Our bundle contains 2M of slugs because we have a 404 page in multiple locales, and this is quite bad for web vitals. If we leave out the matchPath, the 404s still work, except that the URL is reset to e.g. /da-dk/404. The source of this problem is matchPath when creating the 404s: Basically: is there not a less expensive way to avoid resetting the URLs on 404?
-
here is a small reproduction of the use case for localised 404 pages: 5 locales, 1 When I build 10000 pages per locale the resulting match-paths.json takes 90% of the bundle, so the more pages and locales we have the bigger the bundle will be code: https://github.com/kdichev/gatsby-match-paths-issue PS. If I build the project to output 100k pages the cold builds are slower with no matchPath: 82s |
-
I can confirm that adding a single page with matchPath (for example a 404 page with match of |
-
If anyone facing this would like to brainstorm some possible fixes and work towards a solution, grab some time to meet: https://calendly.com/kyle-gatsby/30min?back=1&month=2021-05
-
I'm having the same problem. I'm creating an ecommerce website, where we pre-render some products and leave others out. Our folder structure looks something like:
The problem I'm having is that the list of all products are being added to I personally don't mind receiving a 404 for |
-
For those who are still having this problem, I created a plugin that addresses it along with other performance improvements. The idea is to use the server for routing. It uses createRedirect for the page-data.json files, so we don't need to ship all of these paths to the frontend. You can check it out here: https://github.com/vtex/faststore/tree/master/packages/gatsby-plugin-performance.
-
This plugin has a conflict with 'gatsby-plugin-meta-redirect'. "gatsby-plugin-meta-redirect" threw an error while running the onPostBuild lifecycle: ENOTDIR: not a directory, open '/public/page-data/en/nz/page-data.json/index.html' |
-
I'd love to see this fixed. In our case we have thousands of pages, and since our matchPath is on our localized 404 pages (/en-us/404/), Gatsby needs the full list of pages in the array. That would be acceptable if it were added only to the 404 pages, but since it is embedded in app.js, every new page produces a new version of the app.js script and causes Gatsby to rebuild every page. Perhaps matchPath can be extracted to a separate file?
-
does anyone know if there was ever any outcome to this? (experiencing same issue) |
-
i've come up with a dirty workaround for dealing with this issue (only tested with localized 404 pages but in theory would work with any page). use with caution:
<Helmet>
  {/*
    stores the window.pagePath value in a different global variable and tricks gatsby into disabling client side routing by making this check never true:
    https://github.com/gatsbyjs/gatsby/blob/6d8f0bfdcc1a2fe2d07f03583b19ea7934b0ccf4/packages/gatsby/cache-dir/production-app.js#L144-L153
  */}
  <script>{`
    Object.defineProperty(window, 'pagePath', {
      get: function() { return undefined; },
      set: function(path) { window.pagePathOverride = path; }
    });
  `}</script>
</Helmet>

export const onClientEntry = () => {
  const { ___loader: loader, pagePathOverride } = window;
  if (!loader) {
    return;
  }
  if (pagePathOverride) {
    const originalLoadPage = loader.loadPage;
    const originalLoadPageSync = loader.loadPageSync;
    // here we send window.pagePathOverride to loadPage / loadPageSync instead of window.location.pathname, which lets us override the page resources to use
    // and sets matchPath so the page hydrates correctly (https://github.com/gatsbyjs/gatsby/blob/6d8f0bfdcc1a2fe2d07f03583b19ea7934b0ccf4/packages/gatsby/cache-dir/production-app.js#L109-L117)
    loader.loadPage = async () => {
      const pageResources = await originalLoadPage(pagePathOverride);
      pageResources.page.matchPath = '*';
      return pageResources;
    };
    loader.loadPageSync = () => {
      const pageResources = originalLoadPageSync(pagePathOverride);
      pageResources.page.matchPath = '*';
      return pageResources;
    };
  }
};
-
Summary
We have recently noticed our app.js bundle has shot up in size, and after closer inspection it looks like the majority of the code is data relating to matchPaths.
Relevant information
matchPath
s.createPage
.matchPath
array.
I think I may have tracked down the code that does this to getMatchPaths in gatsby/src/bootstrap/requires-writer.js. When I debug, matchPathPages has a length of 25237, and then I think this matchPath data is written to match-paths.json. Would this have an effect on the app.js bundle size?
Environment (if relevant)