
Commit fa1ff82

authored
Merge pull request #116 from wpengine/docs-explanation-sitemaps
docs: explanation about sitemaps in headless WordPress
2 parents 56ebe4f + a54e0a1 commit fa1ff82

2 files changed

+258
-0
lines changed

docs/explanation/sitemaps-1.png

186 KB

docs/explanation/sitemaps.md

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
1+
# Sitemaps in WordPress: A Comprehensive Overview
2+
3+
## What is a Sitemap?
4+
A sitemap is an XML file that provides a structured list of the pages on a website, helping search engines discover and crawl content more efficiently. It acts as a roadmap of your site's structure and can carry metadata about each page, such as when it was last modified.
5+
Since WordPress 5.5, there's a built-in XML sitemap generator that:
6+
7+
* Automatically creates sitemaps for posts, pages, and custom post types
8+
* Dynamically updates as content is added, modified, or deleted
9+
* Provides basic SEO and indexing support out of the box
10+
11+
However, this default sitemap lacks customization options, which is why many users opt for plugins like Yoast SEO or Jetpack to generate more comprehensive sitemaps.
12+
13+
You can view the sitemap generated by WordPress core by visiting `yourdomain.com/wp-sitemap.xml` in your browser. Core normally redirects requests for `yourdomain.com/sitemap.xml` to that address.
14+
![Image showing sitemap for a specific website](./sitemaps-1.png)
15+
16+
## Sitemaps in Headless WordPress
17+
Headless WordPress environments introduce unique challenges for sitemap generation:
18+
19+
* **Separation of content management and frontend rendering**: In traditional WordPress setups, content management and frontend rendering are integrated. However, in headless environments, WordPress acts as a backend CMS, while the frontend is handled by frameworks like Next.js. This separation requires custom solutions for sitemap generation.
20+
21+
* **Dynamic routes**: Dynamic routes in frameworks like Next.js may not be directly accessible or easily discoverable by search engines.
22+
23+
* **Need for custom sitemap generation and management**: Due to the decoupling of backend and frontend, traditional WordPress sitemap plugins might not work seamlessly. Therefore, custom approaches are needed to generate and manage sitemaps effectively.
24+
25+
To address these challenges, there are three proposed approaches to headless sitemap implementation:
26+
27+
1. **Proxying Sitemaps from Backend to Frontend**
28+
29+
Approach: This approach maintains WordPress's native sitemap generation capabilities while ensuring proper frontend URLs are used. It involves creating API routes in your Next.js application that proxy requests to the WordPress backend sitemap.
30+
31+
Example Code:
32+
33+
```javascript
// /pages/api/proxy-sitemap/[...slug].js
const wordpressUrl = (
  process.env.NEXT_PUBLIC_WORDPRESS_URL || "http://localhost:8888"
).trim();

export default async function handler(req, res) {
  const slug = req.query.slug || [];

  // Reconstruct the original WordPress sitemap path
  let wpUrl;
  if (slug.length === 0 || slug[0] === "sitemap.xml") {
    wpUrl = `${wordpressUrl}/sitemap.xml`;
  } else {
    const wpPath = slug.join("/");
    wpUrl = `${wordpressUrl}/${wpPath}.xml`;
  }

  console.debug("Fetching sitemap", wpUrl);
  try {
    const wpRes = await fetch(wpUrl);
    // Log only the status; the full Response object is noisy and not useful in logs
    console.debug("Sitemap response status", wpRes.status);
    if (!wpRes.ok) {
      return res.status(wpRes.status).send("Error fetching original sitemap");
    }

    const contentType = wpRes.headers.get("content-type") || "application/xml";
    let body = await wpRes.text();
    // Remove XML stylesheets if present
    body = body.replace(/<\?xml-stylesheet[^>]*\?>\s*/g, "");

    res.setHeader("Content-Type", contentType);
    res.status(200).send(body);
  } catch (err) {
    // Keep the underlying error in the server logs for debugging
    console.error("Sitemap proxy failed", err);
    res.status(500).send("Internal Proxy Error");
  }
}
```
71+
Then add the necessary rewrites in your `next.config.js`:
72+
```javascript
module.exports = {
  async rewrites() {
    return [
      {
        source: "/:path(wp-sitemap-.*).xml",
        destination: "/api/proxy-sitemap/:path",
      },
      {
        source: "/sitemap.xml",
        destination: "/api/proxy-sitemap/sitemap.xml",
      },
    ];
  },
  // other Next.js configuration
};
```
89+
To ensure that the sitemap URLs in your headless WordPress setup point to your frontend application, set the WordPress Site Address (URL) to your frontend's URL. You can change this under Settings > General in the WordPress admin.
90+
91+
**Note**: The WordPress Address (URL) should remain set to the URL where your WordPress backend is hosted. Only the Site Address (URL) needs to be updated to reflect your frontend's URL.
92+
93+
This implementation ensures that when visitors request `/sitemap.xml` on your headless frontend, the Next.js application dynamically serves the sitemap generated by WordPress.
96+
97+
- **Pros**
98+
* Leverages WordPress's built-in sitemap generation capabilities
99+
* Works seamlessly with SEO plugins like Yoast SEO or All in One SEO
100+
* Simple implementation that requires minimal code
101+
* Automatically updates when content changes in WordPress
102+
103+
- **Cons**
104+
* Limited flexibility for custom frontend routes not defined in WordPress
105+
* Requires proper URL transformation to replace backend URLs with frontend URLs
106+
* May require additional handling for caching and performance
107+
* May propagate any errors experienced in WordPress when proxying the `sitemap.xml`
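
If the Site Address (URL) setting cannot be changed, the URL transformation mentioned in the cons could be done in the proxy itself. The following is a minimal sketch under that assumption; `rewriteSitemapUrls` and the example domains are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: replaces the backend origin with the frontend origin
// everywhere it appears in a sitemap XML string.
function rewriteSitemapUrls(xml, backendUrl, frontendUrl) {
  // Strip trailing slashes so the replacement never produces "//" in a path.
  const from = backendUrl.replace(/\/+$/, "");
  const to = frontendUrl.replace(/\/+$/, "");
  // split/join is a literal, global replacement, so dots in the URL
  // are not interpreted as regular-expression wildcards.
  return xml.split(from).join(to);
}

// Example with made-up domains
const sample =
  "<urlset><url><loc>https://cms.example.com/hello-world/</loc></url></urlset>";
const rewritten = rewriteSitemapUrls(
  sample,
  "https://cms.example.com/",
  "https://www.example.com"
);
console.log(rewritten);
```

In the proxy handler, a helper like this would be applied to `body` just before the response is sent.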
108+
109+
2. **Generating a Sitemap from GraphQL Content**
110+
111+
Approach: This approach fetches all available post and page data via GraphQL and generates a custom sitemap from it. It can be implemented with either server-side rendering (SSR) or static generation. However, because WPGraphQL returns at most 100 nodes per request by default, fetching every post or page on a large site requires paginating through the results and can be slow. See [pagination limits in wp-graphql](https://www.wpgraphql.com/docs/known-limitations#pagination-limits).
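
One way to work within the per-request limit is cursor-based pagination, since WPGraphQL connections accept `first`/`after` arguments and expose `pageInfo { hasNextPage endCursor }`. The sketch below shows only the loop; `fetchAllNodes` and the `fetchPage` callback are hypothetical names, and the fake data source stands in for a real GraphQL request.

```javascript
// Hypothetical helper: collects every node from a cursor-paginated connection.
// `fetchPage(after)` is an assumed callback that performs one request and
// resolves to { nodes, pageInfo: { hasNextPage, endCursor } }.
async function fetchAllNodes(fetchPage) {
  const all = [];
  let after = null;
  let hasNextPage = true;
  while (hasNextPage) {
    const { nodes, pageInfo } = await fetchPage(after);
    all.push(...nodes);
    hasNextPage = pageInfo.hasNextPage;
    after = pageInfo.endCursor;
  }
  return all;
}

// Stand-in for a real WPGraphQL call: serves 5 fake posts, 2 per page.
const fakePosts = ["a", "b", "c", "d", "e"].map((slug) => ({ slug }));
async function fakeFetchPage(after) {
  const start = after === null ? 0 : Number(after);
  const end = Math.min(start + 2, fakePosts.length);
  return {
    nodes: fakePosts.slice(start, end),
    pageInfo: { hasNextPage: end < fakePosts.length, endCursor: String(end) },
  };
}

fetchAllNodes(fakeFetchPage).then((nodes) => {
  console.log(nodes.map((node) => node.slug).join(","));
});
```

Requests run one after another because each cursor depends on the previous response, which is why very large sites can take a while to enumerate.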
112+
113+
Example implementation using Next.js and WPGraphQL:
114+
115+
```javascript
import { gql } from '@apollo/client';
import { client } from '../lib/apolloClient';

// Fetch posts from WordPress. A single request returns at most 100 nodes;
// larger sites need to paginate with `after` and `pageInfo.endCursor`.
async function fetchAllPosts() {
  const { data } = await client.query({
    query: gql`
      query SitemapPosts {
        posts(first: 100) {
          nodes {
            slug
            modified
          }
        }
      }
    `,
  });
  return data.posts.nodes;
}

// Similar function for pages
async function fetchAllPages() {
  const { data } = await client.query({
    query: gql`
      query SitemapPages {
        pages(first: 100) {
          nodes {
            slug
            modified
          }
        }
      }
    `,
  });
  return data.pages.nodes;
}

export async function generateSitemap() {
  const [posts, pages] = await Promise.all([
    fetchAllPosts(),
    fetchAllPages(),
  ]);
  const allContent = [
    ...posts.map(post => ({
      slug: `posts/${post.slug}`,
      modified: post.modified
    })),
    ...pages.map(page => ({
      slug: page.slug,
      modified: page.modified
    })),
    // Add custom frontend routes here
    { slug: '', modified: new Date().toISOString() }, // Homepage
    { slug: 'about-us', modified: new Date().toISOString() },
  ];
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${allContent.map(item => `
  <url>
    <loc>${process.env.FRONTEND_URL}/${item.slug}</loc>
    <lastmod>${item.modified}</lastmod>
  </url>
  `).join('')}
</urlset>
`;
  return sitemap;
}

// Use ONE of the two exports below, depending on the router.

// Option A, Pages Router (e.g. pages/sitemap.xml.js): write the XML in
// getServerSideProps. The page file also needs a default-exported component.
export async function getServerSideProps({ res }) {
  const sitemap = await generateSitemap();

  res.setHeader('Content-Type', 'application/xml');
  res.write(sitemap);
  res.end();

  return { props: {} };
}

// Option B, App Router (e.g. app/sitemap.xml/route.js): return the XML
// from a route handler.
export async function GET() {
  const sitemap = await generateSitemap();

  return new Response(sitemap, {
    headers: {
      'Content-Type': 'application/xml',
    },
  });
}
```
175+
- **Pros**
176+
* Complete control over sitemap structure and content
177+
* Ability to include custom frontend routes not defined in WordPress
178+
* Easy integration with Next.js data fetching methods
179+
180+
- **Cons**
181+
* More complex implementation than proxying
182+
* Requires manual updates to include new content types or custom routes
183+
* May require pagination handling for large sites
184+
* Doesn't leverage WordPress SEO plugin sitemap enhancements
185+
* Increasing the GraphQL limits may cause performance issues on resource-constrained WordPress instances
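
Because this approach builds the XML by hand, any slug or custom route containing characters such as `&` needs to be entity-escaped, as the sitemaps.org protocol requires. The following is a minimal sketch of one way to do that; `escapeXml` and `buildUrlEntry` are hypothetical helpers, not part of Next.js or WPGraphQL.

```javascript
// Escapes the five characters that are special in XML.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds one <url> entry with escaped values.
function buildUrlEntry(baseUrl, slug, modified) {
  const loc = `${baseUrl.replace(/\/+$/, "")}/${slug}`;
  return `<url><loc>${escapeXml(loc)}</loc><lastmod>${escapeXml(modified)}</lastmod></url>`;
}

// Example with a made-up route that contains an ampersand
const entry = buildUrlEntry(
  "https://www.example.com/",
  "search?q=a&b=c",
  "2024-01-01"
);
console.log(entry);
```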
186+
187+
3. **Hybrid Approach: Fetching, Parsing, and Enhancing Existing Sitemaps**
188+
189+
Approach: This approach (currently used by Faust) fetches the existing WordPress sitemaps, parses them, and enhances them with additional frontend routes. This provides the benefits of WordPress's sitemap generation while allowing for customization.
190+
191+
```javascript
// Note: the maintained fork of this package is published as @xmldom/xmldom
import { DOMParser } from 'xmldom';

export async function GET() {
  const response = await fetch(`${process.env.WORDPRESS_URL}/wp-sitemap.xml`);
  if (!response.ok) {
    return new Response('Error fetching original sitemap', { status: 502 });
  }
  const sitemapIndex = await response.text();

  // Collect the URL of every sub-sitemap listed in the WordPress sitemap index
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(sitemapIndex, 'text/xml');
  const sitemapUrls = Array.from(xmlDoc.getElementsByTagName('loc')).map(
    node => node.textContent
  );

  // Fetch each sub-sitemap and point its URLs at the frontend.
  // replaceAll matches the URL literally, so dots are not regex wildcards.
  const processedSitemaps = await Promise.all(
    sitemapUrls.map(async (url) => {
      const sitemapResponse = await fetch(url);
      const sitemapContent = await sitemapResponse.text();

      return sitemapContent.replaceAll(
        process.env.WORDPRESS_URL,
        process.env.FRONTEND_URL
      );
    })
  );

  // Sitemap for routes that exist only in the frontend.
  // The XML declaration must be the very first characters of the document.
  const frontendRoutesSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>${process.env.FRONTEND_URL}/custom-route</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
  </url>
  <!-- Add more custom routes as needed -->
</urlset>
`;

  // Index returned by this route. processedSitemaps and frontendRoutesSitemap
  // still need their own routes under /sitemaps/ to serve the files listed here.
  const combinedSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${processedSitemaps.map((_, index) => `
  <sitemap>
    <loc>${process.env.FRONTEND_URL}/sitemaps/wp-sitemap-${index}.xml</loc>
  </sitemap>
  `).join('')}
  <sitemap>
    <loc>${process.env.FRONTEND_URL}/sitemaps/frontend-routes.xml</loc>
  </sitemap>
</sitemapindex>
`;

  return new Response(combinedSitemap, {
    headers: {
      'Content-Type': 'application/xml',
    },
  });
}
```
248+
249+
- **Pros**
250+
* Combines the best of both approaches
251+
* Leverages WordPress SEO plugin enhancements
252+
* Allows for custom frontend routes
253+
* Provides flexibility for complex sitemap requirements
254+
255+
- **Cons**
256+
* Most complex implementation of the three approaches
257+
* Requires handling multiple sitemap files
258+
* May have performance implications if not properly cached
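
To illustrate the caching concern, here is a minimal sketch of a time-based in-memory cache wrapped around a sitemap builder. `withTtlCache` is a hypothetical helper, and the 60-second lifetime is an arbitrary assumption. In serverless deployments, memory is typically not shared between instances, so a `Cache-Control` response header or an external cache may be a better fit.

```javascript
// Hypothetical helper: wraps an async producer with a time-based in-memory cache.
// `now` is injectable so the expiry logic can be exercised without waiting.
function withTtlCache(producer, ttlMs, now = Date.now) {
  let cached = null;
  let expiresAt = 0;
  return async function getCached() {
    if (cached !== null && now() < expiresAt) {
      return cached; // still fresh: skip the expensive rebuild
    }
    cached = await producer();
    expiresAt = now() + ttlMs;
    return cached;
  };
}

// Usage with a stand-in producer that counts how often it runs.
let builds = 0;
const getSitemap = withTtlCache(async () => {
  builds += 1;
  return `<sitemapindex><!-- build ${builds} --></sitemapindex>`;
}, 60000);

getSitemap()
  .then(() => getSitemap())
  .then((xml) => console.log(builds, xml));
```

The second call is served from memory, so the producer runs only once within the lifetime window.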
