Commit 4dc3410

Fix: Disambiguate Mediapartners-Google user agent (#82536)
`Mediapartners-Google` used to match both `HEADLESS_BROWSER_BOT_UA_RE` and `HTML_LIMITED_BOT_UA_RE`, causing inconsistent state values for `getBotType`, `isDomBotUA`, and `isHtmlLimitedBotUA`.
1 parent 0ad1311 commit 4dc3410

3 files changed: 8 additions and 5 deletions
Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 // This regex contains the bots that we need to do a blocking render for and can't safely stream the response
 // due to how they parse the DOM. For example, they might explicitly check for metadata in the `head` tag, so we can't stream metadata tags after the `head` was sent.
+// Note: The pattern [\w-]+-Google captures all Google crawlers with "-Google" suffix (e.g., Mediapartners-Google, AdsBot-Google, Storebot-Google)
+// as well as crawlers starting with "Google-" (e.g., Google-PageRenderer, Google-InspectionTool)
 export const HTML_LIMITED_BOT_UA_RE =
-  /Mediapartners-Google|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti/i
+  /[\w-]+-Google|Google-[\w-]+|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti|googleweblight/i
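To make the effect of the two new alternatives concrete, here is a minimal sketch that runs the added pattern against a few user-agent strings. The pattern is copied from the added line above; the sample strings and the `samples` name are illustrative assumptions, not part of the commit.

```typescript
// Pattern copied verbatim from the added line in this diff.
const HTML_LIMITED_BOT_UA_RE =
  /[\w-]+-Google|Google-[\w-]+|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti|googleweblight/i

// Illustrative user-agent strings (assumed for this sketch).
const samples: string[] = [
  'Mediapartners-Google', // "-Google" suffix, matched by [\w-]+-Google
  'AdsBot-Google (+http://www.google.com/adsbot.html)', // "-Google" suffix
  'Google-InspectionTool/1.0', // "Google-" prefix, matched by Google-[\w-]+
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', // main crawler
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', // regular browser
]

// Map each sample to whether the HTML-limited pattern matches it.
const htmlLimitedResults = samples.map((ua) => HTML_LIMITED_BOT_UA_RE.test(ua))

for (let i = 0; i < samples.length; i++) {
  console.log(htmlLimitedResults[i], samples[i])
}
```

The first three samples match and the last two do not: the main Googlebot string has neither a `-Google` suffix nor a `Google-` prefix, so it stays out of the HTML-limited set.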

packages/next/src/shared/lib/router/utils/is-bot.ts

Lines changed: 3 additions & 2 deletions
@@ -1,9 +1,10 @@
 import { HTML_LIMITED_BOT_UA_RE } from './html-bots'
 
 // Bot crawler that will spin up a headless browser and execute JS.
-// By default, only googlebots are considered as DOM bots. Blow is where the regex is computed from:
+// Only the main Googlebot search crawler executes JavaScript, not other Google crawlers.
 // x-ref: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers
-const HEADLESS_BROWSER_BOT_UA_RE = /google/i
+// This regex specifically matches "Googlebot" but NOT "Mediapartners-Google", "AdsBot-Google", etc.
+const HEADLESS_BROWSER_BOT_UA_RE = /Googlebot(?!-)|Googlebot$/i
 
 export const HTML_LIMITED_BOT_UA_RE_STRING = HTML_LIMITED_BOT_UA_RE.source
 
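The overlap this change removes can be sketched by comparing the old and new headless-browser patterns. The two headless regexes below are copied from the diff; the HTML-limited pattern is abbreviated for readability, and `classifyUserAgent`, `BotType`, and the check order are hypothetical stand-ins, not the actual `getBotType` implementation.

```typescript
// Old and new headless-browser patterns, copied from the diff above.
const OLD_HEADLESS_RE = /google/i
const NEW_HEADLESS_RE = /Googlebot(?!-)|Googlebot$/i

// Abbreviated HTML-limited pattern (assumption: only a few alternatives kept).
const HTML_LIMITED_RE_ABBREVIATED = /[\w-]+-Google|Google-[\w-]+|Bingbot/i

// Hypothetical classification type and helper for illustration only.
type BotType = 'dom' | 'html' | undefined

function classifyUserAgent(ua: string): BotType {
  // Assumed order: headless-browser bots first, then HTML-limited bots.
  if (NEW_HEADLESS_RE.test(ua)) return 'dom'
  if (HTML_LIMITED_RE_ABBREVIATED.test(ua)) return 'html'
  return undefined
}

const mediapartners = 'Mediapartners-Google'
const googlebot =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

// Before the change, /google/i also matched Mediapartners-Google,
// so the same user agent satisfied both the headless and HTML-limited checks.
console.log(OLD_HEADLESS_RE.test(mediapartners)) // true
console.log(NEW_HEADLESS_RE.test(mediapartners)) // false

console.log(classifyUserAgent(mediapartners)) // 'html'
console.log(classifyUserAgent(googlebot)) // 'dom'
```

With the narrower headless pattern, each of these user agents falls into exactly one category, so helpers that consult the two regexes independently no longer disagree about `Mediapartners-Google`.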

test/production/app-dir/metadata-streaming-config/metadata-streaming-config.test.ts

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ describe('app-dir - metadata-streaming-config', () => {
 )
 
 expect(requiredServerFiles.config.htmlLimitedBots).toMatchInlineSnapshot(
-`"Mediapartners-Google|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti"`
+`"[\\w-]+-Google|Google-[\\w-]+|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti|googleweblight"`
 )
 
 const prerenderManifest = JSON.parse(
@@ -38,7 +38,7 @@ describe('app-dir - metadata-streaming-config', () => {
 "/ppr": {
 "key": "user-agent",
 "type": "header",
-"value": "Mediapartners-Google|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti",
+"value": "[\\w-]+-Google|Google-[\\w-]+|Chrome-Lighthouse|Slurp|DuckDuckBot|baiduspider|yandex|sogou|bitlybot|tumblr|vkShare|quora link preview|redditbot|ia_archiver|Bingbot|BingPreview|applebot|facebookexternalhit|facebookcatalog|Twitterbot|LinkedInBot|Slackbot|Discordbot|WhatsApp|SkypeUriPreview|Yeti|googleweblight",
 },
 }
 `)
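The snapshots above store the pattern as a plain string, which is why `\w` appears as `\\w`. The following sketch shows how a regex `.source` string survives a JSON round trip; the shortened pattern, the `htmlLimitedBots` payload shape, and the way a consumer rebuilds the regex are assumptions for illustration, not a description of how the manifests are actually consumed.

```typescript
// Shortened pattern for readability (assumption: the full alternation behaves the same way).
const SHORT_HTML_LIMITED_RE = /[\w-]+-Google|Google-[\w-]+|Chrome-Lighthouse/i

// `.source` drops the delimiters and the flags, keeping only the pattern text.
const source = SHORT_HTML_LIMITED_RE.source

// Serializing to JSON escapes each backslash, so "\w" is written as "\\w".
const json = JSON.stringify({ htmlLimitedBots: source })
console.log(json)

// A hypothetical consumer rebuilds the regex and must re-apply the `i` flag itself.
const restored = new RegExp(JSON.parse(json).htmlLimitedBots, 'i')
const restoredWithoutFlag = new RegExp(JSON.parse(json).htmlLimitedBots)

console.log(restored.source === source) // true
console.log(restored.test('mediapartners-google')) // true
console.log(restoredWithoutFlag.test('mediapartners-google')) // false
```

Because `.source` does not carry flags, anything that reconstructs the pattern from the serialized string needs to supply case-insensitivity separately for lowercase user agents to match.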
