See #40 and [these closed PRs](https://github.com/ai-robots-txt/ai.robots.txt/pulls?q=is%3Apr+facebookexternalhit+is%3Aclosed) (most recently #154) for the history of FacebookExternalHit on this site. The main difficulty is that Meta does not appear to be entirely honest in its description of the purposes of FacebookExternalHit. Please note I couldn't find FacebookExternalHit/facebookexternalhit in the Mastodon source code. If anyone else reading this can reproduce the behaviour with Mastodon, it would be worth asking around on Mastodon to see if anyone has an idea of what's going on.
-
I run a website (The Steampunk Explorer) that provides news coverage and other resources related to steampunk. I recently added the user agents listed in ai.robots.txt/robots.txt to my own robots.txt file, as I don't want AI bots crawling my site.
I am most appreciative of the work done on this project. However, I discovered that adding one of the listed user agents, FacebookExternalHit, had unintended consequences. It does not appear to have any relation to Meta's AI initiatives, and websites that block it will likely find that this compromises their ability to manually share their content on Facebook and other platforms.
When I post articles on the site, the header includes metadata about the article's content, including OpenGraph (OG) metadata used by Facebook and other platforms to identify the title, description, and a representative image. When content is shared to Facebook (and other platforms), the platforms use that OG metadata to determine what to show. It appears that FacebookExternalHit somehow enables this to happen.
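For anyone unfamiliar with it, OG metadata is just a set of `<meta>` tags in the page's `<head>`. The sketch below is illustrative only; the title, description, and URLs are placeholders, not values from my site:

```html
<!-- Illustrative Open Graph tags; all values here are placeholders -->
<meta property="og:type" content="article" />
<meta property="og:title" content="Example Article Title" />
<meta property="og:description" content="A one-sentence summary shown in the link preview." />
<meta property="og:image" content="https://example.com/images/article-preview.jpg" />
<meta property="og:url" content="https://example.com/articles/example-article" />
```

Crawlers like FacebookExternalHit fetch the page, read these tags, and use them to build the link preview when someone shares the URL.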
Shortly after adding the AI crawlers to robots.txt, I found that three social media platforms (Facebook, Bluesky, and Mastodon) were unable to read the OG metadata. I checked the URL in the Facebook debugger tool (https://developers.facebook.com/tools/debug/), and it generated an error message stating that the inclusion of FacebookExternalHit in robots.txt was preventing it from scraping the article.
When I removed FacebookExternalHit from robots.txt, I found that Facebook, Bluesky, and Mastodon could once again read the OG data.
I'm not sure why the inclusion of FacebookExternalHit would affect non-Meta platforms, but that appears to be the case. It seems that Bluesky and Mastodon both rely on Open Graph metadata, and blocking FacebookExternalHit somehow prevents them from reading it.
Meta's developer documentation (https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/) lists the company's web crawlers and what they do. It appears that some of them are indeed involved in AI training. But if websites want to maximize their exposure on social media, it seems they should allow FacebookExternalHit.
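For illustration only: assuming (per that documentation and this project's list) that Meta's AI-related crawlers use user-agent strings such as meta-externalagent and FacebookBot, a robots.txt along these lines would block them while leaving link previews working. The exact agent names should be taken from Meta's documentation rather than from this sketch:

```
# Illustrative robots.txt sketch: block Meta's AI-related crawlers
# (agent names assumed from Meta's crawler documentation)
User-agent: meta-externalagent
Disallow: /

User-agent: FacebookBot
Disallow: /

# facebookexternalhit is deliberately NOT listed here, so it can still
# fetch Open Graph metadata when articles are shared.
```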