# feat(robots.txt): add guide for extending robots.txt functionality #1988
---
nav:
  title: Extend robots.txt
  position: 20
---

# Extend robots.txt

## Overview

Since Shopware 6.7.1, the platform provides full robots.txt support with all standard directives and user-agent blocks. This feature was developed during [Hacktoberfest 2024](https://www.shopware.com/en/news/hacktoberfest-2024-outcome-a-robots-txt-for-shopware/). Learn more about [configuring robots.txt](https://docs.shopware.com/en/shopware-6-en/tutorials-and-faq/creation-of-robots-txt) in the user documentation.

Starting with Shopware 6.7.4, you can extend the robots.txt functionality through events to:

* Add custom validation rules during parsing
* Modify or generate directives dynamically
* Support custom or vendor-specific directives
* Prevent warnings for known non-standard directives

::: info
The event system described in this guide requires Shopware 6.7.4 or later.
:::

## Prerequisites

This guide requires you to have a basic plugin running. If you don't know how to create a plugin, head over to the plugin base guide:

<PageRef page="../../plugin-base-guide" />

You should also be familiar with [Event subscribers](../../plugin-fundamentals/listening-to-events).

## Modifying parsed directives

The `RobotsDirectiveParsingEvent` is dispatched after robots.txt content is parsed. You can modify the parsed result, add validation, or inject dynamic directives.

This example shows how to add AI crawler restrictions and validate crawl-delay values:

<Tabs>
<Tab title="RobotsExtensionSubscriber.php">

```php
<?php declare(strict_types=1);

namespace Swag\Example\Subscriber;

use Psr\Log\LoggerInterface;
use Shopware\Core\Framework\Log\Package;
use Shopware\Storefront\Page\Robots\Event\RobotsDirectiveParsingEvent;
use Shopware\Storefront\Page\Robots\Parser\ParseIssue;
use Shopware\Storefront\Page\Robots\Parser\ParseIssueSeverity;
use Shopware\Storefront\Page\Robots\ValueObject\RobotsDirective;
use Shopware\Storefront\Page\Robots\ValueObject\RobotsDirectiveType;
use Shopware\Storefront\Page\Robots\ValueObject\RobotsUserAgentBlock;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

#[Package('storefront')]
class RobotsExtensionSubscriber implements EventSubscriberInterface
{
    public function __construct(
        private readonly LoggerInterface $logger,
    ) {
    }

    public static function getSubscribedEvents(): array
    {
        return [
            RobotsDirectiveParsingEvent::class => 'onRobotsParsing',
        ];
    }

    public function onRobotsParsing(RobotsDirectiveParsingEvent $event): void
    {
        $parsedRobots = $event->getParsedRobots();

        // 1. Add restrictions for AI crawlers
        $aiCrawlers = ['GPTBot', 'ChatGPT-User', 'CCBot', 'anthropic-ai'];

        $aiBlock = new RobotsUserAgentBlock(
            userAgents: $aiCrawlers,
            directives: [
                new RobotsDirective(
                    type: RobotsDirectiveType::DISALLOW,
                    value: '/checkout/',
                ),
            ],
        );

        $parsedRobots->addUserAgentBlock($aiBlock);

        // 2. Validate existing crawl-delay values
        foreach ($parsedRobots->getUserAgentBlocks() as $block) {
            foreach ($block->getDirectives() as $directive) {
                if ($directive->getType() === RobotsDirectiveType::CRAWL_DELAY) {
                    $value = (int) $directive->getValue();

                    if ($value > 60) {
                        $event->addIssue(new ParseIssue(
                            severity: ParseIssueSeverity::WARNING,
                            message: sprintf(
                                'Crawl-delay of %d seconds may be too high',
                                $value
                            ),
                            lineNumber: null,
                        ));
                    }
                }
            }
        }

        $this->logger->info('Extended robots.txt with AI crawler rules');
    }
}
```

</Tab>

<Tab title="services.xml">

```xml
<?xml version="1.0" ?>
<container xmlns="http://symfony.com/schema/dic/services"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://symfony.com/schema/dic/services http://symfony.com/schema/dic/services/services-1.0.xsd">

    <services>
        <service id="Swag\Example\Subscriber\RobotsExtensionSubscriber">
            <argument type="service" id="logger"/>
            <tag name="kernel.event_subscriber"/>
        </service>
    </services>
</container>
```

</Tab>
</Tabs>
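
Assuming the storefront serializes each user-agent block in standard robots.txt notation (one `User-agent` line per crawler, followed by its directives), the block added by the subscriber above would render roughly like this illustrative excerpt:

```text
# Illustrative robots.txt output for the added AI crawler block
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: anthropic-ai
Disallow: /checkout/
```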

## Handling custom directives

The `RobotsUnknownDirectiveEvent` is dispatched when an unknown directive is encountered. Use this to support vendor-specific directives or prevent warnings for known non-standard directives:

```php
<?php declare(strict_types=1);

namespace Swag\Example\Subscriber;

use Shopware\Core\Framework\Log\Package;
use Shopware\Storefront\Page\Robots\Event\RobotsUnknownDirectiveEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

#[Package('storefront')]
class CustomDirectiveSubscriber implements EventSubscriberInterface
{
    public static function getSubscribedEvents(): array
    {
        return [
            RobotsUnknownDirectiveEvent::class => 'handleCustomDirective',
        ];
    }

    public function handleCustomDirective(RobotsUnknownDirectiveEvent $event): void
    {
        // Support Google and Yandex specific directives
        $knownCustomDirectives = ['noimageindex', 'noarchive', 'clean-param'];

        if (in_array(strtolower($event->getDirectiveName()), $knownCustomDirectives, true)) {
            $event->setHandled(true); // Prevent "unknown directive" warning
        }
    }
}
```

Register the subscriber in your `services.xml`:

```xml
<service id="Swag\Example\Subscriber\CustomDirectiveSubscriber">
    <tag name="kernel.event_subscriber"/>
</service>
```
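
With this subscriber registered, known vendor-specific entries no longer trigger an "unknown directive" warning. For example, assuming the event reports the directive name exactly as it appears in the file, a Yandex-style entry such as the following would be accepted without a warning:

```text
User-agent: Yandex
# Non-standard Yandex directive, marked as handled by CustomDirectiveSubscriber
Clean-param: utm_source /
```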

## Parse issues

You can add validation warnings or errors during parsing using the `ParseIssue` class:

```php
use Shopware\Storefront\Page\Robots\Parser\ParseIssue;
use Shopware\Storefront\Page\Robots\Parser\ParseIssueSeverity;

// Add a warning
$event->addIssue(new ParseIssue(
    severity: ParseIssueSeverity::WARNING,
    message: 'Consider adding a sitemap directive for better SEO',
    lineNumber: null,
));

// Add an error
$event->addIssue(new ParseIssue(
    severity: ParseIssueSeverity::ERROR,
    message: 'Invalid crawl-delay value: must be a positive integer',
    lineNumber: 42,
));
```

Issues are automatically logged when the robots.txt configuration is saved in the Administration. Use `WARNING` for recommendations and `ERROR` for critical problems that prevent proper generation.