@@ -131,20 +131,66 @@ like the price and rating of a product:
robots.txt
----------

- A robots.txt file tells search engine crawlers which URLs the crawler can access on your site, to
- index its content. This is used mainly to avoid overloading your site with requests.
+ A `robots.txt` file instructs search engine crawlers which parts of a website they are permitted to
+ access. Its primary purposes are to:

- When indexing your website, search engines take a first look at the robots.txt file. Odoo
- automatically creates one robot.txt file available on `mydatabase.odoo.com/robots.txt`.
+ - **Prevent overloading the website:** By guiding crawlers away from certain sections, robots.txt
+   helps manage server load.
+ - **Control access to resources and page content:** It can prevent crawlers from accessing media
+   files (images, videos), CSS stylesheets, and JavaScript files, and from reading the text content
+   of specific pages.
+
+ When indexing your website, search engines first look at the robots.txt file. Odoo automatically
+ creates a robots.txt file, available at `mydatabase.odoo.com/robots.txt`.
+
+ .. note::
+    Reputable bots adhere to robots.txt; others may require blocking via
+    :ref:`Cloudflare <domain-name/naked/cloudflare>` on your custom domain.
+
+ Edit robots.txt
+ ~~~~~~~~~~~~~~~

By editing a robots.txt file, you can control which site pages are accessible to search engine
crawlers. To add custom instructions to the file, go to :menuselection:`Website --> Configuration
--> Settings`, scroll down to the :guilabel:`SEO` section, and click :guilabel:`Edit robots.txt`.

.. example::
-    If you do not want the robots to crawl the `/about-us` page of your site, you can edit the
+    If you do not want robots to crawl the `/about-us` page of your site, you can edit the
   robots.txt file to add `Disallow: /about-us`.
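+
+ A robots.txt file typically combines `User-agent` and `Disallow` rules. As a minimal, illustrative
+ sketch (the disallowed paths below are placeholders, not Odoo defaults), a customized file could
+ look like this:
+
+ .. code-block:: text
+
+    User-agent: *
+    Disallow: /about-us
+    Disallow: /internal-docs/
+    Sitemap: https://mydatabase.odoo.com/sitemap.xml
+
+ The `Sitemap` directive is optional; it points crawlers to the sitemap described below.
+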
+ .. important::
+    While `robots.txt` prevents content from being crawled, **it does not guarantee that a page
+    will not be indexed**. A page can still appear in search results if it is linked to from other
+    crawled pages (indexed by "reference"). Google generally does not recommend using robots.txt to
+    block webpages that you wish to keep out of search results entirely.
+
+ Prevent a page from being indexed
+ ---------------------------------
+
+ To effectively prevent a page from appearing in search engine results, use one of the following
+ methods:
+
+ - **noindex tag:** Access the page's :ref:`properties <website/pages/page_properties>` and toggle
+   the :guilabel:`Indexed` switch off (a generic example of the underlying meta tag is shown after
+   this list).
+
+   .. note::
+      This option is not yet available for :ref:`dynamic pages <website/pages/page_type>`.
+
+ - **404 or 403:** Configure the page to return a 404 (Not Found) or 403 (Forbidden) HTTP status
+   code. These codes signal to search engines that the page does not exist or is inaccessible,
+   leading to its eventual removal from the index.
+
+   - **404:** :ref:`Configure a 404 redirection <website/pages/URL-redirection>`.
+   - **403:** Access the page's :ref:`properties <website/pages/page_properties>` and toggle the
+     :guilabel:`Visibility` switch off, or :ref:`unpublish the page <website/pages/un-publish-page>`.
+
+ - **Google Search Console:** Use Google Search Console to request the removal of specific URLs
+   from Google's index.
+
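+ For reference, the :guilabel:`Indexed` toggle corresponds to the standard `noindex` rule. Outside
+ Odoo, the same rule is usually expressed as a robots meta tag in a page's HTML `<head>` (a generic
+ illustration, not Odoo-specific markup):
+
+ .. code-block:: html
+
+    <meta name="robots" content="noindex"/>
+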
+ .. seealso::
+    - :doc:`../configuration/google_search_console`
+    - :doc:`../pages`
+
Sitemap
-------