Commit 168e940

robots.txt
1 parent e539cfd commit 168e940

File tree

1 file changed: +105 / -0 lines changed

File: robots.txt

Lines changed: 105 additions & 0 deletions
# 1. A robot may not injure a web server or, through inaction, allow a web
# server to come to harm.
# 2. A robot must obey orders given to it by web servers except where such
# orders would conflict with the First Law.
# 3. A robot must protect its own existence as long as such protection does
# not conflict with the First or Second Law.
#
# This site respects ethical indexing.
# All others, please flap elsewhere.

# Allowed
# Archive & Preservation Bots
User-agent: ia_archiver
User-agent: archive.org_bot
User-agent: Applebot-Extended
Disallow: /.well-known/security.txt
Disallow:

# Search Engine Indexers
User-agent: Bingbot
User-agent: Gigablast
User-agent: Googlebot
User-agent: MojeekBot
User-agent: Qwantify
User-agent: Qwantify/Bleriot
User-agent: SeznamBot
User-agent: Sogou Spider
User-agent: SwisscowsBot
User-agent: Yandex
Disallow:
Disallow: /.well-known/security.txt

# Banned
# AI Crawlers & LLM Indexers
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: ClaudeBot
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow: /

# SEO & Marketing Scrapers
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: BLEXBot
User-agent: CheckMarkNetwork
User-agent: DAUM
User-agent: DataForSeoBot
User-agent: SEOkicks
User-agent: SenutoBot
User-agent: serpstatbot
User-agent: SiteScoreBot
User-agent: SMTBot
Disallow: /

# Educational & Research Crawlers
User-agent: MaCoCu
User-agent: MixnodeCache
User-agent: MyEducationalCrawler
User-agent: panscient.com
User-agent: TurnitinBot
Disallow: /

# General Crawlers & Indexers
User-agent: BUbiNG
User-agent: CCBot
User-agent: dotbot
User-agent: FriendlyCrawler
User-agent: Friendly_Crawler
User-agent: FemtosearchBot
User-agent: IonCrawl
User-agent: MJ12bot
User-agent: Neevabot
User-agent: proximic
User-agent: Thinkbot
User-agent: Website-info.net
Disallow: /

# Data Analytics & Metrics Crawlers
User-agent: Bytespider
User-agent: dataprovider
User-agent: DnBCrawler-Analytics
User-agent: ltx71
Disallow: /

# Data Aggregators & Scrapers
User-agent: AhrefsBot
User-agent: Cliqzbot
User-agent: GetProxi.es-bot
User-agent: iSec_Bot
User-agent: MauiBot
User-agent: NTENTbot
User-agent: omgili
User-agent: omgilibot
User-agent: Scrapy
User-agent: sentibot
User-agent: trendictionbot
User-agent: yacybot
Disallow: /

# Web Archival Crawlers
User-agent: Arquivo-web-crawler
User-agent: special_archiver
Disallow: /
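As a rough sanity check, rules like these can be exercised with Python's standard-library robots.txt parser. The sketch below feeds a small subset of the groups straight into `urllib.robotparser` rather than fetching them from a live server, which is an assumption of this example. Note that `urllib.robotparser` applies rules first-match-in-order, so the group layout here mirrors the archive-bot group above, where the specific `Disallow` precedes the blanket empty `Disallow:`.

```python
from urllib.robotparser import RobotFileParser

# Representative subset of the committed rules, parsed in-memory.
# urllib.robotparser treats an empty "Disallow:" as allow-everything
# and evaluates rules within a group first-match-in-order.
rules = """\
User-agent: ia_archiver
Disallow: /.well-known/security.txt
Disallow:

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "/index.html"))                     # False: fully banned
print(rp.can_fetch("ia_archiver", "/any/page"))                  # True: blanket allow
print(rp.can_fetch("ia_archiver", "/.well-known/security.txt"))  # False: specific rule hits first
```

A real deployment would instead call `rp.set_url(".../robots.txt")` followed by `rp.read()` so the live file is checked.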
