Why do Scraper User Agents contain "Mozilla"? #1343
-
Every browser contains it out of magical thinking regarding compatibility. As a result, user agents that do not contain "Mozilla" can look more suspicious to administrators. This is a load-bearing hack. I'm working on better detection logic, but I am one person in a basement in Canada with a full-time job unrelated to Anubis.
-
Going to end up showing my age on this one... Mozilla, the term itself, is a fairly old UA prefix. It was more or less created in the mid-90s as a way to differentiate from Mosaic. The prefix has persisted to this day and, as far as I am aware, is a legacy tag that has simply been retained over the years. The Wikipedia page for the Mozilla mascot has a bit of history covering Mozilla as a name as well.

Fast forward to today: while Mozilla and the rest of the user-agent identifier are still in use, they are easily spoofed by bad actors. AI/LLM scrapers are no exception. There is extensive evidence that these scrapers abuse the user agent in various ways to evade detection, 'appearing as a legitimate user' whose UA also carries the Mozilla prefix.

It is a classic "why get rejected at the front door when the back door is wide open?" scenario. As in: why should I identify as "Some Annoying AI Scraper v1337" and get blocked instantly by the server, when I can just use "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Mobile Safari/537.36" and mosey on in without any issues?
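To make the front door/back door point concrete, here is a minimal sketch of a naive substring check. The function name `looks_like_browser` is hypothetical and this is not Anubis's actual detection logic; it only illustrates why a check on the "Mozilla" token alone cannot tell a real browser from a scraper that copies a browser's UA string.

```python
def looks_like_browser(user_agent: str) -> bool:
    """Naive heuristic (hypothetical): any UA containing "Mozilla" passes as a browser."""
    return "Mozilla" in user_agent


# The two user agents quoted in the reply above.
honest_scraper = "Some Annoying AI Scraper v1337"
spoofed_scraper = (
    "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/143.0.0.0 Mobile Safari/537.36"
)

for ua in (honest_scraper, spoofed_scraper):
    verdict = "looks like a browser" if looks_like_browser(ua) else "stands out"
    print(f"{verdict}: {ua}")
```

The honest scraper stands out and the spoofed one is indistinguishable from a phone running Chrome, so the UA string alone is not enough to separate the two.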
-
First, I want to say: great tool. As someone who has to pay to host their own website, I commend anything that reduces the burden AI scrapers put on the server.
In the documentation it says
I understand the second and third of the three conditions, but I don't get the first, and I wasn't able to find a satisfactory answer by googling. I found that all browsers include it for legacy reasons, but is there any reason that an AI scraper needs to? Surely, as per the last paragraph, it could just pretend to be one of these "low harm clients"?
Thank you in advance :)