Ensure that mybinder.org isn't being significantly impacted by bots and crawlers #3557

@choldgraf

Description

I'm concerned that we have a significant amount of bot traffic creating sessions on mybinder.org. We should confirm whether this is the case and, if it is and others are worried about it too, fix it.

Context

The 2i2c team (thanks @jmunroe!) recently discovered that we may have caught a significant amount of bot and scraper activity leading to launches on a public BinderHub that we run. Binder sessions were being spun up, followed by a period of no activity, before they were shut down. We think it's something like an LLM scraper or a bot that is hitting URLs and causing sessions to spin up.

Why I think we might have a lot of bots spawning Binder sessions

We use Plausible for web analytics, and I am pretty sure Plausible filters out any known bot activity (see footnote 1). So we can compare Plausible's logs against mybinder.org's analytics archive to get an idea of "launches that Plausible filtered out".

For example, for December 9th:

  • Visits to website logged by Plausible: 3,100
  • Mybinder launch events: 5,184

December 5th:

  • Visits to website logged by Plausible: 2,500
  • Mybinder launch events: 4,398

In both cases, roughly 40% of mybinder.org launch events (about 40% on December 9th and 43% on December 5th) have no corresponding visit logged by Plausible.
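
For anyone who wants to redo this arithmetic on other days, here is a minimal sketch using the two sets of numbers above. It assumes each Plausible visit corresponds to at most one launch event, which is a simplification, and the `unlogged_share` helper is just a hypothetical name for illustration.

```python
# Daily counts quoted in this issue: (Plausible visits, mybinder.org launch events).
counts = {
    "December 9": (3100, 5184),
    "December 5": (2500, 4398),
}


def unlogged_share(visits, launches):
    """Fraction of launch events with no matching Plausible visit.

    Assumption: every Plausible visit maps to at most one launch event.
    """
    return (launches - visits) / launches


shares = {day: unlogged_share(v, n) for day, (v, n) in counts.items()}

for day, share in shares.items():
    print(f"{day}: {share:.1%} of launches not seen by Plausible")
# December 9: 40.2% of launches not seen by Plausible
# December 5: 43.2% of launches not seen by Plausible
```

This only gives an upper bound on possible bot launches, since the gap also includes "back-end" launches and anything else Plausible doesn't record.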

Some percentage of them might be "back-end" launches where nobody touches the browser, but it's plausible that a high percentage are bots that trigger Binder launches without being logged by Plausible.

Footnotes

  1. Plausible is a privacy-first service so we don't have access to the actual user agent names, HTTP requests, etc. That's why we have to kinda infer an outcome here.
