Wikipedia is, very understandably, taking measures against scraping bots now that the age of LLM slop scraping is upon us. I should probably look at:
- adhering to their terms
- caching fetched images (I thought I was already doing this, but we need to be one of the good guys here)
- limiting runs against live URLs (the previous point should hopefully help with this)
- confirming whether I should also avoid hammering other common sources; BGG is probably the next biggest source
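As a rough sketch of the caching and rate-limiting points above, the idea could look something like the following. Everything here is hypothetical: `PoliteChecker`, the injected `fetch` callable, and the TTL and interval defaults are illustrative assumptions, not how the current build or `dead-or-alive` actually works.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse


class PoliteChecker:
    """Hypothetical helper: caches successful URL checks and spaces out
    requests to the same host. Not part of any existing tool."""

    def __init__(
        self,
        fetch: Callable[[str], int],          # assumed: returns an HTTP status code
        ttl_seconds: float = 24 * 60 * 60,    # assumed cache lifetime
        min_interval: float = 1.0,            # assumed gap between hits on one host
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, int]] = {}   # url -> (checked_at, status)
        self._last_request: Dict[str, float] = {}        # host -> time of last request

    def check(self, url: str) -> int:
        now = self._clock()

        # Serve a fresh cached result without touching the network.
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]

        # Wait if this host was contacted too recently.
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        if last is not None:
            wait = self._min_interval - (now - last)
            if wait > 0:
                self._sleep(wait)

        status = self._fetch(url)
        self._last_request[host] = self._clock()

        # Only remember successes, so a transient 429 is retried on the next run.
        if 200 <= status < 300:
            self._cache[url] = (self._clock(), status)
        return status
```

The in-memory dictionaries would need to be persisted between builds (for example in a file restored by CI) for the cache to actually reduce runs against live URLs; that part is left out of the sketch.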
Issue as discovered via build/lint:
Example:
159:1-159:163 warning Unexpected dead URL `https://upload.wikimedia.org/wikipedia/en/thumb/9/92/Ticket_to_Ride_Board_Game_Box_EN.jpg/220px-Ticket_to_Ride_Board_Game_Box_EN.jpg`, expected live URL no-dead-urls remark-lint
[cause]:
error Unexpected not ok response `429` (`Use thumbnail steps listed on https://w.wiki/GHai. Please contact noc@wikimedia.org for further information (a765913)`) on `https://upload.wikimedia.org/wikipedia/en/thumb/9/92/Ticket_to_Ride_Board_Game_Box_EN.jpg/220px-Ticket_to_Ride_Board_Game_Box_EN.jpg` dead dead-or-alive
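Note that the log above reports a `429` response as a dead URL, although a `429` only means the request was rate limited. One possible way to keep the two cases apart is sketched below; `LinkState` and `classify` are hypothetical names, and treating `503` as throttling as well is an assumption on my part.

```python
from enum import Enum


class LinkState(Enum):
    LIVE = "live"
    THROTTLED = "throttled"   # the server asked us to back off; the URL may be fine
    DEAD = "dead"


def classify(status: int) -> LinkState:
    """Map an HTTP status code to a link state (hypothetical helper)."""
    if 200 <= status < 400:
        return LinkState.LIVE
    # Assumption: 429 (Too Many Requests) and 503 (Service Unavailable)
    # signal throttling or temporary overload rather than a missing resource.
    if status in (429, 503):
        return LinkState.THROTTLED
    return LinkState.DEAD
```

A lint step could then warn on `THROTTLED` and only fail on `DEAD`, so that being rate limited does not flag working image links as broken.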
