Set a user agent string that matches convention used by libraries/tools #300
ephphatha wants to merge 1 commit into rom1504:main
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent#library_and_net_tool_ua_strings provides a few examples; see also urllib, which uses "Python-urllib/<version>". img2dataset does not parse HTML, so it has no reason to send a user agent that indicates Mozilla compatibility.
  key, url = row
  img_stream = None
- user_agent_string = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
+ user_agent_string = "img2dataset/1.x ("
Any reason not to use {user_agent_token} rather than hard-coding img2dataset here?
The reference to the repository was previously hardcoded whenever a user agent token was specified, so it seemed appropriate to use img2dataset as the base tool name, with the user-provided string added in the comment section.
Edit: after double-checking main(), the default user agent token is None, not "img2dataset" as I had assumed. The old default UA does not identify the tool at all.
With this change the strings are:
default UA: img2dataset/1.x (+https://github.com/rom1504/img2dataset)
user-provided UA: img2dataset/1.x (compatible; <user-provided>; +https://github.com/rom1504/img2dataset)
previous strings were:
default UA: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
user-provided UA: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0 (compatible; <user-provided>; +https://github.com/rom1504/img2dataset)
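For illustration, here is a minimal sketch of how the new strings listed above could be assembled. The helper name `build_user_agent`, the constant `REPO_URL`, and the example token and image URL are assumptions made for this example and are not taken from the diff; the actual change may build the string differently.

```python
import urllib.request
from typing import Optional

# Repository URL as it appears in the UA strings quoted above.
REPO_URL = "https://github.com/rom1504/img2dataset"


def build_user_agent(user_agent_token: Optional[str] = None) -> str:
    """Hypothetical helper: build a tool-style UA string.

    The product token comes first, followed by a parenthesised comment
    that holds the optional user-provided token and the repository URL.
    """
    comment_parts = []
    if user_agent_token:
        # Mirror the "compatible; <user-provided>" form used in the PR.
        comment_parts.append("compatible")
        comment_parts.append(user_agent_token)
    comment_parts.append("+" + REPO_URL)
    comment = "; ".join(comment_parts)
    return "img2dataset/1.x (" + comment + ")"


# Default UA, no token supplied.
print(build_user_agent())
# img2dataset/1.x (+https://github.com/rom1504/img2dataset)

# UA with a user-provided token ("mybot" is a placeholder).
print(build_user_agent("mybot"))
# img2dataset/1.x (compatible; mybot; +https://github.com/rom1504/img2dataset)

# Attaching the header to a urllib request object (no network call is made
# here; the URL is a placeholder).
request = urllib.request.Request(
    "https://example.com/image.jpg",
    headers={"User-Agent": build_user_agent("mybot")},
)
```

Keeping the product token fixed and placing the user-provided value in the comment means the tool is always identifiable, which the old default string was not.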