update-Tumblr Crawler impl by [tumblr apiv2]#35
geosmart wants to merge 1 commit into dixudx:master
Conversation
dixudx left a comment
Please follow the API doc.
There is also an Audio post type. It would be better to support that as well.
Please run a PEP 8 check before submitting your code.
| @@ -93,47 +96,3 @@ If you are using Shadowsocks with global mode, your `./proxies.json` can be,
| ```
|
| And now you can enjoy your downloads.
Why was the content below deleted? Any reason?
| THREADS = 10
|
| # just a test apikey
| API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"
Why post a test API key here? It seems this api_key belongs to your account. If you want to make it publicly available, that's OK.
If you want to use API v2, we'd better provide a function such as def getApiKey(username, password) to obtain the API key instead of hard-coding it, with the user supplying the username and password through the CLI or a file.
Some users have not registered with Tumblr and may not want to take the time to register. This is why I chose API v1 instead of v2.
Anyway, providing v2 support will be a good enhancement.
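As a rough illustration of keeping the key out of the source, here is a minimal sketch that reads it from an environment variable or a JSON file. It does not attempt the getApiKey(username, password) lookup itself; the names `load_api_key`, `TUMBLR_API_KEY`, `config.json` and the `api_key` field are all hypothetical and not part of this PR.

```python
import json
import os


def load_api_key(config_path="config.json", env_var="TUMBLR_API_KEY"):
    """Return the API key from the environment or a JSON config file.

    The variable name and the file layout are assumptions made for
    this sketch, not something the project defines.
    """
    # 1. Prefer an environment variable so nothing is committed.
    key = os.environ.get(env_var)
    if key:
        return key

    # 2. Fall back to a local, untracked JSON file: {"api_key": "..."}
    if os.path.isfile(config_path):
        with open(config_path) as f:
            key = json.load(f).get("api_key")
        if key:
            return key

    raise RuntimeError("No Tumblr API key configured")
```

With something like this, `API_KEY = load_api_key()` could replace the hard-coded constant, and users without a key would get a clear error instead of silently sharing someone else's.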
I found the key through Google. It's not mine; I just used it.
| API_KEY = "lmvVU5ExdfFZPyGOv0gCknJ2r1UnQEIZTYAYoDhKrq7eJdCn2o"
|
| # enum(posts,likes)
| POST_TYPE="likes"
Better to add some documentation here.
For the most common usage, POST_TYPE = ("likes", "posts").
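One possible shape for a documented setting, as a sketch only. `VALID_POST_TYPES` and `validate_post_types` are hypothetical names introduced for illustration, and the descriptions of the two values are my reading of the PR rather than anything it states.

```python
# Which collections to crawl for each site. Allowed values:
#   "posts" - posts published by the blog itself
#   "likes" - posts the blog has liked
# Use a tuple so that both can be enabled at once.
VALID_POST_TYPES = ("likes", "posts")
POST_TYPE = ("likes", "posts")


def validate_post_types(post_types):
    """Normalize POST_TYPE to a tuple and reject unknown values."""
    # Accept a bare string such as "likes" for convenience.
    if isinstance(post_types, str):
        post_types = (post_types,)
    unknown = [t for t in post_types if t not in VALID_POST_TYPES]
    if unknown:
        raise ValueError(
            "Unsupported POST_TYPE value(s): %s" % ", ".join(unknown))
    return tuple(post_types)
```

Validating once at startup keeps a typo in the setting from turning into a confusing request error later.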
| def download_media(self, site):
|     self.download_photos(site)
|     self.download_videos(site)
|     # self.download_videos(site)
| os.mkdir(target_folder)
|
| base_url = "http://{0}.tumblr.com/api/read?type={1}&num={2}&start={3}"
| # liked posts:
This should be split into two functions, such as download_likes() and download_posts(). Each of them will then handle photos and videos.
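The split suggested above could look roughly like this. It is a structural sketch only: `DownloadPlan`, `_download` and the `tasks` list are hypothetical stand-ins, and the real crawler would page through the API and put items on its queue where this sketch merely records what it would fetch.

```python
class DownloadPlan(object):
    """Records which (site, source, medium) combinations to crawl."""

    MEDIUM_TYPES = ("photo", "video")

    def __init__(self):
        self.tasks = []

    def download_media(self, site):
        self.download_posts(site)
        self.download_likes(site)

    def download_posts(self, site):
        # Posts published by the blog itself.
        self._download(site, "posts")

    def download_likes(self, site):
        # Posts the blog has liked.
        self._download(site, "likes")

    def _download(self, site, source):
        # Shared path: both sources handle photos and videos.
        for medium_type in self.MEDIUM_TYPES:
            # Placeholder for the real paging and queueing logic.
            self.tasks.append((site, source, medium_type))
```

Keeping the photo/video loop in one private helper means the two public functions differ only in which collection they request.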
| print(u"文件proxies.json格式非法.\n"
| u"请参照示例文件'proxies_sample1.json'和'proxies_sample2.json'.\n"
| u"然后去 http://jsonlint.com/ 进行验证.")
| print(u"proxies.json format illegal.\n"
Same as above. Please don't translate it back to English. There is no need to have the same message in the same language again.
| if matched_url is not None:
|     return matched_url
| else:
|     video_player = post["video_url"]
Have you referred to the API v2 doc? I did not see any video_url attribute for video posts.
It should be post["player"], and the URL should be matched from the entry with the largest resolution (pick the largest value of post["player"][idx_CHANGEME]["width"]).
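A minimal sketch of that selection, assuming each entry of post["player"] is a dict with a "width" and an "embed_code" field as described in the API v2 doc. The helper names and the regular expression that pulls a `src` attribute out of the embed code are illustrative assumptions, not the crawler's existing matching logic.

```python
import re


def largest_player(post):
    """Return the player entry with the greatest width, or None."""
    players = post.get("player") or []
    if not players:
        return None
    return max(players, key=lambda p: int(p.get("width", 0)))


def video_url_from_post(post):
    """Extract a src URL from the widest player's embed code."""
    player = largest_player(post)
    if player is None:
        return None
    # embed_code may be missing or falsy; treat that as "no URL".
    embed_code = player.get("embed_code") or ""
    match = re.search(r'src="([^"]+)"', embed_code)
    return match.group(1) if match else None
```

Returning None rather than raising lets the caller skip posts whose embed code does not expose a usable URL.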
| # if post has photoset, walk into photoset for each photo
| photoset = post["photos"]
| for photo in photoset:
|     self.queue.put((medium_type, photo["original_size"], target_folder))
This seems wrong. Please check the API v2 doc.
| try:
|     if medium_type == "photo":
|         return post["photo-url"][0]["#text"]
|         return post["url"]
I didn't see any handling of post["alt_sizes"], which contains the photo at different resolutions. The largest one should be picked.
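Picking the largest size might be sketched as below. This assumes the API v2 layout in which each item of post["photos"] carries an "alt_sizes" list (and usually an "original_size") of dicts with "width", "height" and "url"; the function names are hypothetical.

```python
def largest_photo_url(photo):
    """Return the URL of the largest available size of one photo."""
    candidates = list(photo.get("alt_sizes") or [])
    if photo.get("original_size"):
        candidates.append(photo["original_size"])
    if not candidates:
        return None
    # Compare by pixel area so portrait and landscape both work.
    best = max(candidates,
               key=lambda s: s.get("width", 0) * s.get("height", 0))
    return best.get("url")


def photo_urls(post):
    """Return one URL per photo in the post, largest size each."""
    urls = [largest_photo_url(p) for p in post.get("photos", [])]
    return [u for u in urls if u]
```

Each resulting URL string could then be queued directly, instead of queueing the whole original_size dict.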
| # download like
| def downloadTaskPreHandlerOfLikePost(self, medium_type, posts, target_folder):
This should be refactored along with downloadTaskPreHandlerOfNormalPost.
New Features: