raw images first by liyiecho · Pull Request #76 · dixudx/tumblr-crawler

liyiecho · 2018-03-19T04:24:43Z

Fixes: #65

dixudx

Remove the redundant binary file.

dixudx · 2018-03-19T04:38:01Z

tumblr-photo-video-ripper.py

        try:
            medium_url = self._handle_medium_url(medium_type, post)
+            medium_url_bak = medium_url
+            medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)


Why change it to this?

dixudx · 2018-03-19T04:42:22Z

tumblr-photo-video-ripper.py

            medium_url = self._handle_medium_url(medium_type, post)
+            medium_url_bak = medium_url
+            medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)
+            if (b'_100.' in medium_url):


I don't like this exhaustive approach. Hard-coding is not a good choice.

Why not split the string and replace the size with raw instead?

Also, this is not the right place for the change. Raw is only applicable to photos/images.

The method _download(**) is the right place.
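
For illustration, the split-and-replace idea could look roughly like the sketch below. It assumes photo URLs end in `_<size>.<ext>` (as the `_100.` check in the diff suggests); the helper name `to_raw_url` is hypothetical and not part of the repository.

```python
def to_raw_url(url):
    """Swap the trailing size token of a photo URL for 'raw'.

    Assumes the URL ends in '_<digits>.<ext>'; anything else is
    returned unchanged so non-photo URLs are not touched.
    """
    stem, dot, ext = url.rpartition(".")
    head, underscore, size = stem.rpartition("_")
    if not (dot and underscore and size.isdigit()):
        return url
    return head + "_raw." + ext
```

This avoids listing every size (`_100.`, `_250.`, ...) by hand, since any numeric size token is handled the same way.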

dixudx · 2018-03-19T07:45:28Z

tumblr-photo-video-ripper.py

            medium_url = self._handle_medium_url(medium_type, post)
-            if medium_url is not None:
-                self._download(medium_type, medium_url, target_folder)
+            #print("medium url is %s", medium_url)


Remove this comment line.

dixudx · 2018-03-19T09:16:56Z

tumblr-photo-video-ripper.py

+                self._download(medium_type, medium_url, target_folder, resp_raw)
+            elif medium_type == "photo":
+                medium_url_bak = medium_url
+                medium_url_dot = medium_url.split('.')


The URL parsing here seems complex and error-prone.

The version below is a better way. WDYT?

    def download(self, medium_type, post, target_folder):
        try:
            medium_url = self._handle_medium_url(medium_type, post)
            if medium_url is not None:
                if medium_type == "photo":
                    try:
                        # try to download raw image
                        medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
                        raw_matched = self.hd_photo_regex.match(medium_url_raw)
                        if raw_matched is not None:
                            replace_raw = raw_matched.groups()[0]
                            replace_raw = replace_raw.replace(raw_matched.groups()[1], "raw")
                            medium_url_raw = medium_url_raw.replace(raw_matched.groups()[0], replace_raw)
                            self._download(medium_type, medium_url_raw, target_folder)
                            return
                    except:
                        pass
                self._download(medium_type, medium_url, target_folder)
        except TypeError:
            pass

    # can register different regex match rules
    def _register_regex_match_rules(self):
        # will iterate all the rules
        # the first matched result will be returned
        self.regex_rules = [video_hd_match(), video_default_match()]
        self.hd_photo_regex = re.compile(r".*(tumblr_\w+_(\d+))", re.IGNORECASE)
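
One detail worth checking in the suggestion above: `str.replace` swaps every occurrence of the size digits, so an ID that happens to contain the same digits would be altered too. A minimal sketch of replacing only the matched span follows; it reuses the same pattern, while the function name `replace_size_with_raw` and the sample URLs in the usage note are made up for illustration.

```python
import re

# Same pattern as hd_photo_regex in the suggestion above.
HD_PHOTO_REGEX = re.compile(r".*(tumblr_\w+_(\d+))", re.IGNORECASE)


def replace_size_with_raw(url):
    """Replace only the size token captured by group 2 with 'raw'."""
    matched = HD_PHOTO_REGEX.match(url)
    if matched is None:
        # Not a recognizable photo URL; leave it alone.
        return url
    start, end = matched.span(2)
    return url[:start] + "raw" + url[end:]
```

With this, a hypothetical URL such as `.../tumblr_p1280q_1280.jpg` becomes `.../tumblr_p1280q_raw.jpg`, leaving the ID part intact.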

liyiecho · 2018-03-19T09:55:04Z

medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
The host isn't always 68.media.tumblr.com.

dixudx · 2018-03-19T13:39:24Z

The host isn't always 68.media.tumblr.com.

@liyiecho So just use a regex to match and replace it.
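
A regex-based host replacement might look like the following sketch. It assumes the only variation is an optional subdomain label in front of `media.tumblr.com`, and that `data.tumblr.com` serves the raw file as described in this thread; `MEDIA_HOST_REGEX` and `to_data_host` are hypothetical names.

```python
import re

# Matches "//68.media.tumblr.com/", "//78.media.tumblr.com/", or plain
# "//media.tumblr.com/". Dots are escaped so they match literally.
MEDIA_HOST_REGEX = re.compile(r"//(?:[\w-]+\.)?media\.tumblr\.com/", re.IGNORECASE)


def to_data_host(url):
    """Point a media.tumblr.com URL at data.tumblr.com; other URLs pass through."""
    return MEDIA_HOST_REGEX.sub("//data.tumblr.com/", url, count=1)
```

Anchoring on the leading `//` and trailing `/` keeps the match inside the host part of the URL, so a path that merely contains the text `media.tumblr.com` is not rewritten.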

Delete Extra

4f925df

dixudx requested changes Mar 19, 2018

liyiecho added 2 commits March 19, 2018 14:33

raw images

755ae29

fix url error

6cbe325

dixudx requested changes Mar 19, 2018

liyiecho wants to merge 3 commits into dixudx:master from liyiecho:raw_images
