Crawler project: fix errors caused by the aiohttp update that replaced URL strings with URL objects #259

Open
gasxia wants to merge 1 commit into aosabook:master from gasxia:adapt-aiohttp-update

gasxia commented May 9, 2017

The crawler project uses aiohttp version 0.21, but the current aiohttp version is 2.0.7.
Since aiohttp 1.1, the library uses yarl URL objects for URL processing instead of URL strings.
This results in some string-parsing errors.

So the code must be changed to adapt to the new aiohttp version.

Use URL.human_repr() to print a human-readable string.
Convert response.url to str to fix the error.
Line 146 in crawling.py:

                if urls:
                    LOGGER.info('got %r distinct urls from %r',
                                len(urls), response.url.human_repr())
                for url in urls:
                    normalized = urllib.parse.urljoin(str(response.url), url)
                    defragmented, frag = urllib.parse.urldefrag(normalized)
                    if self.url_allowed(defragmented):
                        links.add(defragmented)

Convert _stat.url to str to fix the error.
Line 32 in reporting.py:

show.sort(key=lambda _stat: str(_stat.url)) 
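
To illustrate why the str() conversion in crawling.py matters, here is a minimal sketch using only the standard library. `StandInURL` is a hypothetical placeholder for a non-string URL object (it is not the real yarl.URL class), and `resolve_links` is an illustrative helper name, not a function from the crawler. The sketch shows that urllib.parse.urljoin rejects a non-str base when the other argument is a str, and that converting the base first restores the original behavior.

```python
import urllib.parse


class StandInURL:
    """Hypothetical stand-in for a URL object; NOT the real yarl.URL."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


def resolve_links(base_url, hrefs):
    """Join each href against base_url and drop fragments (illustrative helper)."""
    base = str(base_url)  # urljoin needs a str base, so convert the URL object first
    links = set()
    for href in hrefs:
        normalized = urllib.parse.urljoin(base, href)
        defragmented, _frag = urllib.parse.urldefrag(normalized)
        links.add(defragmented)
    return links


base = StandInURL("http://example.com/docs/index.html")

# Passing the object directly mixes str and non-str arguments, which urljoin rejects.
try:
    urllib.parse.urljoin(base, "page.html")
    raised = False
except TypeError:
    raised = True

links = resolve_links(base, ["page.html", "/about#team", "page.html#top"])
```

The same idea applies to the sort key in reporting.py: calling str() on the URL gives code that was written for strings a plain string to work with.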
