Utilities
This page describes the scraping utilities that Kryptone offers.
The URL class wraps a plain url string and adds attributes and methods that facilitate web scraping.
Checks if the url is a path, e.g. starts with /
Checks if the url is valid e.g. starts with http or https
Checks if the url has a fragment e.g. http://example.com#fashion
Checks if the url points to a file
Returns the url as pathlib.Path
Returns the extension part of the url e.g. .jpeg of http://example.com/image.jpeg
Returns the stem of the url e.g. image of http://example.com/image.jpeg
Checks if the url uses https
Creates a new url instance
Checks if two url instances use the same domain
Sends a request to the url (using requests.Request) and returns the status
Compares two url instances e.g. URL('http://example.com').compare(URL('http://example.com/1'))
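As a rough illustration of the checks listed above, here is a minimal sketch built only on Python's standard library (urllib.parse and pathlib). The helper names below are hypothetical and chosen for this example; they are not confirmed Kryptone attribute names, and Kryptone's URL class may implement these checks differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical stand-ins for the checks described above,
# not Kryptone's actual implementation.


def is_path(url: str) -> bool:
    """True when the url is a bare path such as '/products/1'."""
    return url.startswith('/')


def is_valid(url: str) -> bool:
    """True when the url starts with an http or https scheme."""
    return url.startswith(('http://', 'https://'))


def has_fragment(url: str) -> bool:
    """True when the url carries a fragment such as '#fashion'."""
    return urlparse(url).fragment != ''


def is_file(url: str) -> bool:
    """True when the last path segment has a file extension."""
    return PurePosixPath(urlparse(url).path).suffix != ''


def extension(url: str) -> str:
    """Extension of the last path segment, including the dot."""
    return PurePosixPath(urlparse(url).path).suffix


def stem(url: str) -> str:
    """Last path segment without its extension."""
    return PurePosixPath(urlparse(url).path).stem


def is_secured(url: str) -> bool:
    """True when the url uses the https scheme."""
    return urlparse(url).scheme == 'https'


def same_domain(first: str, second: str) -> bool:
    """True when both urls share the same network location."""
    return urlparse(first).netloc == urlparse(second).netloc
```

For http://example.com/image.jpeg, extension returns '.jpeg' and stem returns 'image', mirroring how pathlib splits a file name.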
Capture a specific section of a url
url = URL('http://example.com/1')
url.capture(r'\d+')

Test if a section of the whole url passes the given test
url = URL('http://example.com/1')
url.capture(r'\d+')
> True

Test if a section of the url path alone passes the given test
url = URL('http://example.com/1')
url.capture(r'\d+')
> True

Decompose the url path
url = URL('http://example.com/products/1')
url.decompose()
> ['products', '1']

Certain elements can also be excluded:
url = URL('http://example.com/products/1')
url.decompose(exclude=['products'])
> ['1']
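
The capture, test, and decompose examples above can be approximated with the standard library. The sketch below assumes that capturing and testing are regular-expression searches (re.search) and that decomposing splits the url path on '/'. The function names are hypothetical stand-ins for this illustration, not Kryptone's own implementation.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers; assumed behaviour, not Kryptone's actual code.


def capture(url, pattern):
    """Return the re.Match for the first hit of pattern in the url, or None."""
    return re.search(pattern, url)


def matches_url(url, pattern):
    """True when the pattern is found anywhere in the whole url."""
    return re.search(pattern, url) is not None


def matches_path(url, pattern):
    """True when the pattern is found in the path component only."""
    return re.search(pattern, urlparse(url).path) is not None


def decompose(url, exclude=()):
    """Split the url path into its non-empty segments, skipping excluded ones."""
    segments = [part for part in urlparse(url).path.split('/') if part]
    return [part for part in segments if part not in exclude]


url = 'http://example.com/products/1'
print(capture(url, r'\d+').group())
print(matches_url(url, r'\d+'))
print(matches_path(url, r'\d+'))
print(decompose(url))
print(decompose(url, exclude=['products']))
```

Testing the path alone matters when the domain could satisfy the pattern on its own: with a url such as http://example1.com/a, a digit pattern matches the whole url but not the path.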