Provide a ScrapingClient that doesn't need API access #5

pR0Ps · 2020-10-01T05:53:30Z

Also adds the ability to list activities using web scraping instead of the API. The activities are returned as `ScrapedActivity` objects, which are mostly compatible with the normal `Activity` objects returned by the API-based list activities function.

Fixes #4

NOTE: stravalib moving to Pydantic for its models is going to break a LOT of this. Will need some work.


It's not going to be perfect, but the idea is that for the most basic cases it should be a pretty close replacement. The goal is to minimize the work needed to support both API and scraping-based clients. To support this, the `WebClient` now uses delegation instead of inheritance to add scraper-based functionality. This lets the `ScrapingClient` class use the same function names without automatically overriding the `stravalib.Client` functions when used through the `WebClient` class.
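As a rough illustration of the delegation idea described above, here is a minimal sketch. The classes are stand-ins with no network access, and the `scraper` attribute name and the `__getattr__` forwarding are assumptions made for the example, not necessarily how this PR implements it.

```python
class ApiClient:
    """Stand-in for stravalib.Client (hypothetical, no network access)."""

    def get_activities(self):
        return ["api-activity-0", "api-activity-1"]


class ScrapingClient:
    """Stand-in scraper that reuses the API client's method names."""

    def get_activities(self):
        return ["scraped-activity-0", "scraped-activity-1"]

    def get_activity_photos(self, activity_id):
        return [f"photo-for-{activity_id}"]


class WebClient(ApiClient):
    """Holds a scraper instead of inheriting from it."""

    def __init__(self):
        super().__init__()
        self.scraper = ScrapingClient()  # assumed attribute name

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so methods that
        # ApiClient defines are never shadowed by the scraper's versions.
        if name == "scraper":
            raise AttributeError(name)  # avoid recursion before __init__ runs
        return getattr(self.scraper, name)


client = WebClient()
print(client.get_activities())          # API version, not overridden
print(client.get_activity_photos(42))   # forwarded to the scraper
print(client.scraper.get_activities())  # scraper version, reached explicitly
```

With inheritance, `ScrapingClient.get_activities` would have replaced the API version on `WebClient`; with delegation both remain reachable.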


pR0Ps force-pushed the feature/standalone-scraping branch from b2f0204 to 923a1c3 Compare October 1, 2020 06:03

pR0Ps force-pushed the feature/standalone-scraping branch from 923a1c3 to 13737f7 Compare January 11, 2022 05:30

pR0Ps force-pushed the feature/standalone-scraping branch 2 times, most recently from 9a82176 to d8ed33a Compare January 31, 2022 19:55

pR0Ps marked this pull request as draft February 3, 2022 06:08

pR0Ps added 23 commits September 16, 2022 01:17

Add the ability to scrape photos

cc9e606

Remove caching for bike component scraping

5cab40d

This should be done by the library consumer if it's needed

Provide convenience functions for requesting data

7209586

Ensure scraping and API are accessing the same account

24eec45

Change default fallback for JSON activity downloads

f30882e

The default used to be to just download the JSON blob. It was changed to request the GPX format instead since this is a more standardized format for an activity.

Increase compatibility for get_activity_photos

26334b3

Now accepts (but ignores) parameters that the `stravalib` version accepts

Improve get_activities function

dac6cc3

- Make pagination actually work (forgot to increment page number)
- Handle stopping based on the `before` param
- Properly handle workout types
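The pagination fixes listed above can be pictured with a small sketch. Everything here is hypothetical: `fetch_page` stands in for whatever performs the scraping request, the entries are plain dicts, and the sketch assumes entries arrive oldest first so that iteration can stop once `before` is reached.

```python
from datetime import datetime


def iter_activities(fetch_page, before=None, limit=None):
    """Yield entries page by page (hypothetical helper, not the PR's code)."""
    page = 1
    yielded = 0
    while True:
        entries = fetch_page(page)
        if not entries:
            return  # ran out of pages
        for entry in entries:
            if before is not None and entry["start"] >= before:
                return  # assumed ascending order: nothing later can match
            if limit is not None and yielded >= limit:
                return
            yield entry
            yielded += 1
        page += 1  # without this increment the first page repeats forever


# Fake two-page data set used only for the demonstration
PAGES = {
    1: [{"id": 1, "start": datetime(2020, 1, 1)},
        {"id": 2, "start": datetime(2020, 2, 1)}],
    2: [{"id": 3, "start": datetime(2020, 3, 1)},
        {"id": 4, "start": datetime(2020, 4, 1)}],
}


def fake_fetch(page):
    return PAGES.get(page, [])


ids = [a["id"] for a in iter_activities(fake_fetch, before=datetime(2020, 3, 15))]
print(ids)
```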

Refactor delete_activity to use request_post

3cb2f48

WIP

0e2edb1

- Move models to a separate file
- Add more detailed scraping of activity details
- Add more detailed scraping of bike data

Pull unicode_escapes out

99d0b77

Implement a replacement for get_activity

2ae1c3d

Add scraped components to Bikes returned from get_gear

030b3a3

Replaces `get_bike_components`

Implement a scraping-based get_gear function

c13b6bd

Use EntityCollection type for lists of entities

4eba61a

Refactor how lazy loading works

dde930b

Allow LazyLoaded Attributes to behave like properties

6b68efd

Add ScrapedAthlete

a22ea26

WIP

e5c8cd6

- Tweak LazyLoaded
- Add scraping for challenges
- Tweak gear access

Fix extracting data from script tags

856de24

BeautifulSoup v4.9.0 changed how `.text` works for `<script>` tags (i.e. it no longer returns their contents), breaking parsing. See https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/564

Fix trying to get more data on other athletes' bikes

f6d872c

Fix null locations serializing to 'None,None'

3685464
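The commit itself is not shown here, so the following is only a guess at the kind of bug the last commit title describes: formatting a missing coordinate pair with string interpolation yields the literal text `None,None`. The function name `serialize_latlng` and the tuple representation are assumptions made for the example.

```python
def serialize_latlng(latlng):
    """Return 'lat,lng', or None when the location is missing (hypothetical)."""
    if latlng is None or any(part is None for part in latlng):
        return None  # keep a null location null instead of stringifying it
    lat, lng = latlng
    return f"{lat},{lng}"


# The naive approach turns a missing location into a misleading string
naive = "{},{}".format(None, None)
print(naive)

print(serialize_latlng((None, None)))
print(serialize_latlng((45.5, -75.25)))
```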

pR0Ps force-pushed the feature/standalone-scraping branch from d8ed33a to 3685464 Compare September 16, 2022 05:32

pR0Ps mentioned this pull request Jan 9, 2023

Snowboarding runs and vertical #13

Open
