sources/academy/webscraping/scraping_basics_python/13_platform.md (5 additions, 5 deletions)
@@ -12,12 +12,12 @@ import Exercises from './_exercises.mdx';
 
 ---
 
-Before starting with a scraping platform, let's point out several caveats in our current solution:
+Before starting with a scraping platform, let's highlight a few caveats in our current solution:
 
-- **User-operated:** We have to run the scraper ourselves. If we're interested in price trends, we'd have to remember to run the program every day. If we want to be notified about a big discount, having a program we need to run manually isn't much of an improvement over manually opening the web page in our browser every day.
-- **No monitoring:** If we have a spare server or a RapsberryPi under table, we could use [cron](https://en.wikipedia.org/wiki/Cron) to schedule the program, but even then we'd have little visibility into whether it finished successfully, what errors or warnings occur, how long it runs, or what resources it consumes.
-- **Manual data management:** To keep track of prices in time, we'd have to figure out a way how to organize the exported data. If we wanted to process the data, we might find out that different data analysis tools require different formats.
-- **Prone to anti-scraping:** If the target website detects we're scraping their data, they can rate-limit or even block us. We could take a laptop to a nearby coffee place and run the program while connected to their public wi-fi, but eventually they'll probably block that one too, which puts you in a serious hazard of angrying your barista.
+- **User-operated:** We have to run the scraper ourselves. If we're interested in price trends, we'd have to remember to run the program every day. If we want the program to alert us about a big discount, having to run it manually isn't much better than just opening the web page in our browser every day.
+- **No monitoring:** If we have a spare server or a Raspberry Pi under the table, we could use [cron](https://en.wikipedia.org/wiki/Cron) to schedule the program. But even then, we'd have little visibility into whether it finished successfully, what errors or warnings occurred, how long it ran, or what resources it consumed.
+- **Manual data management:** To track prices over time, we'd have to figure out how to organize the exported data. If we wanted to process the data, we might discover that different data analysis tools require specific formats.
+- **Prone to anti-scraping:** If the target website detects we're scraping their data, they can rate-limit or even block us. We could take a laptop to a nearby coffee shop and run the program while connected to their public Wi-Fi, but eventually they'll probably block that one too—risking seriously annoying your barista.
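For context on the "No monitoring" caveat in the hunk above: scheduling the scraper with cron comes down to a single crontab entry. A minimal sketch, assuming a hypothetical script path of /home/user/scraper/main.py:

    # Hypothetical crontab entry: run the scraper every day at 8:00,
    # appending stdout and stderr to a log file for basic visibility.
    0 8 * * * /usr/bin/python3 /home/user/scraper/main.py >> /home/user/scraper/scraper.log 2>&1

Even then, as the bullet notes, cron provides no alerting, error tracking, or resource reporting, which is the gap a scraping platform is meant to fill.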