content/academy/analyzing_pages_and_fixing_errors.md · 8 additions & 12 deletions
@@ -9,7 +9,7 @@ paths:
# [](#analyzing-a-page-and-fixing-errors) Analyzing a page and fixing errors
-Debugging is absolutely essential in programming. Even if you don't call yourself a programmer, having basic debugging skills will make building crawlers easier. It will also help you safe money my allowing you to avoid hiring an expensive developer to solve your issue for you.
+Debugging is absolutely essential in programming. Even if you don't call yourself a programmer, having basic debugging skills will make building crawlers easier. It will also help you save money by allowing you to avoid hiring an expensive developer to solve your issue for you.
This quick lesson covers the absolute basics by discussing some of the most common problems and the simplest tools for analyzing and fixing them.
@@ -65,8 +65,7 @@ try {
// ...
} catch (error) {
// You know where the code crashed so you can explain here
-console.error('Request failed during login with an error:');
-throw error;
+throw new Error('Request failed during login with an error', { cause: error });
// You know where the code crashed so you can explain here
-console.error(`Request failed during login with an error. Screenshot: ${screenshotLink}`);
-throw error;
+throw new Error('Request failed during login with an error', { cause: error });
}
// ...
```
@@ -125,8 +123,9 @@ To make the error snapshot descriptive, we name it **ERROR-LOGIN**. We add a ran
Logging and snapshotting are great tools but once you reach a certain run size, it may be hard to read through them all. For a large project, it is handy to create a more sophisticated reporting system. For example, let's just look at simple **dataset** reporting.
-<!-- TODO: Make the code example below make sense without using Apify API or SDK -->
-<!-- This example extends our snapshot solution above by creating a [named dataset](https://docs.apify.com/storage#named-and-unnamed-storages) (named datasets have infinite retention), where we will accumulate error reports. Those reports will explain what happened and will link to a saved snapshot, so we can do a quick visual check.
+## [](#with-the-apify-sdk) With the Apify SDK
+
+This example extends our snapshot solution above by creating a [named dataset](https://docs.apify.com/storage#named-and-unnamed-storages) (named datasets have infinite retention), where we will accumulate error reports. Those reports will explain what happened and will link to a saved snapshot, so we can do a quick visual check.
```JavaScript
import { Actor } from 'apify';
@@ -172,11 +171,8 @@ try {
await reportingDataset.pushData(report);
// You know where the code crashed so you can explain here
-console.error(
-    `Request failed during login with an error. Screenshot: ${screenshotLink}`
-);
-throw error;
+throw new Error('Request failed during login with an error', { cause: error });
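
Taken together, the changes in this file converge on one pattern: push a descriptive error report to a named dataset, then re-throw with `{ cause }` so the original stack trace survives. A minimal sketch of that pattern using the Apify SDK's `Actor` API is below; the dataset name, report fields, and `screenshotLink` value are illustrative placeholders, not values from the full article.

```JavaScript
import { Actor } from 'apify';

await Actor.init();

// Named datasets have infinite retention, so error reports outlive individual runs.
const reportingDataset = await Actor.openDataset('REPORTS');

try {
    // ... login logic that may fail ...
} catch (error) {
    // Hypothetical link to the ERROR-LOGIN snapshot saved earlier in the run.
    const screenshotLink = 'https://example.com/ERROR-LOGIN.png';

    // Accumulate a human-readable report that links to the snapshot.
    await reportingDataset.pushData({
        errorType: 'login',
        errorMessage: error.toString(),
        screenshot: screenshotLink,
    });

    // Re-throw with context; the `cause` option keeps the original error and stack attached.
    throw new Error('Request failed during login with an error', { cause: error });
}

await Actor.exit();
```

The `cause` option (ES2022) lets a caller inspect `error.cause` later instead of relying on a separately logged message.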
content/academy/caching_responses_in_puppeteer.md · 2 additions & 10 deletions
@@ -9,9 +9,9 @@ paths:
# [](#caching-responses-in-puppeteer) Caching responses in Puppeteer
-> In the latest version of Puppeteer, the request-interception function inconveniently disables the native cache and significantly slows down the crawler. Therefore, it's not recommended to follow the examples shown in this article. Puppeteer now uses a native cache that should work well enough for most use cases.
+> In the latest version of Puppeteer, the request-interception function inconveniently disables the native cache and significantly slows down the crawler. Therefore, it's not recommended to follow the examples shown in this article unless you have a very specific use case where the default browser cache is not enough (e.g. caching over multiple scraper runs).
-When running crawlers that go through a single website, each open page has to load all resources again (sadly, headless browsers don't use cache). The problem is that each resource needs to be downloaded through the network, which can be slow and/or unstable (especially when proxies are used).
+When running crawlers that go through a single website, each open page has to load all resources again. The problem is that each resource needs to be downloaded through the network, which can be slow and/or unstable (especially when proxies are used).
For this reason, in this article, we will take a look at how to use memory to cache responses in Puppeteer (only those that contain header **cache-control** with **max-age** above **0**).
@@ -155,14 +155,6 @@ const crawler = new PuppeteerCrawler({
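
The hunks above only touch this article's intro, but the technique the article describes is easy to summarize: intercept requests, keep responses whose `cache-control` header carries a `max-age` above 0 in memory, and serve repeats from that cache. The sketch below is illustrative only (the URL and the cache shape are assumptions, not the article's full code), and it inherits the trade-off from the note above: interception disables the native cache.

```JavaScript
import puppeteer from 'puppeteer';

// Simple in-memory cache keyed by URL. Enabling request interception disables
// Puppeteer's native cache, which is exactly what the warning above is about.
const cache = {};

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);

page.on('request', (request) => {
    const cached = cache[request.url()];
    if (cached) {
        // Serve the stored response instead of going back to the network.
        request.respond(cached);
    } else {
        request.continue();
    }
});

page.on('response', async (response) => {
    const url = response.url();
    const cacheControl = response.headers()['cache-control'] || '';
    const maxAgeMatch = cacheControl.match(/max-age=(\d+)/);
    // Only store responses the server itself marks as cacheable.
    if (cache[url] || !maxAgeMatch || Number(maxAgeMatch[1]) <= 0) return;
    const body = await response.buffer().catch(() => null);
    if (body) {
        cache[url] = { status: response.status(), headers: response.headers(), body };
    }
});

await page.goto('https://example.com');
await browser.close();
```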
content/academy/optimizing_scrapers.md · 2 additions & 2 deletions
@@ -9,7 +9,7 @@ paths:
# [](#optimizing-scrapers) Optimizing scrapers
-Especially if you are running your scrapers on [Apify](https://apify.com), performance is directly related to your wallet (or rather bank account). The slower and heavier your program is, the more [compute units](https://help.apify.com/en/articles/3490384-what-is-a-compute-unit) and higher [subscription plan](https://apify.com/pricing) you'll need.
+Especially if you are running your scrapers on [Apify](https://apify.com), performance is directly related to your wallet (or rather bank account). The slower and heavier your program is, the more proxy bandwidth, storage, and [compute units](https://help.apify.com/en/articles/3490384-what-is-a-compute-unit) you'll use, and the higher the [subscription plan](https://apify.com/pricing) you'll need.
The goal of optimization is simple: Make the code run as fast as possible and use the least resources possible. On Apify, the resources are memory and CPU usage (don't forget that the more memory you allocate to a run, the bigger share of CPU you get - proportionally). Memory alone should never be a bottleneck though. If it is, that means either a bug (memory leak) or bad architecture of the program (you need to split the computation to smaller parts). So in the rest of this article, we will focus only on optimizing CPU usage. You allocate more memory only to get more power from the CPU.
@@ -29,7 +29,7 @@ Now, if you want to build your own game and you are not a C/C++ veteran with a t
What are the engines of the scraping world? A [browser](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md), an [HTTP library](https://www.npmjs.com/package/@apify/http-request), an [HTML parser](https://github.com/cheeriojs/cheerio), and a [JSON parser](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse). The CPU spends more than 99% of its workload in these libraries. As with engines, you are not likely gonna write these from scratch - instead you'll use something like [Crawlee](https://crawlee.dev) that handles a lot of the overheads for you.
-It is about how you use these tools. The small amount of code you write in your [`requestHandler`](https://crawlee.dev/api/http-crawler/interface/HttpCrawlerOptions#requestHandler) is absolutely insignificant compared to what is running inside these tools. In other words, it doesn't matter how many functions you call or how many variables you extract. If you want to optimize your scrapers, you need to choose the lightweight option from the tools and use it as little as possible. A crawler scraping only JSON API can be as much as 50 times faster/cheaper than a browser based solution.
+It is about how you use these tools. The small amount of code you write in your [`requestHandler`](https://crawlee.dev/api/http-crawler/interface/HttpCrawlerOptions#requestHandler) is absolutely insignificant compared to what is running inside these tools. In other words, it doesn't matter how many functions you call or how many variables you extract. If you want to optimize your scrapers, you need to choose the lightweight option from the tools and use it as little as possible. A crawler scraping only a JSON API can be as much as 200 times faster/cheaper than a browser-based solution.
**Ranking of the tools from the most efficient to the least:**
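
To make the "200 times faster/cheaper" figure concrete, the lightweight end of that ranking is a plain HTTP crawler plus an HTML parser. Below is a minimal sketch using Crawlee's `CheerioCrawler`; the URL and selector are placeholders.

```JavaScript
import { CheerioCrawler } from 'crawlee';

// No browser involved: the HTTP library fetches the page and Cheerio parses it,
// so each request costs a fraction of what a Puppeteer-based crawler would spend.
const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // The code here is negligible next to the work the underlying libraries do.
        const title = $('title').text();
        console.log(`${request.url}: ${title}`);
    },
});

await crawler.run(['https://example.com']);
```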