Commit 6885f63

fix broken anchor links

1 parent 230b7c4 commit 6885f63

27 files changed: +125 -131 lines changed

sources/academy/platform/deploying_your_code/input_schema.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -53,7 +53,7 @@ Each property's key corresponds to the name we're expecting within our code, whi
 
 ## Property types & editor types {#property-types}
 
-Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
+Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema/specification/v1#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
 
 ```json
 {
````
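For context, the **numbers** property described in the changed paragraph could be declared along these lines — a minimal sketch, assuming the object sits under the schema's `properties` key; the **title**, **description**, and **prefill** values are illustrative assumptions, while **type** and **editor** come from the paragraph above:

```json
{
    "numbers": {
        "title": "Numbers",
        "type": "array",
        "description": "An array of numbers to be processed.",
        "editor": "json",
        "prefill": [5, 10, 15]
    }
}
```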

sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -20,7 +20,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g
 ## Learning 🧠 {#learning}
 
 - Skim [this page](https://apify.com/proxy) for a general idea of Apify Proxy.
-- Give the [proxy documentation](/platform/proxy#our-proxies) a solid readover (feel free to skip most of the examples).
+- Give the [proxy documentation](/platform/proxy) a solid readover (feel free to skip most of the examples).
 - Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md).
 - Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool).
 - Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies?
````

sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -231,7 +231,7 @@ That's everything! Now, even if the Actor migrates (or is gracefully aborted and
 
 **A:** It's not best to use this option by default. If it fails, there must be a reason, which would need to be thought through first - meaning that the edge case of failing should be handled when resurrecting the Actor. The state should be persisted beforehand.
 
-**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
+**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
 
 **A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection.
````

sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -24,7 +24,7 @@ Storage allows us to save persistent data for further processing. As you'll lear
 ## Learning 🧠 {#learning}
 
 - Check out [the docs about Actor tasks](/platform/actors/running/tasks).
-- Read about the [two main storage options](/platform/storage#dataset) on the Apify platform.
+- Read about the [two main storage options](/platform/storage/dataset) on the Apify platform.
 - Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages).
 - Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
````

sources/academy/platform/getting_started/inputs_outputs.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -65,7 +65,7 @@ Then, replace everything in **INPUT_SCHEMA.json** with this:
 }
 ```
 
-> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema#integer).
+> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema/specification/v1#integer).
 
 Finally, **Save** and **Build** the Actor just as you did in the previous lesson.
@@ -89,7 +89,7 @@ On the results tab, there are a whole lot of options for which format to view/do
 
 There's our solution! Did it work for you as well? Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson).
 
-It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention).
+It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage/usage#data-retention).
 
 ## Next up {#next}
````

sources/academy/tutorials/apify_scrapers/cheerio_scraper.md

Lines changed: 18 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
1717
check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial,
1818
because this one builds on topics and code examples discussed there.
1919

20-
## [](#getting-to-know-our-tools) Getting to know our tools
20+
## Getting to know our tools
2121

2222
In the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected,
2323
so now it's time to add more data to the results.
````diff
@@ -36,7 +36,7 @@ Now that's out of the way, let's open one of the Actor detail pages in the Store
 > If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper,
 it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
 
-## [](#building-our-page-function) Building our Page function
+## Building our Page function
 
 Before we start, let's do a quick recap of the data we chose to scrape:
 
````
````diff
@@ -52,7 +52,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
 tutorial, so let's get to the next one on the list: title.
 
-### [](#title) Title
+### Title
 
 ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
 
````
````diff
@@ -79,7 +79,7 @@ async function pageFunction(context) {
 }
 ```
 
-### [](#description) Description
+### Description
 
 Getting the Actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because
 there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
````
````diff
@@ -98,7 +98,7 @@ async function pageFunction(context) {
 }
 ```
 
-### [](#modified-date) Modified date
+### Modified date
 
 The DevTools tell us that the `modifiedDate` can be found in a `<time>` element.
 
````
````diff
@@ -126,7 +126,7 @@ But we would much rather see a readable date in our results, not a unix timestam
 constructor will not accept a `string`, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`.
 Phew!
 
-### [](#run-count) Run count
+### Run count
 
 And so we're finishing up with the `runCount`. There's no specific element like `<time>`, so we need to create
 a complex selector and then do a transformation on the result.
````
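The timestamp-to-date conversion this hunk refers to can be sketched as follows; the timestamp value is made up for illustration:

```javascript
// Hypothetical timestamp string, as it might be read from the <time> element.
const timestampString = '1562241600000'; // milliseconds since the unix epoch

// Passing the raw string to the Date constructor would make it try to parse
// a date-time description rather than a number, so we cast it first.
const modifiedDate = new Date(Number(timestampString));

console.log(modifiedDate.toISOString()); // → 2019-07-04T12:00:00.000Z
```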
````diff
@@ -165,7 +165,7 @@ using a regular expression, but its type is still a `string`, so we finally conv
 >
 > This will give us a string (e.g. `'1234567'`) that can be converted via `Number` function.
 
-### [](#wrapping-it-up) Wrapping it up
+### Wrapping it up
 
 And there we have it! All the data we needed in a single object. For the sake of completeness, let's add
 the properties we parsed from the URL earlier and we're good to go.
````
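The extract-digits-then-convert step the hunk above describes can be sketched like this; the element text is a made-up example, and the exact regular expression may differ from the scraper's:

```javascript
// Hypothetical text content of the element holding the run count.
const runCountText = 'Used 1,234,567 times';

// Strip everything that isn't a digit, then convert the remaining
// string to a number, mirroring the regex-and-Number() approach.
const runCount = Number(runCountText.replace(/[^\d]/g, ''));

console.log(runCount); // → 1234567
```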
````diff
@@ -243,13 +243,13 @@ async function pageFunction(context) {
 }
 ```
 
-### [](#test-run) Test run
+### Test run
 
 As always, try hitting that **Save & Run** button and visit
 the **Dataset** preview of clean items. You should see a nice table of all the attributes correctly scraped.
 You nailed it!
 
-## [](#pagination) Pagination
+## Pagination
 
 Pagination is just a term that represents "going to the next page of results". You may have noticed that we did not
 actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors,
````
````diff
@@ -265,7 +265,7 @@ with Cheerio? We don't have a browser to do it and we only have the HTML of the 
 answer is that we can't click a button. Does that mean that we cannot get the data at all? Usually not,
 but it requires some clever DevTools-Fu.
 
-### [](#analyzing-the-page) Analyzing the page
+### Analyzing the page
 
 While with Web Scraper and **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)), we could get away with simply clicking a button,
 with Cheerio Scraper we need to dig a little deeper into the page's architecture. For this, we will use
````
````diff
@@ -281,7 +281,7 @@ Then we click the **Show more** button and wait for incoming requests to appear 
 Now, this is interesting. It seems that we've only received two images after clicking the button and no additional
 data. This means that the data about Actors must already be available in the page and the **Show more** button only displays it. This is good news.
 
-### [](#finding-the-actors) Finding the Actors
+### Finding the Actors
 
 Now that we know the information we seek is already in the page, we just need to find it. The first Actor in the store
 is Web Scraper, so let's try using the search tool in the **Elements** tab to find some reference to it. The first
````
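To illustrate the idea in this hunk — that the data is already in the HTML — here is a toy sketch of pulling an embedded JSON payload out of a page string without a browser. The markup, the `store-data` id, and the payload shape are all invented for the example:

```javascript
// A toy stand-in for the page's raw HTML.
const html = `
<html><body>
<script type="application/json" id="store-data">
{"items":[{"name":"web-scraper","username":"apify"},{"name":"cheerio-scraper","username":"apify"}]}
</script>
</body></html>`;

// Without a browser, we can still locate the embedded JSON with a
// regular expression and parse it directly.
const match = html.match(/<script type="application\/json" id="store-data">([\s\S]*?)<\/script>/);
const data = JSON.parse(match[1]);

console.log(data.items.length);  // → 2
console.log(data.items[1].name); // → cheerio-scraper
```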
````diff
@@ -310,7 +310,7 @@ so you might already be wondering, can I just make one request to the store to g
 and then parse it out and be done with it in a single request? Yes you can! And that's the power
 of clever page analysis.
 
-### [](#using-the-data-to-enqueue-all-actor-details) Using the data to enqueue all Actor details
+### Using the data to enqueue all Actor details
 
 We don't really need to go to all the Actor details now, but for the sake of practice, let's imagine we only found
 Actor names such as `cheerio-scraper` and their owners, such as `apify` in the data. We will use this information
````
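Building detail-page URLs from such owner/name pairs could look like this sketch; the records are invented, and only the `https://apify.com/{owner}/{name}` URL shape comes from the tutorial:

```javascript
// Hypothetical records recovered from the page's embedded data.
const actors = [
    { username: 'apify', name: 'web-scraper' },
    { username: 'apify', name: 'cheerio-scraper' },
];

// Knowing the store's URL structure, we can construct a detail-page
// request for each record instead of clicking through the UI.
const requests = actors.map(({ username, name }) => ({
    url: `https://apify.com/${username}/${name}`,
}));

console.log(requests[1].url); // → https://apify.com/apify/cheerio-scraper
```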
````diff
@@ -343,7 +343,7 @@ how to route those requests.
 >If you're wondering how we know the structure of the URL, see the [Getting started
 with Apify Scrapers](./getting_started.md) tutorial again.
 
-### [](#plugging-it-into-the-page-function) Plugging it into the Page function
+### Plugging it into the Page function
 
 We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`.
 Remember the `// Do some stuff later` comment? Let's replace it.
````
````diff
@@ -412,13 +412,13 @@ to get all results with Cheerio only and other times it takes hours of research.
 the right scraper for your job. But don't get discouraged. Often times, the only thing you will ever need is to
 define a correct Pseudo URL. Do your research first before giving up on Cheerio Scraper.
 
-## [](#downloading-our-scraped-data) Downloading the scraped data
+## Downloading the scraped data
 
 You already know the **Dataset** tab of the run console since this is where we've always previewed our data. Notice the row of data formats such as JSON, CSV, and Excel. Below it are options for viewing and downloading the data. Go ahead and try it.
 
 > If you prefer working with an API, you can find the example endpoint under the API tab: **Get dataset items**.
 
-### [](#clean-items) Clean items
+### Clean items
 
 You can view and download your data without modifications, or you can choose to only get **clean** items. Data that aren't cleaned include a record
 for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields
````
````diff
@@ -428,7 +428,7 @@ Clean items, on the other hand, include only the data you returned from the `pag
 
 To control this, open the **Advanced options** view on the **Dataset** tab.
 
-## [](#bonus-making-your-code-neater) Bonus: Making your code neater
+## Bonus: Making your code neater
 
 You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier
 time maintaining or extending your task, feel free to define other functions inside the `pageFunction`
````
````diff
@@ -496,11 +496,11 @@ async function pageFunction(context) {
 > If you're confused by the functions being declared below their executions, it's called hoisting and it's a feature
 of JavaScript. It helps you put what matters on top, if you so desire.
 
-## [](#final-word) Final word
+## Final word
 
 Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify easily and effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [join us on Discord](https://discord.gg/jyEM2PRvMU)!
 
-## [](#whats-next) What's next
+## What's next
 
 * Check out the [Apify SDK](https://sdk.apify.com/) and its [Getting started](https://sdk.apify.com/docs/guides/getting-started) tutorial if you'd like to try building your own Actors. It's a bit more complex and involved than writing a simple `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking.
 * [Take a deep dive into Actors](/platform/actors), from how they work to [publishing](/platform/actors/publishing) them in Apify Store, and even [making money](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) on Actors.
````
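The hoisting behavior mentioned in the tutorial's bonus section ("functions being declared below their executions") can be demonstrated with a tiny standalone snippet; the `double` function is a made-up example:

```javascript
// Thanks to hoisting, a function declaration can be called before the
// line where it is defined, letting the high-level logic sit on top.
const result = double(21);
console.log(result); // → 42

function double(n) {
    return n * 2;
}
```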
