Skip to content

Commit e6a9577

Browse files
authored
Merge branch 'master' into feat/agno-integration
2 parents abc37be + 89938bf commit e6a9577

File tree

21 files changed

+79
-78
lines changed

21 files changed

+79
-78
lines changed

.github/styles/config/vocabularies/Docs/accept.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,6 +86,7 @@ preconfigured
8686

8787
[Mm]ultiselect
8888

89+
devs
8990
asyncio
9091
Langflow
9192
backlinks?

package-lock.json

Lines changed: 51 additions & 51 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

sources/academy/homepage_content.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
{
22
"Beginner courses": [
33
{
4-
"title": "Web scraping for beginners",
4+
"title": "Web scraping basics for JavaScript devs",
55
"link": "academy/web-scraping-for-beginners",
66
"description": "Learn how to develop web scrapers on your own computer with open-source tools. This web scraping course teaches you all the basics a scraper developer needs to know.",
77
"imageUrl": "/img/academy/intro.svg"

sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Thus far, you've run Actors on the platform and written an Actor of your own, wh
1515

1616
## Advanced Actor overview {#advanced-actors}
1717

18-
In this course, we'll be working out of the Amazon scraper project from the **Web scraping for beginners** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
18+
In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
1919

2020
Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. "Apify Actors" is a serverless platform that runs multiple Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).
2121

@@ -41,7 +41,7 @@ Prior to moving forward, please read over these resources:
4141

4242
## Our task {#our-task}
4343

44-
In this task, we'll be building on top of what we already created in the [Web scraping for beginners](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
44+
In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
4545

4646
Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset.
4747

sources/academy/platform/expert_scraping_with_apify/index.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,15 +18,15 @@ This course will teach you the nitty gritty of what it takes to build pro-level
1818

1919
Before developing a pro-level Apify scraper, there are some important things you should have at least a bit of knowledge about (knowing the basics of each is enough to continue through this section), as well as some things that you should have installed on your system.
2020

21-
> If you've already gone through the [Web scraping for beginners course](../../webscraping/scraping_basics_javascript/index.md) and the first courses of the [Apify platform category](../apify_platform.md), you will be more than well equipped to continue on with the lessons in this course.
21+
> If you've already gone through the [Web scraping basics for JavaScript devs](../../webscraping/scraping_basics_javascript/index.md) and the first courses of the [Apify platform category](../apify_platform.md), you will be more than well equipped to continue on with the lessons in this course.
2222
2323
<!-- ### Puppeteer/Playwright {#puppeteer-playwright}
2424
2525
[Puppeteer](https://pptr.dev/) is a library for running and controlling a [headless browser](../../webscraping/scraping_basics_javascript/crawling/headless_browser.md) in Node.js, and was developed at Google. The team working on it was hired by Microsoft to work on the [Playwright](https://playwright.dev/) project; therefore, many parallels can be seen between both the `puppeteer` and `playwright` packages. Proficiency in at least one of these will be good enough. -->
2626

2727
### Crawlee, Apify SDK, and the Apify CLI {#crawlee-apify-sdk-and-cli}
2828

29-
If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping for beginners** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
29+
If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping basics for JavaScript devs** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
3030

3131
The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).
3232

sources/academy/platform/get_most_of_actors/store_basics/how_store_works.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,7 @@ Learn more about how to handle the [Issues tab](/academy/actor-marketing-playboo
104104
## Resources
105105

106106
- Best practices on setting up [testing for your Actor](https://docs.apify.com/platform/actors/publishing/test)
107-
- What are Apify-maintained and [Community-maintained Actors](https://help.apify.com/en/articles/6999799-what-are-apify-maintained-and-community-maintained-actors)? On ownership, maintenance, features, and support
107+
<!-- - What are Apify-maintained and [Community-maintained Actors](https://help.apify.com/en/articles/6999799-what-are-apify-maintained-and-community-maintained-actors)? On ownership, maintenance, features, and support -->
108108
- Step-by-step guide on how to [publish your Actor](https://docs.apify.com/platform/actors/publishing)
109109
- Watch our webinar on how to [build, publish and monetize Actors](https://www.youtube.com/watch?v=4nxStxC1BJM)
110110
- Detailed [guide on pricing models](https://docs.apify.com/platform/actors/running/actors-in-store) for Actors in Store

sources/academy/tutorials/node_js/submitting_forms_on_aspx_pages.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ slug: /node-js/submitting-forms-on-aspx-pages
77

88
Apify users sometimes need to submit a form on pages created with ASP.NET (URL typically ends with .aspx). These pages have a different approach for how they submit forms and navigate through pages.
99

10-
This tutorial shows you how to handle these kinds of pages. This approach is based on a [blog post](https://toddhayton.com/2015/05/04/scraping-aspnet-pages-with-ajax-pagination/) from Todd Hayton, where he explains how crawlers for ASP.NET pages should work.
10+
This tutorial shows you how to handle these kinds of pages. This approach is based on a [blog post](https://web.archive.org/web/20230530120937/https://toddhayton.com/2015/05/04/scraping-aspnet-pages-with-ajax-pagination/) from Todd Hayton, where he explains how crawlers for ASP.NET pages should work.
1111

1212
First of all, you need to copy&paste this function to your [Web Scraper](https://apify.com/apify/web-scraper) _Page function_:
1313

sources/academy/webscraping/advanced_web_scraping/crawling/sitemaps-vs-search.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
---
22
title: Sitemaps vs search
3-
description: Learn how to extract all of a website's listings even if they limit the number of results pages.
3+
description: Learn how to extract all of a website's listings even if they limit the number of results pages.
44
sidebar_position: 1
55
slug: /advanced-web-scraping/crawling/sitemaps-vs-search
66
---
77

8-
The core crawling problem comes to down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end as we did in the [Web Scraping for Beginners course](/academy/web-scraping-for-beginners).
8+
The core crawling problem comes to down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end as we did in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners) course.
99

1010
Unfortunately, _most modern websites restrict pagination_ only to somewhere between 1 and 10,000 products. Solving this problem might seem relatively straightforward at first but there are multiple hurdles that we will explore in this lesson.
1111

0 commit comments

Comments
 (0)