Commit d3406ef

honzajavorek and TC-MO authored
feat: publish new JS course (#1907)
The aim of this PR is to publish the new JS course as described in the PR description of #1584, and to unlist the old JS course. The old one should be still accessible for a grace period.

_Replacing the old JS course with a new one, which is identical to the Python course, has been previously sanctioned by both Ondra and Michał._

### The Plan

- [x] The `scraping_basics_javascript` root leads to the new JS course.
- [x] The pages of the old JS course move to `legacy/web-scraping-for-beginners`. It's gonna be a read-only archive. Must be `noindex` to avoid cannibalization issues.
- [x] The `web-scraping-for-beginners`, i.e. the root of the old JS course URLs, leads to redirects which take people to corresponding pages in the new JS course. This lets us use the SEO juice from the old URLs.
- [x] The redirects add `#old-js-course` to the URL. The new JS course pages contain a component which, if `#old-js-course` is present in the URL, displays a _commemorative plaque_ about the change and link the old JS course. This improves UX: "Hey, you have until 1.1.2026 to go through this course. After that please refer to the newly updated JS course <link>."
- [ ] At some point in future, we'll nuke the archive of the old JS course and link Internet Archive instead in the _commemorative plaque_.

_The Plan is a result of a [long discussion between Michał, Aleš, and me](https://pyvec.slack.com/archives/C03BHBQNNG3/p1756992893312119), which takes into account both the UX of existing users of the JS course and SEO._

### Related Work

- Depends on #1889
- Closes #1584
- Closes #1579
- Fixes #947
- Discovered #1900
- Closes #2009 (PoC)
- Contains #2023
- Closes #1550

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Publishes the new JS course, archives the old one with redirects and an on-page notice, and updates links, content, and Nginx rewrites across the docs.
>
> - **Academy: New JS course rollout**
>   - Publishes `academy/webscraping/scraping_basics_javascript/*` (new slugs, content, and index) and updates internal references to it.
>   - Archives the old JS course under `academy/webscraping/scraping_basics_legacy/*` with `noindex` and a legacy notice.
>   - Adds `src/components/LegacyJsCourseAdmonition.jsx` and integrates it into new course pages to show a notice when `?legacy-js-course=` is present.
>   - Updates course metadata (titles/sidebar labels) in Expert/Anti‑scraping lessons and adds caution notes where content depends on the legacy course.
>   - Updates homepage card and other references to point to `'/academy/scraping-basics-javascript'`.
> - **Routing/Redirects (Nginx)**
>   - Redirects old JS course paths `^/academy/web-scraping-for-beginners...` to `'/academy/scraping-basics-javascript'` with `?legacy-js-course=...`.
>   - Adds other redirects (e.g., output-schema → dataset-schema, academy php path, advanced web scraping path fix).
> - **Content/link maintenance**
>   - Repoints numerous lessons to new paths (e.g., tutorials, Puppeteer/Playwright, advanced courses) and updates sample URLs in integrations (Make) to the new JS course.
>   - Minor copy/heading tweaks (e.g., RPA title), and consistent slug/slug changes across documents.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 2840ebd. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Michał Olender <[email protected]>
1 parent 336cec3 commit d3406ef
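
The summary above says the new course pages show a notice when `?legacy-js-course=` is present. The source of `src/components/LegacyJsCourseAdmonition.jsx` is not visible in this view, so the following is only a minimal sketch of how such a check could work. The function names, the notice wording, and the `https://example.com` base URL are all assumptions made for illustration.

```javascript
// Hypothetical helpers, not the component's actual code.

// Returns the old course path carried in the `legacy-js-course` query
// parameter, or null when the parameter is missing or not a site-relative path.
function getLegacyCoursePath(href) {
  // The base is a placeholder; it only matters when `href` is relative.
  const url = new URL(href, 'https://example.com');
  const value = url.searchParams.get('legacy-js-course');
  if (value === null || !value.startsWith('/')) return null;
  return value;
}

// Returns the text of the notice, or null when no notice should be shown.
function legacyNotice(href) {
  const legacyPath = getLegacyCoursePath(href);
  if (legacyPath === null) return null;
  return `You were redirected from the archived course page ${legacyPath}.`;
}

console.log(legacyNotice('/academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling/first-crawl'));
```

Keeping the parsing in a plain function, separate from any rendering, would make the behavior easy to test without a browser.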

117 files changed, +449 -209 lines changed

nginx.conf

Lines changed: 30 additions & 2 deletions
@@ -534,8 +534,36 @@ server {
 rewrite ^/platform/actors/development/source-code$ /platform/actors/development/deployment/source-types redirect;

 # Academy restructuring
-rewrite ^academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
-rewrite ^academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future
+rewrite ^/academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
+rewrite ^/academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future
+
+# Academy: replacing the 'Web Scraping for Beginners' course
+rewrite ^/academy/web-scraping-for-beginners/best-practices$ /academy/scraping-basics-javascript?legacy-js-course=/best-practices permanent;
+rewrite ^/academy/web-scraping-for-beginners/introduction$ /academy/scraping-basics-javascript?legacy-js-course=/introduction permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/initializing-and-setting-up$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/initializing-and-setting-up permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/modularity$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/modularity permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge/scraping-amazon$ /academy/scraping-basics-javascript?legacy-js-course=/challenge/scraping-amazon permanent;
+rewrite ^/academy/web-scraping-for-beginners/challenge$ /academy/scraping-basics-javascript?legacy-js-course=/challenge permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/exporting-data$ /academy/scraping-basics-javascript/framework?legacy-js-course=/crawling/exporting-data permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/filtering-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/filtering-links permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/finding-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/finding-links permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/first-crawl$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling/first-crawl permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/headless-browser$ /academy/scraping-basics-javascript?legacy-js-course=/crawling/headless-browser permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/pro-scraping$ /academy/scraping-basics-javascript/framework?legacy-js-course=/crawling/pro-scraping permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/recap-extraction-basics$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/crawling/recap-extraction-basics permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/relative-urls$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/crawling/relative-urls permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling/scraping-the-data$ /academy/scraping-basics-javascript/scraping-variants?legacy-js-course=/crawling/scraping-the-data permanent;
+rewrite ^/academy/web-scraping-for-beginners/crawling$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/browser-devtools$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/data-extraction/browser-devtools permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/computer-preparation$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/computer-preparation permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/devtools-continued$ /academy/scraping-basics-javascript/devtools-extracting-data?legacy-js-course=/data-extraction/devtools-continued permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-continued$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/data-extraction/node-continued permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-js-scraper$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/node-js-scraper permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/project-setup$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/data-extraction/project-setup permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/save-to-csv$ /academy/scraping-basics-javascript/saving-data?legacy-js-course=/data-extraction/save-to-csv permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction/using-devtools$ /academy/scraping-basics-javascript/devtools-locating-elements?legacy-js-course=/data-extraction/using-devtools permanent;
+rewrite ^/academy/web-scraping-for-beginners/data-extraction$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/data-extraction permanent;
+rewrite ^/academy/web-scraping-for-beginners$ /academy/scraping-basics-javascript?legacy-js-course=/ permanent;

 # Removed pages
 # GPT plugins were discontinued April 9th, 2024 - https://help.openai.com/en/articles/8988022-winding-down-the-chatgpt-plugins-beta
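
The nginx.conf hunk above changes two things: it adds a leading `/` to the two existing Academy patterns, and it adds one anchored rule per old course page. Nginx matches `rewrite` patterns against a request URI that always begins with `/`, so the old `^academy/...` patterns could never match. The sketch below imitates that behavior in JavaScript with a handful of the rules. The `resolveRedirect` function and the rule-table format are illustrative assumptions, not part of the commit. Status 301 stands for the `permanent` flag and 302 for `redirect`.

```javascript
// A few rules copied from the hunk above; the resolver itself is illustrative only.
const rules = [
  [/^\/academy\/php$/, '/academy/php/use-apify-from-php', 302],
  [/^\/academy\/web-scraping-for-beginners\/crawling\/first-crawl$/,
    '/academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling/first-crawl', 301],
  [/^\/academy\/web-scraping-for-beginners\/crawling$/,
    '/academy/scraping-basics-javascript/crawling?legacy-js-course=/crawling', 301],
  [/^\/academy\/web-scraping-for-beginners$/,
    '/academy/scraping-basics-javascript?legacy-js-course=/', 301],
];

// Checks the rules in order and returns the first match, or null if none applies.
function resolveRedirect(path) {
  for (const [pattern, location, status] of rules) {
    if (pattern.test(path)) return { status, location };
  }
  return null;
}

// The pre-fix pattern lacked the leading slash, so it cannot match a real URI.
const oldPattern = /^academy\/php$/;
console.log(oldPattern.test('/academy/php')); // false
console.log(resolveRedirect('/academy/web-scraping-for-beginners/crawling'));
```

Because every pattern is anchored with both `^` and `$`, each old URL matches exactly one rule, so the order of the rules does not affect the result here.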

sources/academy/glossary/concepts/robot_process_automation.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Robotic process automation
+title: What is robotic process automation (RPA)
 description: Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks.
 sidebar_position: 8.7
 slug: /concepts/robotic-process-automation
@@ -29,7 +29,7 @@ With the advance of [machine learning](https://en.wikipedia.org/wiki/Machine_lea

 ## Is RPA the same as web scraping? {#is-rpa-the-same-as-web-scraping}

-While [web scraping](../../webscraping/scraping_basics_javascript/index.md) is a kind of RPA, it focuses on extracting structured data. RPA focuses on the other tasks in browsers - everything except for extracting information.
+While web scraping is a kind of RPA, it focuses on extracting structured data. RPA focuses on the other tasks in browsers - everything except for extracting information.

 ## Additional resources {#additional-resources}

sources/academy/glossary/tools/apify_cli.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and

 ## Installing {#installing}

-To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, [learn how to do that](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
+To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.

 Open up a terminal instance and run the following command:

sources/academy/homepage_content.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 "Beginner courses": [
 {
 "title": "Web scraping basics with JS",
-"link": "/academy/web-scraping-for-beginners",
+"link": "/academy/scraping-basics-javascript",
 "description": "Learn how to use JavaScript to extract information from websites in this practical course, starting from the absolute basics.",
 "imageUrl": "/img/academy/scraping-basics-javascript.svg"
 },

sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md

Lines changed: 10 additions & 3 deletions
@@ -1,19 +1,26 @@
 ---
-title: I - Webhooks & advanced Actor overview
+title: Webhooks & advanced Actor overview
 description: Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.
 sidebar_position: 6.1
+sidebar_label: I - Webhooks & advanced Actor overview
 slug: /expert-scraping-with-apify/actors-webhooks
 ---

 **Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**

+:::caution Updates coming
+
+This lesson is subject to change because it currently relies on code from our archived **Web scraping basics for JavaScript devs** course. For now you can still access the archived course, but we plan to completely retire it in a few months. This lesson will be updated to remove the dependency.
+
+:::
+
 ---

 Thus far, you've run Actors on the platform and written an Actor of your own, which you published to the platform yourself using the Apify CLI; therefore, it's fair to say that you are becoming more familiar and comfortable with the concept of **Actors**. Within this lesson, we'll take a more in-depth look at Actors and what they can do.

 ## Advanced Actor overview {#advanced-actors}

-In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in [three short lessons](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
+In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in [three short lessons](../../webscraping/scraping_basics_legacy/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.

 Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. "Apify Actors" is a serverless platform that runs multiple Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).

@@ -39,7 +46,7 @@ Prior to moving forward, please read over these resources:

 ## Our task {#our-task}

-In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
+In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](../../webscraping/scraping_basics_legacy/challenge/index.md) course's final challenge, so keep those files safe!

 Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset.

sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 ---
-title: IV - Apify API & client
+title: Apify API & client
 description: Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.
 sidebar_position: 6.4
+sidebar_label: IV - Apify API & client
 slug: /expert-scraping-with-apify/apify-api-and-client
 ---

sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 ---
-title: VI - Bypassing anti-scraping methods
+title: Bypassing anti-scraping methods
 description: Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.
 sidebar_position: 6.6
+sidebar_label: VI - Bypassing anti-scraping methods
 slug: /expert-scraping-with-apify/bypassing-anti-scraping
 ---

sources/academy/platform/expert_scraping_with_apify/index.md

Lines changed: 1 addition & 5 deletions
@@ -18,13 +18,9 @@ Before developing a pro-level Apify scraper, there are some important things you

 > If you've already gone through the [Web scraping basics for JavaScript devs](../../webscraping/scraping_basics_javascript/index.md) and the first courses of the [Apify platform category](../apify_platform.md), you will be more than well equipped to continue on with the lessons in this course.

-<!-- ### Puppeteer/Playwright {#puppeteer-playwright}
-
-[Puppeteer](https://pptr.dev/) is a library for running and controlling a [headless browser](../../webscraping/scraping_basics_javascript/crawling/headless_browser.md) in Node.js, and was developed at Google. The team working on it was hired by Microsoft to work on the [Playwright](https://playwright.dev/) project; therefore, many parallels can be seen between both the `puppeteer` and `playwright` packages. Proficiency in at least one of these will be good enough. -->
-
 ### Crawlee, Apify SDK, and the Apify CLI {#crawlee-apify-sdk-and-cli}

-If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping basics for JavaScript devs** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
+If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to the [Using a scraping framework with Node.js](../../webscraping/scraping_basics_javascript/12_framework.md) lesson of the **Web scraping basics for JavaScript devs** course. To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.

 The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).

sources/academy/platform/expert_scraping_with_apify/managing_source_code.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 ---
-title: II - Managing source code
+title: Managing source code
 description: Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.
 sidebar_position: 6.2
+sidebar_label: II - Managing source code
 slug: /expert-scraping-with-apify/managing-source-code
 ---

sources/academy/platform/expert_scraping_with_apify/migrations_maintaining_state.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 ---
-title: V - Migrations & maintaining state
+title: Migrations & maintaining state
 description: Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.
 sidebar_position: 6.5
+sidebar_label: V - Migrations & maintaining state
 slug: /expert-scraping-with-apify/migrations-maintaining-state
 ---
