Commit 069d356

style: let the reader decide if something is simple or easy (upstream branch) (#1201)
Closes #1071. It's the same PR, but from an upstream branch, not from my fork. As discussed on Slack, this works around unauthorized npm token on CI. I also rebased the original branch on top of current master, hopefully without mistakes. Otherwise the changes should be the same.
2 parents bbf2869 + f43a2cf commit 069d356

117 files changed, +349 -375 lines changed

sources/academy/glossary/concepts/css_selectors.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ CSS selectors are important for web scraping because they allow you to target sp

 For example, if you wanted to scrape a list of all the titles of blog posts on a website, you could use a CSS selector to select all the elements that contain the title text. Once you have selected these elements, you can extract the text from them and use it for your scraping project.

-Additionally, when web scraping it is important to understand the structure of the website and CSS selectors can help you to navigate it easily. With them, you can select specific elements and their children, siblings, or parent elements. This allows you to extract data that is nested within other elements, or to navigate through the page structure to find the data you need.
+Additionally, when web scraping it is important to understand the structure of the website and CSS selectors can help you to navigate it. With them, you can select specific elements and their children, siblings, or parent elements. This allows you to extract data that is nested within other elements, or to navigate through the page structure to find the data you need.

 ## Resources

sources/academy/glossary/concepts/http_headers.md

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ For some websites, you won't need to worry about modifying headers at all, as th

 Some websites will require certain default browser headers to work properly, such as **User-Agent** (though, this header is becoming more obsolete, as there are more sophisticated ways to detect and block a suspicious user).

-Another example of such a "default" header is **Referer**. Some e-commerce websites might share the same platform, and data is loaded through XMLHttpRequests to that platform, which simply would not know which data to return without knowing which exact website is requesting it.
+Another example of such a "default" header is **Referer**. Some e-commerce websites might share the same platform, and data is loaded through XMLHttpRequests to that platform, which would not know which data to return without knowing which exact website is requesting it.

 ## Custom headers required {#needs-custom-headers}

@@ -44,7 +44,7 @@ You could use Chrome DevTools to inspect request headers, and [Insomnia](../tool
 HTTP/1.1 and HTTP/2 headers have several differences. Here are the three key differences that you should be aware of:

 1. HTTP/2 headers do not include status messages. They only contain status codes.
-2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will simply ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem.
+2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem.
 3. While HTTP/1.1 headers are case-insensitive and could be sent by the browsers with capitalized letters (e.g. **Accept-Encoding**, **Cache-Control**, **User-Agent**), HTTP/2 headers must be lower-cased (e.g. **accept-encoding**, **cache-control**, **user-agent**).

 > To learn more about the difference between HTTP/1.1 and HTTP/2 headers, check out [this](https://httptoolkit.tech/blog/translating-http-2-into-http-1/) article
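
Reviewer note: points 2 and 3 of the list in this hunk (no connection-specific headers, lower-cased names) can be illustrated with a small Python sketch. It is not part of the commit or of the Academy text. The function name `to_http2_headers`, the constant `CONNECTION_SPECIFIC`, and the sample header values are hypothetical; the set of prohibited names follows the HTTP/2 specification's list of connection-specific fields, and special cases such as `TE: trailers` are deliberately left out.

```python
# Connection-specific header names that HTTP/2 prohibits.
# Hypothetical constant name; the list follows the HTTP/2 spec.
CONNECTION_SPECIFIC = {
    "connection",
    "keep-alive",
    "proxy-connection",
    "transfer-encoding",
    "upgrade",
}


def to_http2_headers(headers: dict[str, str]) -> dict[str, str]:
    """Lower-case header names and drop connection-specific headers."""
    # Any header named inside the Connection value is connection-specific too.
    named = {
        token.strip().lower()
        for name, value in headers.items()
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    banned = CONNECTION_SPECIFIC | named
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() not in banned
    }


# Sample HTTP/1.1-style headers (illustrative values only).
http1_headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Keep-Alive": "timeout=5",
    "User-Agent": "example-scraper/1.0",
}
http2_headers = to_http2_headers(http1_headers)
```

With the sample input, only the three non-connection headers should remain, all with lower-cased names. Most HTTP clients perform this translation themselves, so a helper like this is mainly useful when copying headers by hand from DevTools into a scraper.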

sources/academy/glossary/tools/edit_this_cookie.md

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ slug: /tools/edit-this-cookie

 ---

-**EditThisCookie** is a simple Chrome extension to manage your browser's cookies. It can be added through the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see a button with a delicious cookie icon next to any other Chrome extensions you might have installed. Clicking on it will open a pop-up window with a list of all saved cookies associated with the currently opened page domain.
+**EditThisCookie** is a Chrome extension to manage your browser's cookies. It can be added through the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see a button with a delicious cookie icon next to any other Chrome extensions you might have installed. Clicking on it will open a pop-up window with a list of all saved cookies associated with the currently opened page domain.

 ![EditThisCookie popup](./images/edit-this-cookie-popup.png)

@@ -21,11 +21,11 @@ At the top of the popup, there is a row of buttons. From left to right, here is

 ### Delete all cookies

-Clicking this button will simply remove all cookies associated with the current domain. For example, if you're logged into your Apify account and delete all the cookies, the website will ask you to log in again.
+Clicking this button will remove all cookies associated with the current domain. For example, if you're logged into your Apify account and delete all the cookies, the website will ask you to log in again.

 ### Reset

-Basically just a refresh button.
+A refresh button.

 ### Add a new cookie

sources/academy/glossary/tools/insomnia.md

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 ---
 title: Insomnia
-description: Learn about Insomnia, a simple yet super valuable tool for testing requests and proxies when building scalable web scrapers.
+description: Learn about Insomnia, a valuable tool for testing requests and proxies when building scalable web scrapers.
 sidebar_position: 9.2
 slug: /tools/insomnia
 ---

 # What is Insomnia {#what-is-insomnia}

-**Learn about Insomnia, a simple yet super valuable tool for testing requests and proxies when building scalable web scrapers.**
+**Learn about Insomnia, a valuable tool for testing requests and proxies when building scalable web scrapers.**

 ---

@@ -66,4 +66,4 @@ This will bring up the **Manage cookies** window, where all cached cookies can b

 ## Postman or Insomnia {#postman-or-insomnia}

-The application you choose to use is completely up to your personal preference, and will not affect your development workflow. If viewing timelines of the requests you send is important to you, then you should go with Insomnia; however, if that doesn't matter, just choose the one that has the most intuitive interface for you.
+The application you choose to use is completely up to your personal preference, and will not affect your development workflow. If viewing timelines of the requests you send is important to you, then you should go with Insomnia; however, if that doesn't matter, choose the one that has the most intuitive interface for you.

sources/academy/glossary/tools/modheader.md

Lines changed: 2 additions & 2 deletions
@@ -19,9 +19,9 @@ If you read about [Postman](./postman.md), you might remember that you can use i

 After you install the ModHeader extension, you should see it pinned in Chrome's task bar. When you click it, you'll see an interface like this pop up:

-![Modheader's simple interface](./images/modheader.jpg)
+![Modheader's interface](./images/modheader.jpg)

-Here, you can add headers, remove headers, and even save multiple collections of headers that you can easily toggle between (which are called **Profiles** within the extension itself).
+Here, you can add headers, remove headers, and even save multiple collections of headers that you can toggle between (which are called **Profiles** within the extension itself).

 ## Use cases {#use-cases}

sources/academy/glossary/tools/postman.md

Lines changed: 6 additions & 6 deletions
@@ -1,19 +1,19 @@
 ---
 title: Postman
-description: Learn about Postman, a simple yet super valuable tool for testing requests and proxies when building scalable web scrapers.
+description: Learn about Postman, a valuable tool for testing requests and proxies when building scalable web scrapers.
 sidebar_position: 9.3
 slug: /tools/postman
 ---

 # What is Postman? {#what-is-postman}

-**Learn about Postman, a simple yet super valuable tool for testing requests and proxies when building scalable web scrapers.**
+**Learn about Postman, a valuable tool for testing requests and proxies when building scalable web scrapers.**

 ---

-[Postman](https://www.postman.com/) is a powerful collaboration platform for API development and testing. For scraping use-cases, it's mainly used to test requests and proxies (such as checking the response body of a raw request, without loading any additional resources such as JavaScript or CSS). This tool can do much more than that, but we will not be discussing all of its capabilities here. Postman allows us to easily test requests with cookies, headers, and payloads so that we can be entirely sure what the response looks like for a request URL we plan to eventually use in a scraper.
+[Postman](https://www.postman.com/) is a powerful collaboration platform for API development and testing. For scraping use-cases, it's mainly used to test requests and proxies (such as checking the response body of a raw request, without loading any additional resources such as JavaScript or CSS). This tool can do much more than that, but we will not be discussing all of its capabilities here. Postman allows us to test requests with cookies, headers, and payloads so that we can be entirely sure what the response looks like for a request URL we plan to eventually use in a scraper.

-The desktop app can be downloaded from its [official download page](https://www.postman.com/downloads/), or the web app can be used with a simple signup - no download required. If this is your first time working with a tool like Postman, we recommend checking out their [Getting Started guide](https://learning.postman.com/docs/getting-started/introduction/).
+The desktop app can be downloaded from its [official download page](https://www.postman.com/downloads/), or the web app can be used with a signup - no download required. If this is your first time working with a tool like Postman, we recommend checking out their [Getting Started guide](https://learning.postman.com/docs/getting-started/introduction/).

 ## Understanding the interface {#understanding-the-interface}

@@ -43,7 +43,7 @@ In order to use a proxy, the proxy's server and configuration must be provided i

 ![Proxy configuration in Postman settings](./images/postman-proxy.png)

-After configuring a proxy, the next request sent will attempt to use it. To switch off the proxy, its details don't need to be deleted. The **Add a custom proxy configuration** option in settings just needs to be un-ticked to disable it.
+After configuring a proxy, the next request sent will attempt to use it. To switch off the proxy, its details don't need to be deleted. The **Add a custom proxy configuration** option in settings needs to be un-ticked to disable it.

 ## Managing the cookies cache {#managing-cookies}

@@ -55,7 +55,7 @@ In order to check whether there are any cookies associated with a certain reques

 ![Button to view the cached cookies](./images/postman-cookies-button.png)

-Clicking on this button opens a **MANAGE COOKIES** window, where a list of all cached cookies per domain can be seen. If we had been previously sending multiple requests to **https://github.com/apify**, within this window we would be able to easily find cached cookies associated with github.com. Cookies can also be easily edited (to update some specific values), or deleted (to send a "clean" request without any cached data) here.
+Clicking on this button opens a **MANAGE COOKIES** window, where a list of all cached cookies per domain can be seen. If we had been previously sending multiple requests to **https://github.com/apify**, within this window we would be able to find cached cookies associated with github.com. Cookies can also be edited (to update some specific values), or deleted (to send a "clean" request without any cached data) here.

 ![Managing cookies in Postman with the "MANAGE COOKIES" window](./images/postman-manage-cookies.png)

sources/academy/glossary/tools/quick_javascript_switcher.md

Lines changed: 3 additions & 3 deletions
@@ -1,17 +1,17 @@
 ---
 title: Quick JavaScript Switcher
-description: Discover a super simple tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.
+description: Discover a handy tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.
 sidebar_position: 9.9
 slug: /tools/quick-javascript-switcher
 ---

 # Quick JavaScript Switcher

-**Discover a super simple tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.**
+**Discover a handy tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.**

 ---

-**Quick JavaScript Switcher** is a very simple Chrome extension that allows you to switch on/off the JavaScript for the current page with one click. It can be added to your browser via the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see its respective button next to any other Chrome extensions you might have installed.
+**Quick JavaScript Switcher** is a Chrome extension that allows you to switch on/off the JavaScript for the current page with one click. It can be added to your browser via the [Chrome Web Store](https://chrome.google.com/webstore/category/extensions). After adding it to Chrome, you'll see its respective button next to any other Chrome extensions you might have installed.

 If JavaScript is enabled - clicking the button will switch it off and reload the page. The next click will re-enable JavaScript and refresh the page. This extension is useful for checking whether a certain website will work without JavaScript (and thus could be parsed without using a browser with a plain HTTP request) or not.

sources/academy/glossary/tools/user_agent_switcher.md

Lines changed: 3 additions & 3 deletions
@@ -1,17 +1,17 @@
 ---
 title: User-Agent Switcher
-description: Learn how to easily switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.
+description: Learn how to switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.
 sidebar_position: 9.8
 slug: /tools/user-agent-switcher
 ---

 # User-Agent Switcher

-**Learn how to easily switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.**
+**Learn how to switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.**

 ---

-**User-Agent Switcher** is a simple Chrome extension that allows you to quickly change your **User-Agent** and see how a certain website would behave with different user agents. After adding it to Chrome, you'll see a **Chrome UA Spoofer** button in the extension icons area. Clicking on it will open up a list of various **User-Agent** groups.
+**User-Agent Switcher** is a Chrome extension that allows you to quickly change your **User-Agent** and see how a certain website would behave with different user agents. After adding it to Chrome, you'll see a **Chrome UA Spoofer** button in the extension icons area. Clicking on it will open up a list of various **User-Agent** groups.

 ![User-Agent Switcher groups](./images/user-agent-switcher-groups.png)
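
Reviewer note: the same experiment the extension enables (one page, several **User-Agent** values) can be approximated from a script. The sketch below is not part of the commit. It uses only Python's standard library to prepare one request per User-Agent without sending anything; the helper name `build_requests`, the labels, the URL, and the User-Agent strings are all made-up placeholders.

```python
from urllib.request import Request

# Placeholder User-Agent strings for illustration; real browsers send longer values.
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (X11; Linux x86_64) ExampleDesktop/1.0",
    "mobile": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ExampleMobile/1.0",
}


def build_requests(url: str, user_agents: dict[str, str]) -> dict[str, Request]:
    """Prepare one request per User-Agent value without sending anything."""
    return {
        label: Request(url, headers={"User-Agent": value})
        for label, value in user_agents.items()
    }


# Hypothetical target URL; nothing is fetched until a request is opened.
prepared = build_requests("https://example.com/", USER_AGENTS)
```

Passing each prepared request to `urllib.request.urlopen` and comparing status codes or body sizes would show whether the site responds differently per User-Agent; that network step is left out here so the sketch stays side-effect free.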

sources/academy/platform/deploying_your_code/deploying.md

Lines changed: 1 addition & 3 deletions
@@ -21,12 +21,10 @@ Before we deploy our project onto the Apify platform, let's ensure that we've pu

 ### Creating the Actor

-Before anything can be integrated, we've gotta create a new Actor. Luckily, this is super easy to do. Let's head over to our [Apify Console](https://console.apify.com?asrc=developers_portal) and click on the **Develop new** button, then select the **Empty** template.
+Before anything can be integrated, we've gotta create a new Actor. Let's head over to our [Apify Console](https://console.apify.com?asrc=developers_portal) and click on the **Develop new** button, then select the **Empty** template.

 ![Create new button](../getting_started/images/develop-new-actor.png)

-Easy peasy!
-
 ### Changing source code location {#change-source-code}

 In the **Source** tab on the new Actor's page, we'll click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**.

sources/academy/platform/deploying_your_code/docker_file.md

Lines changed: 3 additions & 3 deletions
@@ -16,15 +16,15 @@ import TabItem from '@theme/TabItem';

 The **Dockerfile** is a file which gives the Apify platform (or Docker, more specifically) instructions on how to create an environment for your code to run in. Every Actor must have a Dockerfile, as Actors run in Docker containers.

-> Actors on the platform are always run in Docker containers; however, they can also be run in local Docker containers. This is not common practice though, as it requires more setup and a deeper understanding of Docker. For testing, it's best to just run the Actor on the local OS (this requires you to have the underlying runtime installed, such as Node.js, Python, Rust, GO, etc).
+> Actors on the platform are always run in Docker containers; however, they can also be run in local Docker containers. This is not common practice though, as it requires more setup and a deeper understanding of Docker. For testing, it's best to run the Actor on the local OS (this requires you to have the underlying runtime installed, such as Node.js, Python, Rust, GO, etc).

 ## Base images {#base-images}

 If your project doesn’t already contain a Dockerfile, don’t worry! Apify offers [many base images](/sdk/js/docs/guides/docker-images) that are optimized for building and running Actors on the platform, which can be found [here](https://hub.docker.com/u/apify). When using a language for which Apify doesn't provide a base image, [Docker Hub](https://hub.docker.com/) provides a ton of free Docker images for most use-cases, upon which you can create your own images.

 > Tip: You can see all of Apify's Docker images [on DockerHub](https://hub.docker.com/r/apify/).

-At the base level, each Docker image contains a base operating system and usually also a programming language runtime (such as Node.js or Python). You can also find images with preinstalled libraries or just install them yourself during the build step.
+At the base level, each Docker image contains a base operating system and usually also a programming language runtime (such as Node.js or Python). You can also find images with preinstalled libraries or install them yourself during the build step.

 Once you find the base image you need, you can add it as the initial `FROM` statement:

@@ -111,7 +111,7 @@ CMD python3 main.py

 ## Examples {#examples}

-The examples we just showed were for Node.js and Python, however, to drive home the fact that Actors can be written in any language, here are some examples of some Dockerfiles for Actors written in different programming languages:
+The examples above show how to deploy Actors written in Node.js or Python, but you can use any language. As an inspiration, here are a few examples for other languages: Go, Rust, Julia.

 <Tabs groupId="main">
 <TabItem value="GO Actor Dockerfile" label="GO Actor Dockerfile">
