Commit c5a6dc0

style: improve English

1 parent f5e25c6

File tree: 1 file changed (+8 −8)

sources/academy/webscraping/scraping_basics_python/12_framework.md

Lines changed: 8 additions & 8 deletions
@@ -32,7 +32,7 @@ We genuinely believe beginners to scraping will like it more, since it allows to
 
 ## Installing Crawlee
 
-When starting with the Crawlee framework, you first need to decide which approach to downloading and parsing you'll prefer. We want the one based on BeautifulSoup, hence we'll install the `crawlee` package with the `beautifulsoup` extra specified in brackets. The framework has a lot of dependencies of its own, so expect the installation to take a while.
+When starting with the Crawlee framework, we first need to decide which approach to downloading and parsing we prefer. We want the one based on BeautifulSoup, so let's install the `crawlee` package with the `beautifulsoup` extra specified in brackets. The framework has a lot of dependencies, so expect the installation to take a while.
 
 ```text
 $ pip install crawlee[beautifulsoup]
@@ -42,7 +42,7 @@ Successfully installed Jinja2-0.0.0 ... ... ... crawlee-0.0.0 ... ... ...
 
 ## Running Crawlee
 
-Now let's use the framework to create a new version of our scraper. In the same project directory where our `main.py` file lives, create a file `newmain.py`. This way we can keep peeking at the original implementation when we're working on the new one. The initial content will look like this:
+Now let's use the framework to create a new version of our scraper. In the same project directory where our `main.py` file lives, create a file `newmain.py`. This way, we can keep peeking at the original implementation while working on the new one. The initial content will look like this:
 
 ```py title="newmain.py"
 import asyncio
@@ -61,15 +61,15 @@ if __name__ == '__main__':
     asyncio.run(main())
 ```
 
-In the code we do the following:
+In the code, we do the following:
 
 1. We perform imports and specify an asynchronous `main()` function.
 1. Inside, we first create a crawler. The crawler objects control the scraping. This particular crawler is of the BeautifulSoup flavor.
-1. In the middle, we give the crawler a nested asynchronous function `handle_listing()`. Using a Python decorator (that line starting with `@`) we tell it to treat it as a default handler. Handlers take care of processing HTTP responses. This one finds the title of the page in `soup` and prints its text without whitespace.
-1. The function ends with running the crawler with the products listing URL. We await until the crawler does its work.
-1. The last two lines ensure that if we run the file as a standalone program, Python's asynchronous machinery `asyncio` will run our `main()` function.
+1. In the middle, we give the crawler a nested asynchronous function `handle_listing()`. Using a Python decorator (that line starting with `@`), we tell it to treat it as a default handler. Handlers take care of processing HTTP responses. This one finds the title of the page in `soup` and prints its text without whitespace.
+1. The function ends with running the crawler with the product listing URL. We await the crawler to finish its work.
+1. The last two lines ensure that if we run the file as a standalone program, Python's asynchronous machinery will run our `main()` function.
 
-Don't worry if it's a lot of things you've never seen before. For now it's not really important to know exactly how [asyncio](https://docs.python.org/3/library/asyncio.html) works, or what decorators do. Let's stick to the practical side and see what the program does if executed:
+Don't worry if this involves a lot of things you've never seen before. For now, you don't need to know exactly how [`asyncio`](https://docs.python.org/3/library/asyncio.html) works or what decorators do. Let's stick to the practical side and see what the program does when executed:
 
 ```text
 $ python newmain.py
@@ -104,7 +104,7 @@ Sales
 └───────────────────────────────┴──────────┘
 ```
 
-If our previous program didn't give us any sense of progress, Crawlee feeds us with perhaps too much information for our purposes. Between all the diagnostics, notice the line `Sales`. That's the page title! We managed to create a Crawlee scraper which downloads the product listing page, parses it with BeautifulSoup, extracts the title, and prints it.
+If our previous scraper didn't give us any sense of progress, Crawlee feeds us with perhaps too much information for the purposes of a small program. Among all the diagnostics, notice the line `Sales`. That's the page title! We managed to create a Crawlee scraper that downloads the product listing page, parses it with BeautifulSoup, extracts the title, and prints it.
 
 ## Crawling product detail pages
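
For reference, the diff shows only fragments of `newmain.py` between hunks: the `import asyncio` line and the closing `asyncio.run(main())`. Below is a minimal sketch of what the complete file plausibly looks like, assembled from the numbered list in the diff. It is not part of the commit: the import path, handler body, and listing URL are assumptions (Crawlee's module layout has changed across versions, and the tutorial's actual URL does not appear in this diff).

```py
import asyncio

# Assumed import path; newer Crawlee releases expose these classes under
# `crawlee.crawlers` instead of `crawlee.beautifulsoup_crawler`.
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main():
    # The crawler object controls the scraping; this one is the BeautifulSoup flavor.
    crawler = BeautifulSoupCrawler()

    # The decorator registers the nested asynchronous function as the
    # default handler; handlers take care of processing HTTP responses.
    @crawler.router.default_handler
    async def handle_listing(context: BeautifulSoupCrawlingContext):
        # Find the page title in the parsed soup and print its text
        # without surrounding whitespace.
        print(context.soup.title.text.strip())

    # Run the crawler with the product listing URL and await its work.
    # Placeholder URL; the real listing URL is not shown in this diff.
    await crawler.run(['https://example.com/collections/sales'])


if __name__ == '__main__':
    # Python's asynchronous machinery runs our main() function.
    asyncio.run(main())
```

Running `python newmain.py` against a real listing page should print the page title among Crawlee's log output, matching the `Sales` line highlighted in the hunk above.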
