sources/academy/platform/deploying_your_code/inputs_outputs.md (2 additions & 2 deletions)
@@ -90,7 +90,7 @@ Cool! When we run `node index.js`, we see **20**.
Alternatively, when writing in a language other than JavaScript, we can create our own `get_input()` function which utilizes the Apify API when the actor is running on the platform. For this example, we are using the [Apify Client](../getting_started/apify_client.md) for Python to access the API.
-```Python
+```py
# index.py
from apify_client import ApifyClient
from os import environ
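The hunk shows only the first lines of that block. For orientation, here is a minimal sketch of what such a `get_input()` helper could look like; the function body is an assumption based on the surrounding description, not part of the diff:

```py
# Hypothetical sketch of a get_input() helper (assumed, not shown in the hunk)
from os import environ

from apify_client import ApifyClient

def get_input():
    # On the platform, the API token and the ID of the run's default
    # key-value store are provided as environment variables
    client = ApifyClient(environ['APIFY_TOKEN'])
    store_client = client.key_value_store(environ['APIFY_DEFAULT_KEY_VALUE_STORE_ID'])
    return store_client.get_record('INPUT')['value']
```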
@@ -164,7 +164,7 @@ Just as with the custom `get_input()` utility function, you can write a custom `set_output()` function.
> You can read and write your output anywhere; however, it is standard practice to use a folder named **storage**.
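A matching `set_output()` sketch under the same assumptions; the local branch follows the **storage** folder convention from the note above:

```py
# Hypothetical sketch of a set_output() counterpart (assumed)
import json
from os import environ

from apify_client import ApifyClient

def set_output(data):
    if environ.get('APIFY_IS_AT_HOME'):
        # On the platform, write the output to the run's default key-value store
        client = ApifyClient(environ['APIFY_TOKEN'])
        store_client = client.key_value_store(environ['APIFY_DEFAULT_KEY_VALUE_STORE_ID'])
        store_client.set_record('OUTPUT', data)
    else:
        # Locally, write it into the standard "storage" folder
        with open('storage/key_value_stores/default/OUTPUT.json', 'w') as output_file:
            json.dump(data, output_file)
```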
sources/academy/tutorials/python/process_data_using_python.md (7 additions & 7 deletions)
@@ -31,7 +31,7 @@ In the page that opens, you can see your newly created actor. In the **Settings*
First, we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your actor will use. We will be using the `pandas` package for parsing the downloaded weather data, and the `matplotlib` package for visualizing it. We don't particularly care about the specific versions of these packages, so we just list them in the file:
-```python
+```py
# Add your dependencies here.
# See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
# for how to format them
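The remainder of the file, outside this hunk, would simply list the two packages named above, unpinned:

```py
pandas
matplotlib
```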
@@ -44,7 +44,7 @@ The actor's main logic will live in the `main.py` file. Let's delete everything
Next, we'll import all the packages we will use in the code:
-```python
+```py
from io import BytesIO
import os
@@ -59,7 +59,7 @@ Next, we need to run the weather scraping actor and access its results. We do th
First, we initialize an `ApifyClient` instance. All the necessary arguments are automatically provided to the actor process as environment variables accessible in Python through the `os.environ` mapping. We need to run the actor from the previous tutorial, which we have named `bbc-weather-scraper`, and wait for it to finish. So, we create a sub-client for working with that actor and run the actor through it. We then check whether the actor run has succeeded. If so, we create a client for working with its default dataset.
Now, we need to load the data from the dataset to a Pandas dataframe. Pandas supports reading data from a CSV file stream, so we just create a stream with the dataset items in the right format and supply it to `pandas.read_csv()`.
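Put into code, the flow described in the two paragraphs above might look like the following sketch; the actor name comes from the text, while the `~` prefix for one's own actor and the error handling are assumptions:

```py
import os

import pandas
from apify_client import ApifyClient

client = ApifyClient(os.environ['APIFY_TOKEN'])

# Run the scraper actor from the previous tutorial and wait for it to finish
run = client.actor('~bbc-weather-scraper').call()
if run['status'] != 'SUCCEEDED':
    raise RuntimeError('The weather scraper run has failed')

# Stream the dataset items as CSV and hand the stream straight to Pandas
dataset_client = client.dataset(run['defaultDatasetId'])
dataset_items_stream = dataset_client.stream_items(item_format='csv')
weather_data = pandas.read_csv(dataset_items_stream, parse_dates=['datetime'])
```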
@@ -88,7 +88,7 @@ weather_data = pandas.read_csv(dataset_items_stream, parse_dates=['datetime'], d
Once we have the data loaded, we can process it. Each data row comes as three fields: `datetime`, `location` and `temperature`. We would like to transform the data so that we have the datetimes in one column, and the temperatures for each location at that datetime in separate columns, one for each location. To achieve this, we use the `.pivot()` method on the dataframe. Since the temperature varies considerably between day and night, and we would like to get an overview of the temperature trends over a longer period of time, we calculate a rolling average of the temperatures with a 24-hour window.
-```python
+```py
# Transform data to a pivot table for easier plotting
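A sketch of that transformation, with the column names taken from the three fields listed above; the exact rolling-window parameters are an assumption:

```py
# One datetime index, one temperature column per location
measured_temperatures = weather_data.pivot(index='datetime', columns='location', values='temperature')

# Smooth out the day/night swings with a 24-hour rolling average
rolling_temperatures = measured_temperatures.rolling(window='24h', min_periods=1).mean()
```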
With the data processed, we can then make a plot of the results. For that, we use the `.plot()` method of the dataframe, which creates a figure with the plot, using the Matplotlib library internally. We set the appropriate titles and labels on the plot, and apply some additional formatting to achieve a nicer result.
As the last step, we need to save the plot to a record in a [key-value store](/platform/storage/key-value-store) on the Apify platform, so that we can access it later. We save the rendered figure with the plot to an in-memory buffer, and then save the contents of that buffer to the default key-value store of the actor run through its resource sub-client.
-```python
+```py
# Get the resource sub-client for working with the default key-value store of the run
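Continuing the sketch from the previous steps (`client` and `run` come from the earlier snippet; the record key, titles, and figure size are assumptions):

```py
from io import BytesIO

# Plot the smoothed temperatures; pandas uses Matplotlib internally
axes = rolling_temperatures.plot(figsize=(10, 5))
axes.set_title('Temperature predictions')
axes.set_ylabel('Temperature [°C]')

# Render the figure into an in-memory buffer instead of a file on disk
plot_buffer = BytesIO()
axes.get_figure().savefig(plot_buffer, format='png')
plot_buffer.seek(0)

# Save the buffer contents to the run's default key-value store
store_client = client.key_value_store(run['defaultKeyValueStoreId'])
store_client.set_record('weather-plot.png', plot_buffer.getvalue(), 'image/png')
```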
sources/academy/tutorials/python/scrape_data_python.md (14 additions & 14 deletions)
@@ -63,7 +63,7 @@ In the page that opens, you can see your newly created actor. In the **Settings*
First, we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your actor will use. We will be using the `requests` package for downloading the BBC Weather pages, and the `beautifulsoup4` package for parsing and processing the downloaded pages. We don't particularly care about the specific versions of these packages, so we just list them in the file:
-```python
+```py
# Add your dependencies here.
# See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
# for how to format them
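As with the processing actor, the rest of the file outside this hunk would just list the two packages named above:

```py
requests
beautifulsoup4
```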
@@ -78,7 +78,7 @@ Finally, we can get to writing the main logic for the actor, which will live in
First, we need to import all the packages we will use in the code:
-```python
+```py
from datetime import datetime, time, timedelta, timezone
import os
import re
@@ -90,7 +90,7 @@ import requests
Next, let's set up the locations we want to scrape in a constant for easier reference and, optionally, modification.
-```python
+```py
# Locations which to scrape and their BBC Weather IDs
LOCATIONS = [
('Prague', '3067696'),
@@ -103,7 +103,7 @@ LOCATIONS = [
We'll be scraping each location separately. For each location, we need to know which timezone it is in and what the first displayed date in its weather forecast is. We will scrape each of the 14 forecast days one by one. For each day, we will first download its forecast page using the `requests` library, and then parse the downloaded HTML using the `BeautifulSoup` parser:
-```python
+```py
# List with scraped results
weather_data = []
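A sketch of the loop those lines begin; the URL pattern and the loop bounds are assumptions based on the description of 14 forecast days per location:

```py
import requests
from bs4 import BeautifulSoup

for location_name, location_id in LOCATIONS:
    for day_offset in range(14):
        # Download the forecast page for one day at one location...
        response = requests.get(f'https://www.bbc.com/weather/{location_id}/day{day_offset}')
        # ...and parse it so that elements can be found by their CSS classes
        soup = BeautifulSoup(response.content, 'html.parser')
        # (the timezone and first-date extraction described below goes here)
```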
@@ -126,7 +126,7 @@ First, we extract the timezone from the second element with class `wr-c-footer-t
Afterwards, we can figure out which date is represented by the first displayed day. We find the element with the class `wr-day--active` containing the header for the currently displayed day. Inside it, we find the element with the title of that day, which has the class `wr-day__title`. This element has the accessibility label containing the actual date of the day in its `aria-label` attribute, but it contains only the day and month and not the year, so we can't use it directly. Instead, to get the full date of the first displayed day, we compare the day from the accessibility label and the day from the current datetime at the location. If they match, we know the first displayed date is the current date at the location. If they don't, we know the first displayed date is the day before the current date at the location.
-```python
+```py
# When parsing the first day, find out what day it represents,
# to know when do the results start
if day_offset == 0:
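Expanded into a sketch: the CSS class names come from the paragraph above, while the label parsing and the `timezone_at_location` variable (produced by the timezone-extraction step) are assumptions:

```py
import re
from datetime import datetime, timedelta

if day_offset == 0:
    # The aria-label holds e.g. 'Monday 4th October' (day and month, no year)
    day_title = soup.find(class_='wr-day--active').find(class_='wr-day__title')
    label_day = int(re.search(r'\d+', day_title['aria-label']).group(0))
    now_at_location = datetime.now(timezone_at_location)
    if label_day == now_at_location.day:
        # The first displayed day is today at the location
        first_displayed_date = now_at_location.date()
    else:
        # The forecast still starts at yesterday's date at the location
        first_displayed_date = now_at_location.date() - timedelta(days=1)
```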
@@ -162,7 +162,7 @@ To get the datetime of each slot, we need to combine the date of the first displ
Finally, we can put all the extracted information together and push it to the array holding the resulting data.
-```python
+```py
# Go through the elements for each displayed time slot of the displayed day
for slot in slot_container.find_all(class_='wr-time-slot'):
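Expanded into a sketch; the slot class names and parsing details are assumptions, and `first_displayed_date` and `timezone_at_location` come from the earlier steps:

```py
from datetime import datetime, time, timedelta

for slot in slot_container.find_all(class_='wr-time-slot'):
    # Hour of the slot and the temperature in degrees Celsius (class names assumed)
    slot_hour = int(slot.find(class_='wr-time-slot-primary__hours').text)
    temperature = int(slot.find(class_='wr-value--temperature--c').text[:-1])
    # Combine the first displayed date, the slot hour and the day offset
    slot_datetime = datetime.combine(
        first_displayed_date,
        time(hour=slot_hour),
        tzinfo=timezone_at_location,
    ) + timedelta(days=day_offset)
    weather_data.append({
        'datetime': slot_datetime,
        'location': location_name,
        'temperature': temperature,
    })
```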
@@ -192,7 +192,7 @@ As the last step, we need to store the scraped data in a dataset on the Apify pl
First, we initialize an `ApifyClient` instance. All the necessary arguments are automatically provided to the actor process as environment variables accessible in Python through the `os.environ` mapping. We will save the data into the default dataset belonging to the actor run, so we create a sub-client for working with that dataset, and push the data into it using its `.push_items(...)` method.
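A sketch of that upload step; the environment variable names are the standard ones the platform provides to the actor process:

```py
import os

from apify_client import ApifyClient

client = ApifyClient(os.environ['APIFY_TOKEN'])

# Push the scraped records into the run's default dataset
dataset_client = client.dataset(os.environ['APIFY_DEFAULT_DATASET_ID'])
dataset_client.push_items(weather_data)
```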
@@ -231,7 +231,7 @@ In the page that opens, you can see your newly created actor. In the **Settings*
First, we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your actor will use. We will be using the `pandas` package for parsing the downloaded weather data, and the `matplotlib` package for visualizing it. We don't particularly care about the specific versions of these packages, so we just list them in the file:
-```python
+```py
# Add your dependencies here.
# See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
# for how to format them
@@ -244,7 +244,7 @@ The actor's main logic will live in the `main.py` file. Let's delete everything
Next, we'll import all the packages we will use in the code:
-```python
+```py
from io import BytesIO
import os
@@ -259,7 +259,7 @@ Next, we need to run the weather scraping actor and access its results. We do th
First, we initialize an `ApifyClient` instance. All the necessary arguments are automatically provided to the actor process as environment variables accessible in Python through the `os.environ` mapping. We need to run the actor from the previous tutorial, which we have named `bbc-weather-scraper`, and wait for it to finish. So, we create a sub-client for working with that actor and run the actor through it. We then check whether the actor run has succeeded. If so, we create a client for working with its default dataset.
Now, we need to load the data from the dataset to a Pandas dataframe. Pandas supports reading data from a CSV file stream, so we just create a stream with the dataset items in the right format and supply it to `pandas.read_csv()`.
@@ -288,7 +288,7 @@ weather_data = pandas.read_csv(dataset_items_stream, parse_dates=['datetime'], d
Once we have the data loaded, we can process it. Each data row comes as three fields: `datetime`, `location` and `temperature`. We would like to transform the data so that we have the datetimes in one column, and the temperatures for each location at that datetime in separate columns, one for each location. To achieve this, we use the `.pivot()` method on the dataframe. Since the temperature varies considerably between day and night, and we would like to get an overview of the temperature trends over a longer period of time, we calculate a rolling average of the temperatures with a 24-hour window.
-```python
+```py
# Transform data to a pivot table for easier plotting
With the data processed, we can then make a plot of the results. For that, we use the `.plot()` method of the dataframe, which creates a figure with the plot, using the Matplotlib library internally. We set the appropriate titles and labels on the plot, and apply some additional formatting to achieve a nicer result.
As the last step, we need to save the plot to a record in a [key-value store](/platform/storage/key-value-store) on the Apify platform, so that we can access it later. We save the rendered figure with the plot to an in-memory buffer, and then save the contents of that buffer to the default key-value store of the actor run through its resource sub-client.
-```python
+```py
# Get the resource sub-client for working with the default key-value store of the run