Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 41 additions & 34 deletions content/xarray.rst
Original file line number Diff line number Diff line change
Expand Up @@ -380,54 +380,61 @@ Exercises 2

.. challenge:: Exercises: Xarray-2

Let's change from climate science to finance for this example. Put the stock prices and trading volumes of three companies over ten days in one dataset. Create an Xarray Dataset that uses time and company as dimensions and contains two DataArrays: ``stock_price`` and ``trading_volume``. You can choose the values for the stock prices and trading volumes yourself. As a last thing, add the currency of the stock prices as an attribute to the Dataset.
Let's change from climate science to finance for this example. Put the stock prices and trading volumes of three companies in one dataset. Create an Xarray Dataset that uses time and company as dimensions and contains two DataArrays: ``stock_price`` and ``trading_volume``. You can download the data as a pandas DataFrame with the following code: ::

import yfinance as yf

AAPL_df = yf.download("AAPL", start="2020-01-01", end="2024-01-01")
GOOGL_df = yf.download("GOOGL", start="2020-01-01", end="2024-01-01")
MSFT_df = yf.download("MSFT", start="2020-01-01", end="2024-01-01")


As a last thing, add the currency of the stock prices as an attribute to the Dataset.

.. solution:: Solutions: Xarray-2

We can use a script similar to this one: ::

import xarray as xr
import numpy as np
import yfinance as yf

start_date = "2020-01-01"
end_date = "2024-01-01"

AAPL_df = yf.download("AAPL", start=start_date, end=end_date)
GOOGL_df = yf.download("GOOGL", start=start_date, end=end_date)
MSFT_df = yf.download("MSFT", start=start_date, end=end_date)


stock_prices = np.array(
[
AAPL_df["Close"].values,
GOOGL_df["Close"].values,
MSFT_df["Close"].values,
]
)

trading_volumes = np.array(
[
AAPL_df["Volume"].values,
GOOGL_df["Volume"].values,
MSFT_df["Volume"].values,
]
)


time = [
"2023-01-01",
"2023-01-02",
"2023-01-03",
"2023-01-04",
"2023-01-05",
"2023-01-06",
"2023-01-07",
"2023-01-08",
"2023-01-09",
"2023-01-10",
]
companies = ["AAPL", "GOOGL", "MSFT"]
stock_prices = np.random.normal(loc=[100, 1500, 200], scale=[10, 50, 20], size=(10, 3))
trading_volumes = np.random.randint(1000, 10000, size=(10, 3))
time = AAPL_df.index[:].strftime("%Y-%m-%d").tolist()

ds = xr.Dataset(
data_vars = {
"stock_price": (["time", "company"], stock_prices),
"trading_volume": (["time", "company"], trading_volumes),
{
"stock_price": (["company", "time"], stock_prices[:, :, 0]),
"trading_volume": (["company", "time"], trading_volumes[:, :, 0]),
},
coords={"time": time, "company": companies},
attrs={"currency": "USD"},
)
print(ds)

The output should then resemble this: ::

> python exercise.py
<xarray.Dataset> Size: 940B
Dimensions: (time: 10, company: 3)
Coordinates:
* time (time) <U10 400B '2023-01-01' '2023-01-02' ... '2023-01-10'
* company (company) <U5 60B 'AAPL' 'GOOGL' 'MSFT'
Data variables:
stock_price (time, company) float64 240B 101.1 1.572e+03 ... 217.8
trading_volume (time, company) int64 240B 1214 7911 4578 ... 4338 6861 6958
Attributes:
currency: USD




Expand Down
1 change: 1 addition & 0 deletions software/environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ dependencies:
- vega_datasets
- xarray
- netcdf4
- yfinance
- pip
- pip:
- pythia_datasets
Loading