Commit f305309

Update exercise 2 in xarray lecture
1 parent 37f16c9 commit f305309

1 file changed: +41 -34 lines changed

content/xarray.rst

Lines changed: 41 additions & 34 deletions
@@ -380,54 +380,61 @@ Exercises 2

 .. challenge:: Exercises: Xarray-2

-    Let's change from climate science to finance for this example. Put the stock prices and trading volumes of three companies over ten days in one dataset. Create an Xarray Dataset that uses time and company as dimensions and contains two DataArrays: ``stock_price`` and ``trading_volume``. You can choose the values for the stock prices and trading volumes yourself. As a last thing, add the currency of the stock prices as an attribute to the Dataset.
+    Let's change from climate science to finance for this example. Put the stock prices and trading volumes of three companies in one dataset. Create an Xarray Dataset that uses time and company as dimensions and contains two DataArrays: ``stock_price`` and ``trading_volume``. You can download the data as a pandas DataFrame with the following code: ::
+
+        import yfinance as yf
+
+        AAPL_df = yf.download("AAPL", start="2020-01-01", end="2024-01-01")
+        GOOGL_df = yf.download("GOOGL", start="2020-01-01", end="2024-01-01")
+        MSFT_df = yf.download("MSFT", start="2020-01-01", end="2024-01-01")
+
+
+    As a last thing, add the currency of the stock prices as an attribute to the Dataset.

 .. solution:: Solutions: Xarray-2

     We can use a script similar to this one: ::

         import xarray as xr
         import numpy as np
+        import yfinance as yf
+
+        start_date = "2020-01-01"
+        end_date = "2024-01-01"
+
+        AAPL_df = yf.download("AAPL", start=start_date, end=end_date)
+        GOOGL_df = yf.download("GOOGL", start=start_date, end=end_date)
+        MSFT_df = yf.download("MSFT", start=start_date, end=end_date)
+
+
+        stock_prices = np.array(
+            [
+                AAPL_df["Close"].values,
+                GOOGL_df["Close"].values,
+                MSFT_df["Close"].values,
+            ]
+        )
+
+        trading_volumes = np.array(
+            [
+                AAPL_df["Volume"].values,
+                GOOGL_df["Volume"].values,
+                MSFT_df["Volume"].values,
+            ]
+        )
+

-        time = [
-            "2023-01-01",
-            "2023-01-02",
-            "2023-01-03",
-            "2023-01-04",
-            "2023-01-05",
-            "2023-01-06",
-            "2023-01-07",
-            "2023-01-08",
-            "2023-01-09",
-            "2023-01-10",
-        ]
         companies = ["AAPL", "GOOGL", "MSFT"]
-        stock_prices = np.random.normal(loc=[100, 1500, 200], scale=[10, 50, 20], size=(10, 3))
-        trading_volumes = np.random.randint(1000, 10000, size=(10, 3))
+        time = AAPL_df.index[:].strftime("%Y-%m-%d").tolist()
+
         ds = xr.Dataset(
-            data_vars = {
-                "stock_price": (["time", "company"], stock_prices),
-                "trading_volume": (["time", "company"], trading_volumes),
+            {
+                "stock_price": (["company", "time"], stock_prices[:, :, 0]),
+                "trading_volume": (["company", "time"], trading_volumes[:, :, 0]),
             },
             coords={"time": time, "company": companies},
             attrs={"currency": "USD"},
         )
-        print(ds)
-
-    The output should then resemble this: ::
-
-        > python exercise.py
-        <xarray.Dataset> Size: 940B
-        Dimensions:         (time: 10, company: 3)
-        Coordinates:
-          * time            (time) <U10 400B '2023-01-01' '2023-01-02' ... '2023-01-10'
-          * company         (company) <U5 60B 'AAPL' 'GOOGL' 'MSFT'
-        Data variables:
-            stock_price     (time, company) float64 240B 101.1 1.572e+03 ... 217.8
-            trading_volume  (time, company) int64 240B 1214 7911 4578 ... 4338 6861 6958
-        Attributes:
-            currency:  USD
-



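The updated solution stacks the per-company arrays so that the data has the dimension order (company, time). That layout can be checked offline with a minimal sketch along the following lines. It is not part of the commit: the random values, the seed, and the five business days are assumptions standing in for the downloaded prices and volumes, and `aapl` and `mean_price` are illustrative names only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumption: synthetic stand-in values instead of data downloaded with yfinance.
rng = np.random.default_rng(seed=0)

companies = ["AAPL", "GOOGL", "MSFT"]
# Five business days formatted as strings, mirroring the solution's time coordinate.
time = pd.date_range("2023-01-02", periods=5, freq="B").strftime("%Y-%m-%d").tolist()

# One row per company, one column per day: shape (company, time).
stock_prices = rng.normal(loc=100.0, scale=5.0, size=(len(companies), len(time)))
trading_volumes = rng.integers(1_000, 10_000, size=(len(companies), len(time)))

ds = xr.Dataset(
    {
        "stock_price": (["company", "time"], stock_prices),
        "trading_volume": (["company", "time"], trading_volumes),
    },
    coords={"time": time, "company": companies},
    attrs={"currency": "USD"},
)

# Label-based selection of one company, and a per-company mean over time.
aapl = ds.sel(company="AAPL")
mean_price = ds["stock_price"].mean(dim="time")

print(ds)
print(mean_price)
```

Because the dimension names travel with the data, `ds.sel(company="AAPL")` and `mean(dim="time")` work the same way regardless of whether the arrays are stored as (company, time) or (time, company).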
