Commit baadb04

Pushing the docs to dev/ for branch: main, commit 2b2e2903e5635dd93a741c955a87260fb69cfc3d
1 parent 21165dc commit baadb04

1,258 files changed: +6652 -6532 lines changed


dev/_downloads/6953689dfdc5dd401dda89604bbdaefb/plot_time_series_lagged_features.ipynb

Lines changed: 20 additions & 2 deletions
@@ -11,7 +11,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Analyzing the Bike Sharing Demand dataset\n\nWe start by loading the data from the OpenML repository\nas a pandas dataframe. This will be replaced with Polars\nonce `fetch_openml` adds a native support for it.\nWe convert to Polars for feature engineering, as it automatically caches\ncommon subexpressions which are reused in multiple expressions\n(like `pl.col(\"count\").shift(1)` below). See\nhttps://docs.pola.rs/user-guide/lazy/optimizations/ for more information.\n\n"
+"## Analyzing the Bike Sharing Demand dataset\n\nWe start by loading the data from the OpenML repository as a raw parquet file\nto illustrate how to work with an arbitrary parquet file instead of hiding this\nstep in a convenience tool such as `sklearn.datasets.fetch_openml`.\n\nThe URL of the parquet file can be found in the JSON description of the\nBike Sharing Demand dataset with id 44063 on openml.org\n(https://openml.org/search?type=data&status=active&id=44063).\n\nThe `sha256` hash of the file is also provided to ensure the integrity of the\ndownloaded file.\n\n"
 ]
 },
 {
@@ -22,7 +22,25 @@
 },
 "outputs": [],
 "source": [
-"import numpy as np\nimport polars as pl\n\nfrom sklearn.datasets import fetch_openml\n\npl.Config.set_fmt_str_lengths(20)\n\nbike_sharing = fetch_openml(\n \"Bike_Sharing_Demand\", version=2, as_frame=True, parser=\"pandas\"\n)\ndf = bike_sharing.frame\ndf = pl.DataFrame({col: df[col].to_numpy() for col in df.columns})"
+"import numpy as np\nimport polars as pl\n\nfrom sklearn.datasets import fetch_file\n\npl.Config.set_fmt_str_lengths(20)\n\nbike_sharing_data_file = fetch_file(\n \"https://openml1.win.tue.nl/datasets/0004/44063/dataset_44063.pq\",\n sha256=\"d120af76829af0d256338dc6dd4be5df4fd1f35bf3a283cab66a51c1c6abd06a\",\n)\nbike_sharing_data_file"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"We load the parquet file with Polars for feature engineering. Polars\nautomatically caches common subexpressions which are reused in multiple\nexpressions (like `pl.col(\"count\").shift(1)` below). See\nhttps://docs.pola.rs/user-guide/lazy/optimizations/ for more information.\n\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"df = pl.read_parquet(bike_sharing_data_file)"
 ]
 },
 {
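
Note on the `sha256` argument introduced in this change: the new cell passes an expected hash to `fetch_file` so that the downloaded parquet file can be checked for integrity. As a rough sketch of what such a check amounts to (this is not scikit-learn's implementation; `sha256_of_file` and `verify_sha256` are hypothetical helper names introduced only for illustration), the digest can be computed with the standard library and compared to the expected value:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    Hypothetical helper: the file is read in chunks so that large
    downloads do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return `path` as a Path if its digest matches, else raise ValueError.

    Hypothetical helper: the comparison is case-insensitive on the
    expected hex string.
    """
    actual = sha256_of_file(path)
    if actual != expected_hex.lower():
        raise ValueError(
            f"SHA-256 mismatch for {Path(path).name}: "
            f"got {actual}, expected {expected_hex}"
        )
    return Path(path)
```

Under these assumptions, a locally saved copy of the dataset could be checked with `verify_sha256(local_path, "d120af76829af0d256338dc6dd4be5df4fd1f35bf3a283cab66a51c1c6abd06a")` before handing it to `pl.read_parquet`, where `local_path` is a placeholder for wherever the file was stored.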