scikit-learn
diff --git a/dev/_downloads/07fcc19ba03226cd3d83d4e40ec44385/auto_examples_python.zip b/dev/_downloads/07fcc19ba03226cd3d83d4e40ec44385/auto_examples_python.zip
24 Bytes
diff --git a/dev/_downloads/6f1e7a639e0699d6164445b55e6c116d/auto_examples_jupyter.zip b/dev/_downloads/6f1e7a639e0699d6164445b55e6c116d/auto_examples_jupyter.zip
727 Bytes
diff --git a/dev/_downloads/6953689dfdc5dd401dda89604bbdaefb/plot_time_series_lagged_features.ipynb b/dev/_downloads/6953689dfdc5dd401dda89604bbdaefb/plot_time_series_lagged_features.ipynb
Lines changed: 20 additions & 2 deletions
@@ -11,7 +11,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "## Analyzing the Bike Sharing Demand dataset\n\nWe start by loading the data from the OpenML repository\nas a pandas dataframe. This will be replaced with Polars\nonce `fetch_openml` adds a native support for it.\nWe convert to Polars for feature engineering, as it automatically caches\ncommon subexpressions which are reused in multiple expressions\n(like `pl.col(\"count\").shift(1)` below). See\nhttps://docs.pola.rs/user-guide/lazy/optimizations/ for more information.\n\n"
+        "## Analyzing the Bike Sharing Demand dataset\n\nWe start by loading the data from the OpenML repository as a raw parquet file\nto illustrate how to work with an arbitrary parquet file instead of hiding this\nstep in a convenience tool such as `sklearn.datasets.fetch_openml`.\n\nThe URL of the parquet file can be found in the JSON description of the\nBike Sharing Demand dataset with id 44063 on openml.org\n(https://openml.org/search?type=data&status=active&id=44063).\n\nThe `sha256` hash of the file is also provided to ensure the integrity of the\ndownloaded file.\n\n"
       ]
     },
     {
@@ -22,7 +22,25 @@
       },
       "outputs": [],
       "source": [
-        "import numpy as np\nimport polars as pl\n\nfrom sklearn.datasets import fetch_openml\n\npl.Config.set_fmt_str_lengths(20)\n\nbike_sharing = fetch_openml(\n    \"Bike_Sharing_Demand\", version=2, as_frame=True, parser=\"pandas\"\n)\ndf = bike_sharing.frame\ndf = pl.DataFrame({col: df[col].to_numpy() for col in df.columns})"
+        "import numpy as np\nimport polars as pl\n\nfrom sklearn.datasets import fetch_file\n\npl.Config.set_fmt_str_lengths(20)\n\nbike_sharing_data_file = fetch_file(\n    \"https://openml1.win.tue.nl/datasets/0004/44063/dataset_44063.pq\",\n    sha256=\"d120af76829af0d256338dc6dd4be5df4fd1f35bf3a283cab66a51c1c6abd06a\",\n)\nbike_sharing_data_file"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We load the parquet file with Polars for feature engineering. Polars\nautomatically caches common subexpressions which are reused in multiple\nexpressions (like `pl.col(\"count\").shift(1)` below). See\nhttps://docs.pola.rs/user-guide/lazy/optimizations/ for more information.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "df = pl.read_parquet(bike_sharing_data_file)"
       ]
     },
     {