|
| 1 | +__signature__ |
| 2 | +keras.datasets.california_housing.load_data( |
| 3 | + version='large', |
| 4 | + path='california_housing.npz', |
| 5 | + test_split=0.2, |
| 6 | + seed=113 |
| 7 | +) |
| 8 | +__doc__ |
| 9 | +Loads the California Housing dataset. |
| 10 | + |
| 11 | +This dataset was obtained from the [StatLib repository]( |
| 12 | +https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html). |
| 13 | + |
| 14 | +It's a continuous regression dataset with 20,640 samples with |
| 15 | +8 features each. |
| 16 | + |
| 17 | +The target variable is a scalar: the median house value |
| 18 | +for California districts, in dollars. |
| 19 | + |
| 20 | +The 8 input features are the following: |
| 21 | + |
| 22 | +- MedInc: median income in block group |
| 23 | +- HouseAge: median house age in block group |
| 24 | +- AveRooms: average number of rooms per household |
| 25 | +- AveBedrms: average number of bedrooms per household |
| 26 | +- Population: block group population |
| 27 | +- AveOccup: average number of household members |
| 28 | +- Latitude: block group latitude |
| 29 | +- Longitude: block group longitude |
| 30 | + |
| 31 | +This dataset was derived from the 1990 U.S. census, using one row |
| 32 | +per census block group. A block group is the smallest geographical |
| 33 | +unit for which the U.S. Census Bureau publishes sample data |
| 34 | +(a block group typically has a population of 600 to 3,000 people). |
| 35 | + |
| 36 | +A household is a group of people residing within a home. |
| 37 | +Since the average number of rooms and bedrooms in this dataset are |
| 38 | +provided per household, these columns may take surprisingly large |
| 39 | +values for block groups with few households and many empty houses, |
| 40 | +such as vacation resorts. |
| 41 | + |
| 42 | +Args: |
| 43 | + version: `"small"` or `"large"`. The small version |
| 44 | + contains 600 samples, the large version contains |
| 45 | + 20,640 samples. The purpose of the small version is |
| 46 | + to serve as an approximate replacement for the |
| 47 | + deprecated `boston_housing` dataset. |
| 48 | + path: path where to cache the dataset locally |
| 49 | + (relative to `~/.keras/datasets`). |
| 50 | + test_split: fraction of the data to reserve as test set. |
| 51 | + seed: Random seed for shuffling the data |
| 52 | + before computing the test split. |
| 53 | + |
| 54 | +Returns: |
| 55 | + Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`. |
| 56 | + |
| 57 | +**`x_train`, `x_test`**: numpy arrays with shape `(num_samples, 8)` |
| 58 | + containing either the training samples (for `x_train`), |
| 58 | + or test samples (for `x_test`). |
| 60 | + |
| 61 | +**`y_train`, `y_test`**: numpy arrays of shape `(num_samples,)` |
| 62 | + containing the target scalars. The targets are float scalars |
| 63 | + typically between 25,000 and 500,000 that represent |
| 64 | + the home prices in dollars. |
| 65 | + |
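
The `test_split` and `seed` arguments documented in the diff describe a shuffle followed by a fractional hold-out. The sketch below illustrates that behavior in plain Python. It is not the Keras implementation: the helper name `shuffle_and_split` is hypothetical, and it uses the standard-library `random` module as a stand-in, so the resulting order will not match what `load_data` returns. It only shows how the default `test_split=0.2` divides the 20,640 samples of the large version.

```python
import random


def shuffle_and_split(samples, test_split=0.2, seed=113):
    """Hypothetical helper: shuffle with a fixed seed, then hold out a test fraction."""
    # Shuffle indices rather than the data so the input list is left untouched.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    # The first (1 - test_split) fraction becomes the training set.
    n_train = int(len(samples) * (1 - test_split))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Stand-in row ids for the 20,640 samples of the "large" version.
rows = list(range(20640))
train_rows, test_rows = shuffle_and_split(rows)
print(len(train_rows), len(test_rows))
```

Because the shuffle is seeded, calling the helper twice with the same `seed` yields the same split, which is the property the `seed` argument is meant to provide.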