Commit 536306c

adds drop NA transformer closes #183 (#215)
* Initial Commit for NA transformer
* Initial Commit for NA transformer
* copy_and_safe removed_data
* Minor changes -> method name and example
* Comments and Examples
* Comments and Examples
* Comments and Examples
* Comments - documentations
* Minor errors
* Documentation changes
* test cases
* Style changes
* Style changes-2
* fix bug, reword docstrings, update notebook
* change method name in docstrings

Co-authored-by: Soledad Galli <[email protected]>
1 parent ebee162 commit 536306c

7 files changed: 1017 additions, 1 deletion

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
DropMissingData
===============

API Reference
-------------

.. autoclass:: feature_engine.imputation.DropMissingData
    :members:

Example
-------

DropMissingData() deletes rows with NA values. It works with both numerical and
categorical variables. The user can pass a list of variables for which to delete rows
with NA; otherwise, DropMissingData() defaults to all variables. The transformer also
has the option to learn which variables contain NA in the train set, and then remove
observations with NA in those variables only.
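
The behavior described above can be sketched in a few lines of pandas. This is a minimal sketch under stated assumptions, not feature_engine's implementation: the class name ``DropRowsWithNA`` and the ``missing_only`` argument are hypothetical, and DropMissingData's real parameter names may differ. Rows are removed with pandas' ``DataFrame.dropna(subset=...)``.

```python
import pandas as pd


class DropRowsWithNA:
    """Hypothetical, simplified stand-in for DropMissingData (not feature_engine's code).

    variables: columns to check for NA; None means all columns.
    missing_only: if True, keep only the selected columns that contain NA during fit.
    """

    def __init__(self, variables=None, missing_only=False):
        self.variables = variables
        self.missing_only = missing_only

    def fit(self, X: pd.DataFrame, y=None):
        # default to all variables when none are given
        cols = list(X.columns) if self.variables is None else list(self.variables)
        if self.missing_only:
            # learn which of the selected columns have NA in the train set
            cols = [c for c in cols if X[c].isna().any()]
        self.variables_ = cols
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # drop rows with NA in any learned column; the original index is kept
        return X.dropna(subset=self.variables_)


# Toy data: one numerical and one categorical column with NA, one complete column
train = pd.DataFrame({
    "num": [1.0, None, 3.0, 4.0],
    "cat": ["a", "b", None, "d"],
    "full": [10.0, 20.0, 30.0, 40.0],
})
# New data where only the column that was complete in train has NA
new = pd.DataFrame({
    "num": [5.0, 6.0],
    "cat": ["e", "f"],
    "full": [None, 50.0],
})

print(DropRowsWithNA(variables=["num"]).fit(train).transform(train).index.tolist())  # [0, 2, 3]
print(DropRowsWithNA().fit(train).transform(train).index.tolist())  # [0, 3]
# With missing_only=True, "full" is not learned, so its NA in new data is ignored
print(len(DropRowsWithNA(missing_only=True).fit(train).transform(new)))  # 2
print(len(DropRowsWithNA().fit(train).transform(new)))  # 1
```

Learning the columns in ``fit`` rather than at transform time means new data is filtered on the same variables that had NA in the train set, which is the option the paragraph above describes.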

.. code:: python

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split

    from feature_engine.imputation import DropMissingData

    # Load dataset
    data = pd.read_csv('houseprice.csv')

    # Separate into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        data.drop(['Id', 'SalePrice'], axis=1),
        data['SalePrice'],
        test_size=0.3,
        random_state=0)

    # set up the imputer
    missingdata_imputer = DropMissingData(variables=['LotFrontage', 'MasVnrArea'])

    # fit the imputer
    missingdata_imputer.fit(X_train)

    # transform the data
    train_t = missingdata_imputer.transform(X_train)
    test_t = missingdata_imputer.transform(X_test)

    # Number of NA before the transformation:
    X_train['LotFrontage'].isna().sum()

.. code:: python

    189

.. code:: python

    # Number of NA after the transformation:
    train_t['LotFrontage'].isna().sum()

.. code:: python

    0

.. code:: python

    # Number of rows before and after transformation
    print(X_train.shape)
    print(train_t.shape)

.. code:: python

    (1022, 79)
    (829, 79)
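
The shapes above show that 1022 - 829 = 193 rows were removed from the train set. Assuming a row is dropped when any of the listed variables is NA, and given that 189 rows had NA in LotFrontage, the other 4 rows must have had NA in MasVnrArea only. Removing rows from X also means ``y_train`` and ``y_test`` no longer line up with ``train_t`` and ``test_t``. The sketch below checks both points on made-up toy data, using plain pandas ``dropna`` as a stand-in for the transformer and assuming, as ``dropna`` does, that the original row index is preserved.

```python
import numpy as np
import pandas as pd

# Toy data (made up): two variables with partly overlapping missing values
X = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan, 70.0, 60.0],
    "MasVnrArea": [196.0, 0.0, np.nan, np.nan, 0.0, np.nan],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993],
})
y = pd.Series([208500, 181500, 223500, 140000, 250000, 143000], name="SalePrice")

selected = ["LotFrontage", "MasVnrArea"]

# Stand-in for the transformer: drop rows with NA in any selected variable
X_t = X.dropna(subset=selected)

n_dropped = len(X) - len(X_t)
na_lot = X["LotFrontage"].isna().sum()
# rows missing MasVnrArea but not LotFrontage account for the remaining drops
na_mas_only = (X["MasVnrArea"].isna() & X["LotFrontage"].notna()).sum()
print(n_dropped, na_lot, na_mas_only)  # 4 2 2

# Realign the target with the surviving rows via the preserved index
y_t = y.loc[X_t.index]
print(X_t.index.tolist(), y_t.tolist())  # [0, 4] [208500, 250000]
```

Applied to the example above, this would be ``y_train.loc[train_t.index]`` and ``y_test.loc[test_t.index]``, provided DropMissingData keeps the original index (an assumption here, not confirmed by this page).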

docs/imputation/index.rst

Lines changed: 2 additions & 1 deletion
@@ -14,4 +14,5 @@ from data or arbitrary values pre-defined by the user.
 EndTailImputer
 CategoricalImputer
 RandomSampleImputer
-AddMissingIndicator
+AddMissingIndicator
+DropMissingData

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -102,6 +102,7 @@ Missing Data Imputation: Imputers
 - :doc:`imputation/CategoricalImputer`: replaces missing data in categorical variables with the string 'Missing' or by the most frequent category
 - :doc:`imputation/RandomSampleImputer`: replaces missing data with random samples of the variable
 - :doc:`imputation/AddMissingIndicator`: adds a binary missing indicator to flag observations with missing data
+- :doc:`imputation/DropMissingData`: removes rows containing NA values from the dataframe
 
 Categorical Variable Encoders: Encoders
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
