Commit d26e568

dodoarg and solegalli authored
adds Datetime transformer, closes #67 (#333)
* Added datetime variable converter/finder. Categorical variable finder doesn't return date time variables disguised as obj/cat anymore
* added draft of datetime transformer base class and date feature extractor
* +q,s,y from ExtractDateFeatures; takes in argument to select which features (defaults to year atm)
* added week of the year. Added tests to check on various raised exceptions
* Docs improved. Added all option to ExtractDateFeatures. Added dotw, dotm, is_weekend, wotm
* added option to drop datetime features, defaulting to True
* Overhauled dt (private) conversion methods to accomodate for different datetime formats, added tests accordingly
* adjusted date transformer to accomodate for pd.to_datetime kwargs
* ran tox -e lint
* refactored features_to_extract argparse; typechecked, stylechecked.
* added tests for features_to_extract argparser
* minor sanity checks for private conversion methods in the presence of nans
* fixed unintended behaviour when _find_or_check_datetime_variables was passed a str as kwarg
* replaced dtype checking with hasattr dt when converting to datetime to accomodate for pandas duck type DatetimeTZDtype
* minor fix
* Added datetime variable converter/finder. Categorical variable finder doesn't return date time variables disguised as obj/cat anymore
* added draft of datetime transformer base class and date feature extractor
* +q,s,y from ExtractDateFeatures; takes in argument to select which features (defaults to year atm)
* added week of the year. Added tests to check on various raised exceptions
* Docs improved. Added all option to ExtractDateFeatures. Added dotw, dotm, is_weekend, wotm
* added option to drop datetime features, defaulting to True
* Overhauled dt (private) conversion methods to accomodate for different datetime formats, added tests accordingly
* adjusted date transformer to accomodate for pd.to_datetime kwargs
* ran tox -e lint
* refactored features_to_extract argparse; typechecked, stylechecked.
* added tests for features_to_extract argparser
* minor sanity checks for private conversion methods in the presence of nans
* fixed unintended behaviour when _find_or_check_datetime_variables was passed a str as kwarg
* replaced dtype checking with hasattr dt when converting to datetime to accomodate for pandas duck type DatetimeTZDtype
* minor fix
* moved base dt class to dt module; removed <raises> entries from new functions doc
* removed _convert* methods, simplified behaviour of _find_or_check_datetime_variables; reverted behaviour of _find_or_check_categorical_variables to not checking if variables are datetime
* small fix to _find_or_check_datetime_variables + style stuff
* removed irrelevant files
* features_to_extract parameter checks brought back into ExtractDateFeatures.__init__
* removed irrelevant file
* added option to raise/ignore nans; fixed major issue in the error raising tests; removed kwargs traces
* transformer defaults to extracting all features; features_to_extract is list-enforced; features are now added in a more reasonable order e.g. var1_y, var1_m, var2_y, etc.
* added day_of_the_year option to ExtractDateFeatures; renamed/polished pytest fixture for datetime-related tests
* removed unnecessary changes to tests outside the scope of this PR
* added dayfirst, yearfirst options to transformer; tests/docs adjusted accordingly
* minor rearrangement of extracted features; removed is prefix to weekend feature
* added several extracted features; tests adjusted accordingly
* replaced dt.isocalendar().day with dt.day_of_week, which causes mappings (1,7) -> (0,6)
* replaced dt.day_of_week with dt.dayofweek due to incompatibility with py36
* added time-only series to pytest fixture for upcoming time feature extraction testing
* ExtractDateFeatures is now ExtractDatetimeFeatures; base datetime transformer class has been removed
* added hour, minute, second extraction to ExtractDatetimeTransformer
* features_to_extract now default to a reasonable subset of time/date features
* refactored feature extraction in nested loop; introduced the datetime_constants glossary
* temp
* minor compatibility fix
* refactored feature extraction in nested loop; introduced datetime_constants glossary
* updates readme
* renames class and module
* modifies datetime var check logic
* modifies logic datetime transformer
* fixes cat var passed as str not getting detected as datetime
* adds functionality to handle empty variable lists
* fixes code style in datetime
* fixes param check features_to_extract
* fixes changes to datetime transformer logic, fixes tests accordingly
* fixes small issue with ALL option, adds test
* redesigns datetime testing; lints
* cleans up after merge
* adds test for categorical var
* properly deals with different timezones
* adds time_aware option to transformer
* add parametrize error input param test
* changes time_aware to boolean
* changes assert is false in defo attributes test
* add parametrize to param variable test
* reorganises fit error tests
* reorganises fit attr test, adds option to features_to_extract
* reorganises transform error tests
* renamed time_aware to utc for similarity with pd.to_datetime
* adds test for localized tz
* fixes code style
* modifies typehint, but still raises error
* make start to documentation
* fixes mypy invocation error
* makes all boolean features int
* returns ValueError when utc is erroneously set to False
* adds basic doc to user guide, fixes docs tests
* reorganises order of modules
* renames heading of encoding module
* updates api docs and docstrings
* reorganises user guide
* improves error message when utc is erroneously set to False; tests it
* adds examples in user guide
* adds import line to user guide examples
* rewords user guide, adds more examples
* adds missing param to docstrings
* links jupyter notebook
* removes day_of_the_month from supported features
* fixes typechecks fail cause by recent mypy update

Co-authored-by: Soledad Galli <[email protected]>
1 parent db83d76 commit d26e568
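
The commit message mentions three behaviours that are easier to see in code: the switch from dt.isocalendar().day to dt.dayofweek (which shifts the day range from 1..7 to 0..6), a utc option named after the pd.to_datetime argument, and boolean features stored as int. The sketch below illustrates those points with plain pandas only. It is not the transformer's implementation; the sample data and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as strings with different UTC offsets.
raw = pd.Series(
    ["2021-03-06 10:15:00+01:00", "2021-03-08 23:30:00-05:00"],
    name="event",
)

# utc=True converts every value to UTC, so the result has a single
# timezone-aware dtype even though the inputs carry different offsets.
event = pd.to_datetime(raw, utc=True)

# Hypothetical feature names, built as <variable>_<feature>.
features = pd.DataFrame(
    {
        "event_year": event.dt.year,
        "event_month": event.dt.month,
        "event_hour": event.dt.hour,
        # dt.dayofweek numbers Monday..Sunday as 0..6.
        "event_day_of_week": event.dt.dayofweek,
        # dt.isocalendar().day numbers Monday..Sunday as 1..7.
        "event_iso_day": event.dt.isocalendar().day,
        # Boolean flag cast to int (Saturday=5, Sunday=6).
        "event_weekend": (event.dt.dayofweek > 4).astype(int),
    }
)

print(features)
```

Note that the extracted values refer to the UTC-converted timestamps, so the second row falls on the following calendar day relative to its local time.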

20 files changed: +1688 -33 lines changed

README.md

Lines changed: 10 additions & 9 deletions
@@ -54,14 +54,15 @@ transforming parameters from the data and then transform it.
 ## Current Feature-engine's transformers include functionality for:

 * Missing Data Imputation
-* Categorical Variable Encoding
-* Outlier Capping or Removal
+* Categorical Encoding
 * Discretisation
-* Numerical Variable Transformation
+* Outlier Capping or Removal
+* Variable Transformation
 * Variable Creation
 * Variable Selection
-* Scikit-learn Wrappers
+* Datetime Feature Extraction
 * Preprocessing
+* Scikit-learn Wrappers

 ### Imputation Methods
 * MeanMedianImputer
@@ -82,17 +83,17 @@ transforming parameters from the data and then transform it.
 * RareLabelEncoder
 * DecisionTreeEncoder

-### Outlier Handling methods
-* Winsorizer
-* ArbitraryOutlierCapper
-* OutlierTrimmer
-
 ### Discretisation methods
 * EqualFrequencyDiscretiser
 * EqualWidthDiscretiser
 * DecisionTreeDiscretiser
 * ArbitraryDiscreriser

+### Outlier Handling methods
+* Winsorizer
+* ArbitraryOutlierCapper
+* OutlierTrimmer
+
 ### Variable Transformation methods
 * LogTransformer
 * LogCpTransformer

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+DatetimeFeatures
+================
+
+.. autoclass:: feature_engine.datetime.DatetimeFeatures
+    :members:
+

docs/api_doc/datetime/index.rst

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+.. -*- mode: rst -*-
+
+Datetime Features
+=================
+
+Feature-engine's datetime transformers are able to extract a wide variety of datetime
+features from existing datetime or object-like data.
+
+.. toctree::
+    :maxdepth: 2
+
+    DatetimeFeatures
+

docs/api_doc/encoding/index.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 .. -*- mode: rst -*-

-Categorical Variable Encoding
-=============================
+Categorical Encoding
+====================

 Feature-engine's categorical encoders replace variable strings by estimated or
 arbitrary numbers.

docs/api_doc/index.rst

Lines changed: 2 additions & 1 deletion
@@ -10,10 +10,11 @@ Full API documentation for Feature-engine transformers.

     imputation/index
     encoding/index
-    transformation/index
     discretisation/index
     outliers/index
+    transformation/index
     creation/index
     selection/index
+    datetime/index
     preprocessing/index
     wrappers/index

docs/index.rst

Lines changed: 18 additions & 14 deletions
@@ -22,12 +22,13 @@ transform the data.
 Feature-engine includes transformers for:

 - Missing data imputation
-- Categorical variable encoding
+- Categorical encoding
 - Discretisation
-- Variable transformation
 - Outlier capping or removal
+- Variable transformation
 - Variable combination
 - Variable selection
+- Datetime features
 - Preprocessing

 Feature-engine allows you to select the variables you want to transform **within** each
@@ -114,8 +115,8 @@ Missing Data Imputation: Imputers
 - :doc:`api_doc/imputation/AddMissingIndicator`: adds a binary missing indicator to flag observations with missing data
 - :doc:`api_doc/imputation/DropMissingData`: removes observations (rows) containing missing values from dataframe

-Categorical Variable Encoders: Encoders
----------------------------------------
+Categorical Encoders: Encoders
+------------------------------

 - :doc:`api_doc/encoding/OneHotEncoder`: performs one hot encoding, optional: of popular categories
 - :doc:`api_doc/encoding/CountFrequencyEncoder`: replaces categories by the observation count or percentage
@@ -126,16 +127,6 @@ Categorical Variable Encoders: Encoders
 - :doc:`api_doc/encoding/DecisionTreeEncoder`: replaces categories by predictions of a decision tree
 - :doc:`api_doc/encoding/RareLabelEncoder`: groups infrequent categories

-Numerical Variable Transformation: Transformers
------------------------------------------------
-
-- :doc:`api_doc/transformation/LogTransformer`: performs logarithmic transformation of numerical variables
-- :doc:`api_doc/transformation/LogCpTransformer`: performs logarithmic transformation after adding a constant value
-- :doc:`api_doc/transformation/ReciprocalTransformer`: performs reciprocal transformation of numerical variables
-- :doc:`api_doc/transformation/PowerTransformer`: performs power transformation of numerical variables
-- :doc:`api_doc/transformation/BoxCoxTransformer`: performs Box-Cox transformation of numerical variables
-- :doc:`api_doc/transformation/YeoJohnsonTransformer`: performs Yeo-Johnson transformation of numerical variables
-
 Variable Discretisation: Discretisers
 -------------------------------------

@@ -151,6 +142,16 @@ Outlier Capping or Removal
 - :doc:`api_doc/outliers/Winsorizer`: caps maximum or minimum values using statistical parameters
 - :doc:`api_doc/outliers/OutlierTrimmer`: removes outliers from the dataset

+Numerical Transformation: Transformers
+--------------------------------------
+
+- :doc:`api_doc/transformation/LogTransformer`: performs logarithmic transformation of numerical variables
+- :doc:`api_doc/transformation/LogCpTransformer`: performs logarithmic transformation after adding a constant value
+- :doc:`api_doc/transformation/ReciprocalTransformer`: performs reciprocal transformation of numerical variables
+- :doc:`api_doc/transformation/PowerTransformer`: performs power transformation of numerical variables
+- :doc:`api_doc/transformation/BoxCoxTransformer`: performs Box-Cox transformation of numerical variables
+- :doc:`api_doc/transformation/YeoJohnsonTransformer`: performs Yeo-Johnson transformation of numerical variables
+
 Mathematical Combination:
 -------------------------

@@ -173,6 +174,9 @@ Feature Selection:
 - :doc:`api_doc/selection/RecursiveFeatureElimination`: selects features recursively, by evaluating model performance
 - :doc:`api_doc/selection/RecursiveFeatureAddition`: selects features recursively, by evaluating model performance

+Datetime:
+---------
+- :doc:`api_doc/datetime/DatetimeFeatures`: extract features from datetime variables

 Preprocessing:
 --------------
