| 1 | +Roadmap |
| 2 | +======= |
| 3 | + |
| 4 | +This document provides general directions on what the core contributors would like to |
| 5 | +see developed in Feature-engine. As resources are limited, we can't promise when or if |
| 6 | +the transformers listed here will be included in the library. We welcome all the help |
| 7 | +we can get to support this vision. If you are interested in contributing, please get in |
| 8 | +touch. |
| 9 | + |
| 10 | +Purpose |
| 11 | +------- |
| 12 | + |
| 13 | +Feature-engine's mission is to simplify and streamline the implementation of end-to-end |
| 14 | +feature engineering pipelines. It aims to help users both during the research phase and |
| 15 | +while putting a model in production. |
| 16 | + |
| 17 | +Feature-engine makes data engineering easy by allowing the selection of feature subsets |
| 18 | +directly within its transformers. It also integrates well with exploratory data analysis |
| 19 | +(EDA) by returning dataframes for easy data exploration. |
| 20 | + |
| 21 | +Feature-engine’s transformers preserve Scikit-learn functionality with the methods fit() |
| 22 | +and transform() and can be integrated into a Pipeline to simplify putting the model in |
| 23 | +production. |
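
The three properties described above (variable selection inside the transformer, dataframe output, and ``fit()``/``transform()`` compatibility with a Pipeline) can be illustrated with a minimal sketch. ``MedianFillSketch`` is a hypothetical class written for this example, not one of Feature-engine's transformers, and the column names are made up.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class MedianFillSketch(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fills NaN with the median of selected columns."""

    def __init__(self, variables=None):
        # Subset of columns to transform; None means "all numerical columns".
        self.variables = variables

    def fit(self, X, y=None):
        if self.variables is None:
            self.variables_ = list(X.select_dtypes("number").columns)
        else:
            self.variables_ = list(self.variables)
        # Learn one median per selected column.
        self.medians_ = X[self.variables_].median().to_dict()
        return self

    def transform(self, X):
        # Work on a copy and return a dataframe, leaving other columns untouched.
        X = X.copy()
        X[self.variables_] = X[self.variables_].fillna(self.medians_)
        return X


df = pd.DataFrame(
    {"age": [20.0, None, 40.0], "fare": [7.5, 8.0, None], "city": ["a", "b", "c"]}
)

# Only "age" is imputed; "fare" and "city" pass through unchanged.
pipe = Pipeline([("impute_age", MedianFillSketch(variables=["age"]))])
out = pipe.fit_transform(df)
print(out)
```

Because the output is still a dataframe, it can be inspected with the usual pandas tools between pipeline steps.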
| 24 | + |
| 25 | +Feature-engine was designed to be used in real settings. Each transformer has a concrete |
| 26 | +aim, and is tailored to certain variables and certain data. Transformers raise errors |
| 27 | +and warnings to help the user choose a suitable transformation for the data. |
| 28 | +These errors help avoid inadvertently introducing missing values into the dataframe at |
| 29 | +unwanted stages of development. |
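
As an illustration of this idea, the sketch below refuses to apply a logarithm to columns that contain zero or negative values, because the result would otherwise contain NaN or -inf. The function ``log_transform_sketch`` and its error message are assumptions made for this example and do not reproduce Feature-engine's actual implementation.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(X, variables):
    """Hypothetical helper: natural log of the selected columns."""
    # Fail early instead of silently producing NaN or -inf.
    if (X[variables] <= 0).any().any():
        raise ValueError(
            "Some variables contain zero or negative values; "
            "the logarithm is not defined for them."
        )
    X = X.copy()
    X[variables] = np.log(X[variables])
    return X


df = pd.DataFrame({"income": [10.0, 100.0, 0.0]})

try:
    log_transform_sketch(df, ["income"])
    raised = False
except ValueError as err:
    raised = True
    print(f"Transformation refused: {err}")
```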
| 30 | + |
| 31 | + |
| 32 | +Vision |
| 33 | +------ |
| 34 | + |
| 35 | +At the moment, Feature-engine's functionality is tailored to cross-sectional or tabular |
| 36 | +data, mostly numerical or categorical. But we would like to extend its functionality |
| 37 | +to work with datetime, text and time series. The following figure shows how we |
| 38 | +would like the overall structure of Feature-engine to look: |
| 39 | + |
| 40 | +.. figure:: images/FeatureEnginePackageStructure.png |
| 41 | + :align: center |
| 42 | + |
| 43 | + Feature-engine structure |
| 44 | + |
| 45 | +Current functionality |
| 46 | +--------------------- |
| 47 | + |
| 48 | +Most of the functionality for cross-sectional data is already included in the package. |
| 49 | +We expand and update this arm of the library based on user feedback and suggestions |
| 50 | +and our own research in the field. The transformers that are not yet included in the |
| 51 | +package are shown in grey: |
| 52 | + |
| 53 | +.. figure:: images/FeatureEnginePackageStructureCrossSectional.png |
| 54 | + :align: center |
| 55 | + |
| 56 | + Transformers for cross-sectional data |
| 57 | + |
| 58 | +The current transformations supported by Feature-engine return features that are easy |
| 59 | +to interpret, and the effects of the transformations are clear and easy to understand. |
| 60 | +The original aim of Feature-engine was to provide technology suitable for creating |
| 61 | +models that will be used in real settings and that returns understandable variables. |
| 62 | + |
| 63 | +That said, users are increasingly requesting functionality to combine or transform |
| 64 | +variables in ways that return features that are not human readable, in an attempt |
| 65 | +to improve model performance and perhaps gain an edge in data science competitions. We |
| 66 | +are currently considering incorporating this functionality into the package. |
| 67 | + |
| 68 | +Wanted functionality |
| 69 | +-------------------- |
| 70 | + |
| 71 | +We are interested in adding a module that creates date and time related features from |
| 72 | +datetime variables. This module would include transformers to extract all possible date |
| 73 | +and time related features, such as hr, min, sec, day, year, is_weekend, etc. It would |
| 74 | +also include transformers to capture elapsed time between two or more variables. |
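
A minimal sketch of the kind of features such a module could return, written with plain pandas. The column names ``opened`` and ``closed`` and the output feature names are assumptions chosen for this example; the planned transformers may look quite different.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "opened": pd.to_datetime(["2021-01-02 08:30:15", "2021-01-04 17:05:00"]),
        "closed": pd.to_datetime(["2021-01-03 08:30:15", "2021-01-04 19:05:00"]),
    }
)

features = pd.DataFrame(
    {
        # Date and time parts extracted from a single datetime variable.
        "opened_year": df["opened"].dt.year,
        "opened_day": df["opened"].dt.day,
        "opened_hr": df["opened"].dt.hour,
        "opened_min": df["opened"].dt.minute,
        "opened_sec": df["opened"].dt.second,
        # dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday.
        "opened_is_weekend": df["opened"].dt.dayofweek >= 5,
        # Elapsed time between two datetime variables, in hours.
        "elapsed_hours": (df["closed"] - df["opened"]).dt.total_seconds() / 3600,
    }
)
print(features)
```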
| 75 | + |
| 76 | +We would also like to add a module that returns straightforward features from simple |
| 77 | +text variables to capture text complexity, for example by counting the number |
| 78 | +of words, unique words, paragraphs and sentences, or by measuring lexical complexity. We would |
| 79 | +also consider integrating the Bag of Words and TF-IDF from Scikit-learn with a wrapper that |
| 80 | +returns a dataframe ready to use to train machine learning models. The figure below shows |
| 81 | +these new modules in more detail. |
| 82 | + |
| 83 | +.. figure:: images/FeatureEnginePackageStructureDatetimeText.png |
| 84 | + :align: center |
| 85 | + |
| 86 | + New modules wanted: datetime and text |
| 87 | + |
| 88 | +In addition, we are evaluating whether a module to extract features from time |
| 89 | +series is feasible within the current design of the package, and whether it would add real value |
| 90 | +compared to the functionality already available in pandas and SciPy, and in other |
| 91 | +well-established open source projects like tsfresh and featuretools. The transformations |
| 92 | +we are considering are shown in this image: |
| 93 | + |
| 94 | +.. figure:: images/FeatureEnginePackageStructureTimeseries.png |
| 95 | + :align: center |
| 96 | + |
| 97 | + Time series module and the transformations envisioned |
| 98 | + |
| 99 | + |
| 100 | +Goals |
| 101 | +----- |
| 102 | + |
| 103 | +Our main goals are: |
| 104 | + |
| 105 | +- Continue maintaining a high-quality, well-documented collection of canonical tools for data processing |
| 106 | +- Expand the documentation with more examples about Feature-engine's functionality |
| 107 | +- Expand the documentation with more detail on how to contribute to the package |
| 108 | +- Expand the library's functionality as described in the preceding sections |
| 109 | + |
| 110 | +For more fine-grained goals, and for current and planned issues, please visit the `issues <https://github.com/solegalli/feature_engine/issues/>`_ |
| 111 | +section of our repo. |
| 112 | + |