Commit c5368be

add roadmap to docs (#289)
* add roadmap initial draft * modifies governance * adds about and authords * update roadmap
1 parent 38fa0b8 commit c5368be

9 files changed

+183
-11
lines changed

docs/about.rst

Lines changed: 40 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,43 @@
33
About
44
=====
55

6-
Coming Soon!
6+
History
7+
-------
8+
9+
Data scientists spend a huge amount of time on data pre-processing and transformation.
10+
It would be great (we thought back in the day) to gather the most frequently used data
11+
pre-processing techniques and transformations in a library, from which we could pick
12+
and choose the transformation that we need, and use it just like we would use any other
13+
sklearn class. This was the original vision for Feature-engine.
14+
15+
Feature-engine is an open source Python package originally designed to support the online
16+
course `Feature Engineering for Machine Learning in Udemy <https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO>`_,
17+
but it has since gained popularity and supports transformations beyond those taught in the
18+
course. It was launched in 2017, and since then, several releases have appeared and a
19+
growing international community is beginning to lead the development.
20+
21+
Governance
22+
----------
23+
24+
The decision-making process and governance structure of Feature-engine are laid out in
25+
the `governance document <https://feature-engine.readthedocs.io/en/latest/governance.html>`_.
26+
27+
Core contributors
28+
-----------------
29+
30+
The following people are currently core contributors to Feature-engine’s development
31+
and maintenance:
32+
33+
.. include:: authors.rst
34+
35+
Contributors
36+
------------
37+
38+
You can learn more about Feature-engine's contributors on the
39+
`GitHub contributors page <https://github.com/solegalli/feature_engine/graphs/contributors>`_.
40+
41+
Citing Feature-engine
42+
---------------------
43+
44+
Coming soon
45+

docs/authors.rst

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
.. raw :: html
2+
3+
<!-- Generated by generate_authors_table.py -->
4+
<div class="sk-authors-container">
5+
<style>
6+
img.avatar {border-radius: 5px;}
7+
</style>
8+
<div>
9+
<a href='https://github.com/solegalli'><img src='https://avatars.githubusercontent.com/solegalli?v=4' class='avatar' width="120"
10+
height="120" /></a> <br />
11+
<p>Soledad Galli</p>
12+
</div>
13+
<div>
14+
<a href='https://github.com/christophergs'><img src='https://avatars.githubusercontent.com/christophergs?v=4' width="120"
15+
height="120" class='avatar' /></a> <br />
16+
<p>Chris Samiullah</p>
17+
</div>
18+
<div>
19+
<a href='https://github.com/nicogalli'><img src='https://avatars.githubusercontent.com/nicogalli?v=4' class='avatar' width="120"
20+
height="120"/></a> <br />
21+
<p>Nicolas Galli</p>
22+
</div>

docs/governance.rst

Lines changed: 7 additions & 9 deletions
@@ -23,12 +23,9 @@ Core Contributors
2323

2424
Core Contributors are community members who are dedicated to the continued development
2525
of the project through ongoing engagement with the community. Core Contributors are
26-
expected to review code contributions, can aprove and merge pull requests, can decide
26+
expected to review code contributions, can approve and merge pull requests, can decide
2727
on the fate of pull requests, and can be involved in deciding major changes to the
28-
Feature-engine API. Core Contributors together with the Founder (see below) and input
29-
from the community can decide on the fate of the Feature-engine project.
30-
31-
Core Contributors determine who can join as a Core Contributor.
28+
Feature-engine API. Core Contributors determine who can join as a Core Contributor.
3229

3330

3431
Founder and Leadership
@@ -49,10 +46,11 @@ vote for new Core Contributors.
4946
Join the community
5047
------------------
5148

52-
Feature-engine welcomes contributors who would like to take on the role of additional
53-
Core Contributors and Contributors.
49+
Feature-engine is currently looking to expand the team of Core Contributors. If you are
50+
interested, please get in touch.
5451

55-
Get in touch using our Github issues page or through our mailing list:
52+
If you want to contribute to the project in any other way, get in touch using our GitHub
53+
issues page or through Gitter:
5654

5755
1. `Github issues <https://github.com/solegalli/feature_engine/issues/>`_.
58-
2. `Mailing list <https://groups.google.com/d/forum/feature-engine>`_.
56+
2. `Gitter community <https://gitter.im/feature_engine/community>`_.

docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -224,7 +224,6 @@ The `issues <https://github.com/solegalli/feature_engine/issues/>`_ and
224224
quickstart
225225
installation
226226
getting_help
227-
about
228227
datasets
229228

230229
.. toctree::
@@ -254,6 +253,8 @@ The `issues <https://github.com/solegalli/feature_engine/issues/>`_ and
254253
:maxdepth: 1
255254
:caption: Contribute
256255

256+
roadmap
257+
about
257258
contribute/index
258259
code_of_conduct
259260
governance

docs/roadmap.rst

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
Roadmap
2+
=======
3+
4+
This document provides general directions on what the core contributors would like to
5+
see developed in Feature-engine. As resources are limited, we can't promise when or if
6+
the transformers listed here will be included in the library. We welcome all the help
7+
we can get to support this vision. If you are interested in contributing, please get in
8+
touch.
9+
10+
Purpose
11+
-------
12+
13+
Feature-engine's mission is to simplify and streamline the implementation of end-to-end
14+
feature engineering pipelines. It aims to help users both during the research phase and
15+
while putting a model in production.
16+
17+
Feature-engine makes data engineering easy by allowing the selection of feature subsets
18+
directly within its transformers. It also interlaces well with exploratory data analysis
19+
(EDA) by returning dataframes for easy data exploration.
20+
21+
Feature-engine’s transformers preserve Scikit-learn functionality with the methods fit()
22+
and transform() and can be integrated into a Pipeline to simplify putting the model in
23+
production.
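
As a minimal sketch of the ideas above (variable selection inside the transformer, dataframe output, and Pipeline compatibility), the example below defines a hypothetical ``MeanImputerSketch`` class. It is not part of Feature-engine's API; it only illustrates the fit() / transform() pattern using pandas and scikit-learn, and the column names are made up.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class MeanImputerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical transformer (not Feature-engine's API).

    Imputes missing values with the mean, only in the selected variables.
    """

    def __init__(self, variables=None):
        self.variables = variables

    def fit(self, X, y=None):
        # Learn the mean of each selected variable from the training data.
        self.imputer_dict_ = X[self.variables].mean().to_dict()
        return self

    def transform(self, X):
        # Work on a copy and return a dataframe rather than a NumPy array.
        X = X.copy()
        return X.fillna(value=self.imputer_dict_)


# Toy data: "age" has a missing value, "income" is complete.
X = pd.DataFrame({"age": [20.0, None, 40.0], "income": [1.0, 2.0, 3.0]})
y = pd.Series([1.0, 2.0, 3.0])

# Used on its own, the transformer returns a dataframe for easy exploration.
X_t = MeanImputerSketch(variables=["age"]).fit_transform(X)

# Used inside a Pipeline, it sits in front of any scikit-learn estimator.
pipe = Pipeline(
    [
        ("imputer", MeanImputerSketch(variables=["age"])),
        ("model", LinearRegression()),
    ]
)
pipe.fit(X, y)
preds = pipe.predict(X)
```

Because ``transform()`` returns a dataframe, the intermediate output can be inspected with the usual pandas tools before the model step.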
24+
25+
Feature-engine was designed to be used in real settings. Each transformer has a concrete
26+
aim, and is tailored to certain variables and certain data. Transformers raise errors
27+
and warnings to help the user choose a suitable transformation for the data.
28+
These errors help avoid inadvertently introducing missing values into the dataframe at
29+
unwanted stages of the development.
30+
31+
32+
Vision
33+
------
34+
35+
At the moment, Feature-engine's functionality is tailored to cross-sectional or tabular
36+
data, mostly numerical or categorical. But we would like to extend its functionality
37+
to work with datetime, text and time series. The following figure shows how we
38+
would like the overall structure of Feature-engine to look:
39+
40+
.. figure:: images/FeatureEnginePackageStructure.png
41+
:align: center
42+
43+
Feature-engine structure
44+
45+
Current functionality
46+
---------------------
47+
48+
Most of the functionality for cross-sectional data is already included in the package.
49+
We expand and update this arm of the library, based on user feedback and suggestions
50+
and our own research in the field. The transformers not yet included in the package
51+
are shown in grey:
52+
53+
.. figure:: images/FeatureEnginePackageStructureCrossSectional.png
54+
:align: center
55+
56+
Transformers for cross-sectional data
57+
58+
The current transformations supported by Feature-engine return features that are easy
59+
to interpret, and the effects of the transformations are clear and easy to understand.
60+
The original aim of Feature-engine was to provide technology that is suitable to create
61+
models that will be used in real settings, and return understandable variables.
62+
63+
That said, users are increasingly requesting features to combine or transform
64+
variables in ways that would return features that are not human readable, in an attempt
65+
to improve model performance and perhaps have an edge in data science competitions. We
66+
are currently considering incorporating this functionality into the package.
67+
68+
Wanted functionality
69+
--------------------
70+
71+
We are interested in adding a module that creates date and time related features from
72+
datetime variables. This module would include transformers to extract all possible date
73+
and time related features, such as hr, min, sec, day, year and is_weekend. It would
74+
also include transformers to capture elapsed time between 2 or more variables.
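
To make the kind of output concrete, here is a rough sketch using plain pandas. It does not show a Feature-engine transformer, since the module is only proposed; the column names ``signup`` and ``purchase`` are invented for illustration.

```python
import pandas as pd

# Toy data with two datetime variables (hypothetical column names).
df = pd.DataFrame(
    {
        "signup": pd.to_datetime(["2021-01-02 10:30:15", "2021-01-04 23:05:00"]),
        "purchase": pd.to_datetime(["2021-01-03 10:30:15", "2021-01-10 23:05:00"]),
    }
)

features = pd.DataFrame(
    {
        # Date and time parts extracted from a single datetime variable.
        "signup_year": df["signup"].dt.year,
        "signup_day": df["signup"].dt.day,
        "signup_hr": df["signup"].dt.hour,
        "signup_min": df["signup"].dt.minute,
        "signup_sec": df["signup"].dt.second,
        # dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
        "signup_is_weekend": df["signup"].dt.dayofweek >= 5,
        # Elapsed time between two datetime variables, in whole days.
        "days_elapsed": (df["purchase"] - df["signup"]).dt.days,
    }
)
```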
75+
76+
We would also like to add a module that returns straightforward features from simple
77+
text variables, to capture text complexity, for example by counting the number
78+
of words, unique words, lexical complexity, number of paragraphs and sentences. We would
79+
also consider integrating the Bag of Words and TF-IDF from sklearn with a wrapper that
80+
returns a dataframe ready to use to train machine learning models. The figure below shows
81+
these new modules in more detail.
82+
83+
.. figure:: images/FeatureEnginePackageStructureDatetimeText.png
84+
:align: center
85+
86+
New modules wanted: datetime and text
87+
88+
In addition, we are evaluating whether including a module to extract features from time
89+
series is possible, within the current design of the package, and if it adds real value
90+
compared to the functionality already existing in pandas and SciPy, and in other well
91+
established open source projects like tsfresh and featuretools. The transformations
92+
we are considering are shown in this image:
93+
94+
.. figure:: images/FeatureEnginePackageStructureTimeseries.png
95+
:align: center
96+
97+
Time series module and the transformations envisioned
98+
99+
100+
Goals
101+
-----
102+
103+
Our main goals are:
104+
105+
- Continue maintaining a high-quality, well-documented collection of canonical tools for data processing
106+
- Expand the documentation with more examples about Feature-engine's functionality
107+
- Expand the documentation with more detail on how to contribute to the package
108+
- Expand the library's functionality as described in the preceding paragraphs
109+
110+
For more fine-grained goals and current and lined-up issues, please visit the `issues <https://github.com/solegalli/feature_engine/issues/>`_
111+
section in our repo.
112+
