@@ -449,6 +449,76 @@ you want to know why we prefer tox, this
449449will tell you everything ;)
450450
451451
452+ Code Profiling
453+ --------------
454+
455+ If you want to profile your code, you can use the **profiling** module in the root directory. There you will find two files,
456+ `profiling.py` and `profiling.sh`. Both files do the same thing but in different ways. The profiling.py file is a Python script
457+ containing a function that must be used as a decorator for the class/method we want to profile.
458+ The profiling.sh file is a bash/zsh script that you can run from the command line to profile a whole .py script.
459+ Let us see how to use them, starting with the profiling.py file.
460+
461+ I suspect that the `DropDuplicateFeatures` class takes more time than other classes, as it iterates over the columns and
462+ checks whether they are duplicated. So, I will profile the `DropDuplicateFeatures` class.
463+
464+ First, I will find the file where this class is defined and add the following line at the top of the imports::
465+
466+     from profiling.profiling import profile_function
467+
468+ Now, I will decorate the `DropDuplicateFeatures.fit` method with the `profile_function` function::
469+
470+     @profile_function(output_file="profile.html")
471+     def fit(self, X: pd.DataFrame, y: pd.Series = None):
472+         ...
473+
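To make the decorator pattern above more concrete, here is a minimal sketch of how a profiling decorator of this kind could be written. It is not the code in `profiling.py`: the names `profile_to_file` and `count_duplicates` are hypothetical, and the sketch uses the standard-library `cProfile` module and writes a plain-text report instead of an HTML one.

```python
import cProfile
import functools
import io
import pstats


def profile_to_file(output_file="profile.txt", top=20):
    """Hypothetical decorator factory: profile each call and write a text report."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall executes func under the profiler and returns its result.
                return profiler.runcall(func, *args, **kwargs)
            finally:
                # Write the report even if func raised an exception.
                buffer = io.StringIO()
                stats = pstats.Stats(profiler, stream=buffer)
                stats.sort_stats("cumulative").print_stats(top)
                with open(output_file, "w") as fh:
                    fh.write(buffer.getvalue())

        return wrapper

    return decorator


@profile_to_file(output_file="profile_demo.txt")
def count_duplicates(values):
    # Toy workload standing in for a transformer's fit method.
    seen, duplicates = set(), 0
    for value in values:
        if value in seen:
            duplicates += 1
        seen.add(value)
    return duplicates


if __name__ == "__main__":
    print(count_duplicates([1, 2, 2, 3, 3, 3]))
```

The wrapper returns whatever the decorated function returns, so decorating a method such as `fit` this way should not change its behaviour; it only adds the cost of profiling and writing the report on every call.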
474+ The next step is to create a temporary .py file that will contain the code that we want to profile.
475+
476+ For example, I will create a file named `temp.py` and copy the following code into it::
477+
478+     import pandas as pd
479+     import numpy as np
480+
481+     from feature_engine.selection import DropDuplicateFeatures
482+
483+
484+     if __name__ == "__main__":
485+         rows = 10000
486+         cols = 60000
487+         col_names = [f"col_{i}" for i in range(cols)]
488+         df = pd.DataFrame(np.random.randint(0, 100, size=(rows, cols)), columns=col_names)
489+
490+         transformer = DropDuplicateFeatures()
491+         transformer.fit(df)
492+
493+         train_t = transformer.transform(df)
494+
495+
496+ Now, I will run the `temp.py` file from the command line::
497+
498+     $ python temp.py
499+
500+ This will create a file named `profile.html` in the root directory of the project. This file contains the profiling
501+ results. You can open it with your favorite browser and inspect the results.
502+
503+ If you prefer not to add extra imports and a decorator, you can use the `profiling.sh` file. This file is a bash/zsh
504+ script that you can run from the command line. Let us see how to use it.
505+
506+ Again, I will profile the `DropDuplicateFeatures` class. I need to create a temporary .py file containing the same code as above.
507+ After that, open a terminal in the root directory and run the following command::
508+
509+     $ ./profiling/profiling.sh temp.py
510+
511+
512+ This will create a directory named `profiles` in the root directory of the project. This directory contains two files:
513+ the first is an .html file that you can open with any browser; the second is a .json file that you can visualize with
514+ `speedscope <https://www.speedscope.app/>`_.
515+
516+
517+ .. note::
518+     To profile memory usage, you can use the `memray` package. You can find more information about it
519+     `here <https://bloomberg.github.io/memray/index.html>`_.
520+
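For a quick look at memory usage without installing anything, the standard-library `tracemalloc` module can report the peak memory traced while a function runs. The sketch below is a different and much less detailed technique than `memray`, not a replacement for it; the helper `peak_memory_of` and the toy `build_table` workload are hypothetical names used only for illustration.

```python
import tracemalloc


def peak_memory_of(func, *args, **kwargs):
    """Hypothetical helper: return (result, peak bytes traced while func ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory returns (current, peak) sizes in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raised an exception.
        tracemalloc.stop()
    return result, peak


def build_table(rows, cols):
    # Toy workload: a list-of-lists table standing in for a DataFrame.
    return [[r * c for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    table, peak = peak_memory_of(build_table, 1000, 100)
    print(f"peak traced memory: {peak / 1024:.1f} KiB")
```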
521+
452522Review Process
453523--------------
454524