diff --git a/.gitignore b/.gitignore index 22b5395..46a7cd2 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,9 @@ .DS_Store +# as long as the pyforest extensions don't need to be built, we can ignore the JS build folders: +src/pyforest/node_modules +src/pyforest/yarn.lock + # Temporary and binary files *~ *.py[cod] @@ -53,3 +57,6 @@ MANIFEST .vscode/* examples/*.py anaconda_credentials.txt + +# User-specific folders +devstuff diff --git a/CHANGELOG.rst b/CHANGELOG.rst index 48861fb..aa19859 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -2,6 +2,31 @@ Changelog ========= +Version 1.1.2 +============= + +- https://github.com/8080labs/pyforest/pull/56 Add "random" package to lazy imports + +Version 1.1.1 +============= + +- https://github.com/8080labs/pyforest/pull/49 Prevent non-Jupyter consoles from being polluted with lazy import print statements. + +Version 1.1.0 +============= + +- Import statements are now automatically added to the first *CODE* cell when a package is used in the notebook + +Version 1.0.0 +============= + +- Import statements are now automatically added to the first cell when a package is used in the notebook + +Version 0.1.4 +============= + +- Users can now add their own lazy imports by writing their import statements to ~/.pyforest/user_imports.py. + Version 0.1 =========== diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000..4ba1f8a --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,2 @@ +include src/pyforest/static/* +include src/pyforest/*.json \ No newline at end of file diff --git a/README.md b/README.md index 0cf97bb..de6fae2 100644 --- a/README.md +++ b/README.md @@ -1,46 +1,68 @@ -# pyforest - lazy-import of all popular Python Data Science libraries. Stop writing the same imports over and over again. +# pyforest - feel the bliss of automated imports + +### From the makers of [bamboolib](https://bamboolib.com) + +Writing the same imports over and over again is below your capacity. Let pyforest do the job for you.
+ + +With pyforest you can use all your favorite Python libraries without importing them first. +If you use a package that is not imported yet, pyforest imports the package for you and adds the import statement to the first Jupyter cell. If you don't use a library, it won't be imported. -pyforest lazy-imports all popular Python Data Science libraries so that they are always there when you need them. If you don't use a library, it won't be imported. When you are done with your script, you can export the Python code for the import statements. - [Demo in Jupyter Notebook](#demo-in-jupyter-notebook) -- [Demo in Python Shell](#demo-in-python-shell) +- [Scenario](#scenario) - [Using pyforest](#using-pyforest) - [Installation](#installation) - [FAQs](#frequently-asked-questions) - [Contributing](#contributing) -- [Using pyforest as Package Developer](#using-pyforest-as-package-developer) - [About](#about) -- [Join our community and grow further](#join-our-community-and-grow-further) ## Demo in Jupyter Notebook -![demo](examples/assets/pyforest_demo_in_jupyter.gif) +![demo](examples/assets/pyforest_demo_in_jupyter_notebook.gif) + + +## Scenario + +You are a Data Scientist who works with Python. Every day, you start multiple new Jupyter notebooks because you want to explore some data or validate a hypothesis. + +During your work, you use many different libraries like `pandas`, `matplotlib`, `seaborn`, `numpy`, or `sklearn`. However, before you can start with the actual work, you always need to import your libraries. + + +There are several __problems__ with this. Admittedly, they are small, but they add up over time. +- It is boring because the imports are mostly the same. This is below your capacity. +- Missing imports disrupt the natural flow of your work. +- Sometimes, you may even need to look up the exact import statements.
For example, `import matplotlib.pyplot as plt` or `from sklearn.ensemble import GradientBoostingRegressor`. +__What if you could just focus on using the libraries?__ -## Demo in Python Shell -![demo](examples/assets/pyforest_demo_in_python_shell.png) +pyforest offers the following __solution__: +- You can use all your libraries as you usually do. If a library is not imported yet, pyforest will import it and add the import statement to the first Jupyter cell. +- If a library is not used, it won't be imported. +- Your notebooks stay reproducible and shareable without you wasting a thought on imports. ## Using pyforest -pyforest lazy-imports all popular Python Data Science libraries with a single line of code: -```python -from pyforest import * -``` +After you have [installed](#installation) pyforest and its Jupyter extension, you can __use your favorite Python Data Science commands as you normally would, just without writing imports__. -And if you use Jupyter or IPython, you can even skip this line because pyforest adds itself to the autostart. +For example, if you want to read a CSV with pandas: -When you are done with your script, you can export all import statements via: +```python +df = pd.read_csv("titanic.csv") +``` +pyforest will automatically import pandas for you and add the import statement to the first cell: ```python -active_imports() +import pandas as pd ``` -Which libraries are available? -- We aim to add all popular Python Data Science libraries which should account for >99% of your daily imports. For example, `pandas` as `pd`, `numpy` as `np`, `seaborn` as `sns`, `matplotlib.pyplot` as `plt`, or `OneHotEncoder` from `sklearn` and many more. In addition, there are also helper modules like `os`, `re`, `tqdm`, or `Path` from `pathlib`. -- You can see an overview of all available lazy imports if you type `lazy_imports()` in Python. -- If you are missing an import, you can add it to the [pyforest imports](src/pyforest/_imports.py).
+ +__Which libraries are available?__ +- We aim to add all popular Python Data Science libraries, which should account for >99% of your daily imports. For example, we already added `pandas` as `pd`, `numpy` as `np`, `seaborn` as `sns`, `matplotlib.pyplot` as `plt`, or `OneHotEncoder` from `sklearn` and many more. In addition, there are also helper modules like `os`, `re`, `tqdm`, or `Path` from `pathlib`. +- You can see an overview of all currently available imports [here](src/pyforest/_imports.py). +- If you are missing an import, you can either __add the import to your user-specific pyforest imports__ as described in the [FAQs](#frequently-asked-questions) or open a pull request for the official [pyforest imports](src/pyforest/_imports.py). > In order to gather all the most important names, we need your help. Please open a pull request and add the [imports](src/pyforest/_imports.py) that we are still missing. @@ -49,19 +71,22 @@ Which libraries are available? > You need Python 3.6 or above because we love f-strings. -From the terminal, enter: +From the terminal (or the Anaconda Prompt on Windows), enter: -`pip install pyforest` +```bash +pip install --upgrade pyforest +python -m pyforest install_extensions +``` -And you're ready to go. +Please make sure to restart any running Jupyter server so that the JavaScript extension can be loaded properly. -Please note, that this will also add pyforest to your IPython default startup settings. +Also, please note that this will add pyforest to your IPython default startup settings. If you do not want this, you can disable the auto_import as described in the [FAQs](#frequently-asked-questions) below. ## Frequently Asked Questions -- __"I need to always explicitly write down the libraries I used at the top of my scripts."__ - - Of course, you can export the import statements for all used libraries with `active_imports()`.
+- __"How to add my own import statements without adding them to the package source code?"__ + - pyforest creates a file in your home directory at `~/.pyforest/user_imports.py` in which you can type any **explicit** import statements you want (e.g. `import pandas as pd`). Your own custom imports take precedence over any other pyforest imports. **Please note:** implicit imports (e.g. `from pandas import *`) won't work. - __"Doesn't this slow down my Jupyter or Python startup process?"__ - No, because the libraries will only be imported when you actually use them. Until you use them, the variables like `pd` are only pyforest placeholders. @@ -73,63 +98,29 @@ Please note, that this will also add pyforest to your IPython default startup se - Tensorflow is included in pyforest but pyforest does not install any dependencies. You need to install your libraries separately from pyforest. Afterwards, you can access the libraries via pyforest if they are included in the [pyforest imports](src/pyforest/_imports.py). - __"Will the pyforest variables interfere with my own local variables?"__ - - Please make sure that you import pyforest at the beginning of your script. Then you will always be safe. You can use your variables like you would without pyforest. The worst thing that can happen is that you overwrite a pyforest placeholder and thus cannot use the placeholder any more (duh). + - No, never. pyforest will never mask or overwrite any of your local variables. You can use your variables like you would without pyforest. The worst thing that can happen is that you overwrite a pyforest placeholder and thus cannot use the placeholder any more (duh). - __"What about auto-completion on lazily imported modules?"__ - It works :) As soon as you start the auto-completion, pyforest will import the module and return the available symbols to your auto-completer. 
- __"How to (temporarily) deactivate the auto_import in IPython and Jupyter?"__ - - Go to the directory `~/.ipython/profile_default/startup` and adjust or delete the `pyforest_autoimport.py` file. You will find further instructions in the file. + - Go to the directory `~/.ipython/profile_default/startup` and adjust or delete the `pyforest_autoimport.py` file. You will find further instructions in the file. If you don't use the auto_import, you will need to import pyforest at the beginning of your notebook via `import pyforest` - __"How to (re)activate the pyforest auto_import?"__ - Execute the following Python command in Jupyter, IPython or Python: `from pyforest.auto_import import setup; setup()`. Please note that the auto_import only works for Jupyter and IPython. -- __"Why is pandas_profiling also imported in the demo?"__ - - pyforest supports complementary, optional imports. For example, `pandas_profiling` patches the `pd.DataFrame` with the convenience function `df.profile_report`. Therefore, pyforest also imports `pandas_profiling` if you have it installed. If you don't have `pandas_profiling` installed, the optional import will be skipped. - -- __"I don't want to copy complementary import statements to the top of my file."__ - - Please note, that the complementary imports will always appear at the bottom of the import_statements list. So, you can just copy all statements above. Alternatively, you can deactivate complementary imports. - -- __"How to deactivate complementary imports?"__ - - You can uncomment the statements `*.__on_import__()` at the bottom of the [pyforest imports](src/pyforest/_imports.py) file. - -- __"How to add my own import statements without adding them to the package source code?"__ - - pyforest creates a file `~/.pyforest/user_imports.py`, in which you can type any **explicit** import statements you want. **Please note:** implicit imports (e.g. `from pandas import *`) won't work. Besides, you shouldn't write implicit imports anyway. 
Those are only for bad programmers like the authors of pyforest :) +- __"Can I use pyforest outside of the Jupyter Notebook or Lab?"__ + - Technically, yes. However, this is not the intended use case. pyforest is primarily designed for Jupyter Notebook and JupyterLab. If you want to use pyforest in IPython, a Python script, etc., please import it explicitly via `import pyforest`. Afterwards, you can get the currently active imports via `pyforest.active_imports()`. - __"Why is the project called pyforest?"__ - - In which ecosystem do pandas live? + - pyforest is meant to be the home for all Data Science packages, including pandas. And in which ecosystem do pandas live? :) ## Contributing -In order to gather all the most important names, we need your help. Please open a pull request and add the imports that we are still missing to the [pyforest imports](src/pyforest/_imports.py). You can also find the guidelines in the [pyforest imports file](src/pyforest/_imports.py) - - -## Using pyforest as Package Developer -pyforest helps you to minimize the (initial) import time of your package which improves the user experience. If you want your package imports to become lazy, rewrite your imports as follows: - -Replace - -```python -import pandas as pd -``` - -with +If you'd like to contribute, a great place to look is the [issues marked with help-wanted](https://github.com/8080labs/pyforest/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22). -```python -from pyforest import LazyImport -pd = LazyImport("import pandas as pd") -``` +In order to gather all the most important names, we need your help. Please open a pull request and add the imports that we are still missing to the [pyforest imports](src/pyforest/_imports.py). You can also find the guidelines in the [pyforest imports file](src/pyforest/_imports.py). ## About -pyforest is developed by Florian, Tobias and Guido from [8080 Labs](https://8080labs.com).
Our goal is to improve the productivity of Python Data Scientists. Other projects that we are working on are [edaviz](https://edaviz.com) and [bamboolib](https://bamboolib.com) - - -## Join our community and grow further -If you -- like our work or -- want to become a faster Python Data Scientist or -- want to discuss the future of the Python Data Science ecosystem or -- are just interested in mingling with like-minded fellows - -then, you are invited to [join our slack](https://join.slack.com/t/fasterpyds/shared_invite/enQtNzYxMTMzMDQ4MDk3LTYyNGRiNTE0OGJkNDEzZGRjNjg2Y2I0YWRlNTlmOGUxMjY5MDY5Yjg1MjliM2QwNmNhZmI3N2MxMmY3MGNiODA). +pyforest is developed by [8080 Labs](https://8080labs.com). Our goal is to make Python Data Scientists 10x faster. If you like the speedup to your workflow, you might also be interested in our other project [bamboolib](https://bamboolib.com) diff --git a/README_for_devs.md b/README_for_devs.md new file mode 100644 index 0000000..8d658fc --- /dev/null +++ b/README_for_devs.md @@ -0,0 +1,54 @@ + +## How to install the local python version during development +```bash +pip install -e . # alternatively, use pip3 +``` + +## How to install the extensions during development + +### JupyterLab development + +Via terminal, it's the same procedure as normal installation: + +```bash +python -m pyforest install_labextension +``` + +It is also possible via Python: +```python +import pyforest +pyforest.install_labextension() # takes 30-60s due to jupyter lab build +``` + +Run JupyterLab in watch mode +``` +jupyter lab --watch +``` + +When you make changes on the javascript side, refresh the browser (clear cache) for changes to take effect. 
+### Jupyter Notebook + +Via terminal, it's the same procedure as normal installation: + +```bash +python -m pyforest install_nbextension +``` + +It is also possible via Python: +```python +import pyforest +pyforest.install_nbextension() +``` + +Run Jupyter Notebook +``` +jupyter notebook +``` + +When you make changes on the JavaScript side, you need to reinstall the nbextension: +```bash +python -m pyforest install_nbextension +``` + +## Syntax formatting +We use `black` to format the Python code. diff --git a/dockerfiles/pyforest_sandbox/Dockerfile b/dockerfiles/pyforest_sandbox/Dockerfile new file mode 100644 index 0000000..77808e1 --- /dev/null +++ b/dockerfiles/pyforest_sandbox/Dockerfile @@ -0,0 +1,14 @@ +FROM jupyter/minimal-notebook + +USER root + +COPY ./test.ipynb ./ + +# Install Python 3 packages +RUN conda install --quiet --yes \ + 'Cython' \ + && \ + conda clean --all -f -y + +RUN pip install pyforest==1.0.2 pandas numpy seaborn +RUN python -m pyforest install_extensions \ No newline at end of file diff --git a/dockerfiles/pyforest_sandbox/commands.sh b/dockerfiles/pyforest_sandbox/commands.sh new file mode 100644 index 0000000..eeac981 --- /dev/null +++ b/dockerfiles/pyforest_sandbox/commands.sh @@ -0,0 +1,3 @@ +docker build -t 8080labs/pyforest_sandbox .
&& \ +say "docker update jupyter notebook sandbox ready" && \ +docker run --rm -p 8888:8888 8080labs/pyforest_sandbox diff --git a/dockerfiles/pyforest_sandbox/test.ipynb b/dockerfiles/pyforest_sandbox/test.ipynb new file mode 100644 index 0000000..ecd4bef --- /dev/null +++ b/dockerfiles/pyforest_sandbox/test.ipynb @@ -0,0 +1,46 @@ +{ + "nbformat": 4, + "nbformat_minor": 2, + "metadata": { + "language_info": { + "name": "python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "version": "3.7.3-final" + }, + "orig_nbformat": 2, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": 3, + "kernelspec": { + "name": "python37364bitbaseconda9dbe7a38796a4966923d0aec411ee3e8", + "display_name": "Python 3.7.3 64-bit ('base': conda)" + } + }, + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pyforest" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame(dict(a=np.arange(10)))\n", + "sns.distplot(df.a)" + ] + } + ] +} \ No newline at end of file diff --git a/examples/assets/pyforest_demo_in_jupyter.gif b/examples/assets/pyforest_demo_in_jupyter.gif deleted file mode 100644 index 3f52142..0000000 Binary files a/examples/assets/pyforest_demo_in_jupyter.gif and /dev/null differ diff --git a/examples/assets/pyforest_demo_in_jupyter_notebook.gif b/examples/assets/pyforest_demo_in_jupyter_notebook.gif new file mode 100644 index 0000000..e64b89b Binary files /dev/null and b/examples/assets/pyforest_demo_in_jupyter_notebook.gif differ diff --git a/examples/assets/pyforest_demo_in_python_shell.png b/examples/assets/pyforest_demo_in_python_shell.png deleted file mode 100644 index c92a01e..0000000 Binary files a/examples/assets/pyforest_demo_in_python_shell.png and /dev/null differ diff --git a/examples/demo.ipynb
b/examples/demo.ipynb index 7b26df7..6abc691 100644 --- a/examples/demo.ipynb +++ b/examples/demo.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "outputs": [], "source": [ - "# from pyforest import * # not needed because of auto_import" + "import pyforest" ] }, { @@ -24,7 +24,8 @@ "metadata": {}, "outputs": [], "source": [ - "sns.distplot(df.Age)" + "sns.distplot(df.Age)\n", + "plt.show()" ] }, { @@ -32,9 +33,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "active_imports()" - ] + "source": [] } ], "metadata": { @@ -57,9 +56,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.7.6-final" } }, "nbformat": 4, "nbformat_minor": 2 -} +} \ No newline at end of file diff --git a/examples/demo.py b/examples/demo.py index 36cf90a..df5f7b0 100644 --- a/examples/demo.py +++ b/examples/demo.py @@ -1,11 +1,8 @@ -# %% -# from pyforest import * # not needed because of auto_import +import pyforest -# %% df = pd.read_csv("titanic.csv") -# %% sns.distplot(df.Age) +plt.show() + -# %% -active_imports() diff --git a/meta.yaml b/meta.yaml index eab1b4b..44f7b93 100644 --- a/meta.yaml +++ b/meta.yaml @@ -1,4 +1,4 @@ -{% set version = "0.1.2" %} +{% set version = "0.1.3" %} package: name: "pyforest" diff --git a/setup.cfg b/setup.cfg index 01c3121..355f923 100644 --- a/setup.cfg +++ b/setup.cfg @@ -4,7 +4,7 @@ [metadata] name = pyforest -version = 0.1.3 +version = 1.1.2 description = Lazy-import of all popular Python Data Science libraries. Stop writing the same imports over and over again. 
author = Florian Wetschoreck, Guido Drechsel, Tobias Krabel author-email = info@8080labs.com diff --git a/setup.py b/setup.py index 8972789..c09f793 100644 --- a/setup.py +++ b/setup.py @@ -19,3 +19,4 @@ if __name__ == "__main__": setup() setup_auto_import() + # extensions cannot be installed because pyforest is only available after the installation diff --git a/src/pyforest/__init__.py b/src/pyforest/__init__.py index 837699f..afa9ad1 100644 --- a/src/pyforest/__init__.py +++ b/src/pyforest/__init__.py @@ -1,6 +1,20 @@ # -*- coding: utf-8 -*- from ._imports import * +from ._importable import disable_javascript_update +from .utils import ( + get_user_symbols, + install_extensions, + install_nbextension, + install_labextension, +) +user_symbols = get_user_symbols() +pyforest_imports = globals().copy().keys() + +for import_symbol in pyforest_imports: + # don't overwrite symbols of the user + if import_symbol not in user_symbols.keys(): + user_symbols[import_symbol] = eval(import_symbol) # set __version__ attribute from pkg_resources import get_distribution, DistributionNotFound @@ -11,3 +25,18 @@ __version__ = "unknown" finally: del get_distribution, DistributionNotFound + + +def _jupyter_nbextension_paths(): + return [ + { + "section": "notebook", + "src": "static", + "dest": "pyforest", + "require": "pyforest/nbextension", + } + ] + + +def _jupyter_labextension_paths(): + return [{"name": "pyforest", "src": "static"}] diff --git a/src/pyforest/__main__.py b/src/pyforest/__main__.py new file mode 100644 index 0000000..9253d6f --- /dev/null +++ b/src/pyforest/__main__.py @@ -0,0 +1,22 @@ +from .utils import install_extensions, install_nbextension, install_labextension + +VALID_COMMANDS = dict( + install_extensions=install_extensions, + install_nbextension=install_nbextension, + install_labextension=install_labextension, +) + +USAGE = """Usage: python -m pyforest + : one of install_extensions, install_nbextension, install_labextension +installs notebook/lab extensions 
+""" + +if __name__ == "__main__": + import sys + + if len(sys.argv) != 2 or sys.argv[1] not in VALID_COMMANDS: + print(USAGE) + sys.exit(-1) + + install_function = VALID_COMMANDS[sys.argv[1]] + install_function() diff --git a/src/pyforest/_importable.py b/src/pyforest/_importable.py index 1b792a0..817516f 100644 --- a/src/pyforest/_importable.py +++ b/src/pyforest/_importable.py @@ -35,8 +35,30 @@ def __maybe_import_complementary_imports__(self): def __maybe_import__(self): self.__maybe_import_complementary_imports__() exec(self.__import_statement__, globals()) - # Attention: if the import fails, the next line will not be reached + # Attention: if the import fails, the next lines will not be reached self.__was_imported__ = True + self.__maybe_add_docstring_and_signature__() + + # Always update the import cell again. + # This is not a problem because the update is fast, + # but it ensures that the first cell is updated even if the first import was triggered via autocomplete. + # If the import cell were only updated the first time, autocompletes would not result in updated cells. + # Attention: the first cell is not updated after the autocomplete but after the cell (with the autocomplete) is executed + _update_import_cell() + + def __maybe_add_docstring_and_signature__(self): + # adds docstrings and signatures for imported objects, e.g. + # UnitRegistry = LazyImport("from pint import UnitRegistry") + # UnitRegistry?
+ + try: + self.__doc__ = eval(f"{self.__imported_name__}.__doc__") + + from inspect import signature + + self.__signature__ = eval(f"signature({self.__imported_name__})") + except Exception: + pass # among others, called during auto-completion of IPython/Jupyter def __dir__(self): @@ -69,10 +91,43 @@ def __repr__(self, *args, **kwargs): return f"lazy pyforest.LazyImport for '{self.__import_statement__}'" -def _import_statements(symbol_dict, was_imported=True): +def disable_javascript_update(): + """ + For use in non-Jupyter environments, disable _update_import_cell + """ + from pyforest import _importable + _importable._update_import_cell_disabled = _importable._update_import_cell + _importable._update_import_cell = lambda: None + + +def _update_import_cell(): + try: + from IPython.display import display, Javascript + except ImportError: + return + + # import here and not at top of file in order to not interfere with importables + from ._imports import active_imports + + statements = active_imports(print_statements=False) + + display( + Javascript( + """ + if (window._pyforest_update_imports_cell) {{ window._pyforest_update_imports_cell({!r}); }} + """.format( + "\n".join(statements) + ) + ) + ) + + +def _get_import_statements(symbol_dict, was_imported=True): statements = [] for _, symbol in symbol_dict.items(): if isinstance(symbol, LazyImport) and (symbol.__was_imported__ == was_imported): - print(symbol.__import_statement__) statements.append(symbol.__import_statement__) + + # remove potential duplicates, e.g.
when user_symbols are passed; dict.fromkeys keeps the first-seen order, unlike set() + statements = list(dict.fromkeys(statements)) return statements diff --git a/src/pyforest/_imports.py b/src/pyforest/_imports.py index cd7b75b..aaeade6 100644 --- a/src/pyforest/_imports.py +++ b/src/pyforest/_imports.py @@ -1,14 +1,11 @@ -from ._importable import LazyImport, _import_statements +from ._importable import LazyImport, _get_import_statements from .user_specific_imports import _load_user_specific_imports -# YOU CAN SAVE OWN IMPORTS IN ~/.pyforest/user_imports.py -# TODO: in this file you can also add your most important modules and objects, but we -# recommend storing them in ~/.pyforest/user_imports.py -# If you are missing an import and think it is a common import, please contribute +# If you are missing an import and you think it is a common import, please contribute # via creating a pull request. # If you contribute, we can quickly collect the 80% most frequent imports -# Before you create a pull request, please read the following: +# Before you create a pull request, PLEASE READ THE FOLLOWING: # 0) It is always best to first create a GitHub issue before creating a pull request. # This way you can be sure that your proposal is valid and will be integrated. @@ -16,16 +13,19 @@ # 1) The imported name should be an unambiguous standard convention and highly specific. # Usually, you want to use the names that are proposed in the library's documentation. # However, there should be no or little confusion with other libraries -# e.g. 'import dash_html_components as html' is a 'good' counter example +# Good example: +# 'import pandas as pd' +# Bad example: +# 'import dash_html_components as html' # because 'html' is not specific enough for the dash context. # Also, it is ambiguous with e.g. IPython.display.HTML. # A potential resolution might be 'import dash_html_components as dhc' -# 2) General imports e.g. 'from sklearn.preprocessing import *' are not allowed/possible +# 2) General, implicit imports e.g.
'from sklearn.preprocessing import *' are not possible # because we want to make sure that there is no accidental masking of imported names -# 3) If you disagree with the conventions, you can always adjust your local pyforest or save -# your imports separately in ~/.pyforest/user_imports.py +# 3) If you disagree with the conventions or you are using rare packages, you can save +# your user-specific imports in ~/.pyforest/user_imports.py ### Data Wrangling @@ -38,6 +38,10 @@ load_workbook = LazyImport("from openpyxl import load_workbook") +open_workbook = LazyImport("from xlrd import open_workbook") + +wr = LazyImport("import awswrangler as wr") + ### Data Visualization and Plotting mpl = LazyImport("import matplotlib as mpl") plt = LazyImport("import matplotlib.pyplot as plt") @@ -56,14 +60,52 @@ pydot = LazyImport("import pydot") +### Image processing + +cv2 = LazyImport("import cv2") +skimage = LazyImport("import skimage") +Image = LazyImport("from PIL import Image") +imutils = LazyImport("import imutils") + # statistics statistics = LazyImport("import statistics") +stats = LazyImport("from scipy import stats") +sm = LazyImport("import statsmodels.api as sm") + +### Time-Series Forecasting +fbprophet = LazyImport("import fbprophet") +Prophet = LazyImport("from fbprophet import Prophet") +ARIMA = LazyImport("from statsmodels.tsa.arima_model import ARIMA") ### Machine Learning sklearn = LazyImport("import sklearn") + +LinearRegression = LazyImport("from sklearn.linear_model import LinearRegression") +LogisticRegression = LazyImport("from sklearn.linear_model import LogisticRegression") +Lasso = LazyImport("from sklearn.linear_model import Lasso") +LassoCV = LazyImport("from sklearn.linear_model import LassoCV") +Ridge = LazyImport("from sklearn.linear_model import Ridge") +RidgeCV = LazyImport("from sklearn.linear_model import RidgeCV") +ElasticNet = LazyImport("from sklearn.linear_model import ElasticNet") +ElasticNetCV = LazyImport("from sklearn.linear_model 
import ElasticNetCV") +PolynomialFeatures = LazyImport("from sklearn.preprocessing import PolynomialFeatures") +StandardScaler = LazyImport("from sklearn.preprocessing import StandardScaler") +MinMaxScaler = LazyImport("from sklearn.preprocessing import MinMaxScaler") +RobustScaler = LazyImport("from sklearn.preprocessing import RobustScaler") + + OneHotEncoder = LazyImport("from sklearn.preprocessing import OneHotEncoder") +LabelEncoder = LazyImport("from sklearn.preprocessing import LabelEncoder") TSNE = LazyImport("from sklearn.manifold import TSNE") +PCA = LazyImport("from sklearn.decomposition import PCA") +SimpleImputer = LazyImport("from sklearn.impute import SimpleImputer") train_test_split = LazyImport("from sklearn.model_selection import train_test_split") +cross_val_score = LazyImport("from sklearn.model_selection import cross_val_score") +GridSearchCV = LazyImport("from sklearn.model_selection import GridSearchCV") +RandomizedSearchCV = LazyImport("from sklearn.model_selection import RandomizedSearchCV") +KFold = LazyImport("from sklearn.model_selection import KFold") +StratifiedKFold = LazyImport("from sklearn.model_selection import StratifiedKFold") + svm = LazyImport("from sklearn import svm") GradientBoostingClassifier = LazyImport( "from sklearn.ensemble import GradientBoostingClassifier" @@ -80,18 +122,36 @@ "from sklearn.feature_extraction.text import TfidfVectorizer" ) +CountVectorizer = LazyImport( + "from sklearn.feature_extraction.text import CountVectorizer" +) + +metrics = LazyImport("from sklearn import metrics") + +sg = LazyImport("from scipy import signal as sg") + +# Clustering +KMeans = LazyImport("from sklearn.cluster import KMeans") + +# Gradient Boosting Decision Trees +xgb = LazyImport("import xgboost as xgb") +lgb = LazyImport("import lightgbm as lgb") + # TODO: add all the other most important sklearn objects # TODO: add separate sections within machine learning viz.
Classification, Regression, Error Functions, Clustering # Deep Learning tf = LazyImport("import tensorflow as tf") keras = LazyImport("import keras") +torch = LazyImport("import torch") +fastai = LazyImport("import fastai") # NLP nltk = LazyImport("import nltk") gensim = LazyImport("import gensim") spacy = LazyImport("import spacy") re = LazyImport("import re") +textblob = LazyImport("import textblob") ### Helper sys = LazyImport("import sys") @@ -99,6 +159,7 @@ re = LazyImport("import re") glob = LazyImport("import glob") Path = LazyImport("from pathlib import Path") +random = LazyImport("import random") pickle = LazyImport("import pickle") @@ -107,6 +168,11 @@ tqdm = LazyImport("import tqdm") +################################################## +### dont make adjustments below this line ######## +################################################## + + ############################# ### User-specific imports ### ############################# @@ -119,27 +185,27 @@ del _load_user_specific_imports -####################################### -### Complementary, optional imports ### -####################################### -# Why is this needed? Some libraries patch existing libraries -# Please note: these imports are only executed if you already have the library installed -# If you want to deactivate specific complementary imports, do the following: -# - uncomment the lines which contain `.__on_import__` and the library you want to deactivate +# ####################################### +# ### Complementary, optional imports ### +# ####################################### +# # Why is this needed? 
Some libraries patch existing libraries +# # Please note: these imports are only executed if you already have the library installed +# # If you want to deactivate specific complementary imports, do the following: +# # - uncomment the lines which contain `.__on_import__` and the library you want to deactivate -pandas_profiling = LazyImport("import pandas_profiling") -pd.__on_import__(pandas_profiling) # adds df.profile_report attribute to pd.DataFrame +# pandas_profiling = LazyImport("import pandas_profiling") +# pd.__on_import__(pandas_profiling) # adds df.profile_report attribute to pd.DataFrame -eda = LazyImport("import edaviz as eda") -pd.__on_import__(eda) # adds GUI to pd.DataFrame when IPython frontend can display it +# bam = LazyImport("import bamboolib as bam") +# pd.__on_import__(bam) # adds GUI to pd.DataFrame when IPython frontend can display it -################################################## -### dont make adjustments below this line ######## -################################################## def lazy_imports(): - return _import_statements(globals(), was_imported=False) + return _get_import_statements(globals(), was_imported=False) -def active_imports(): - return _import_statements(globals(), was_imported=True) +def active_imports(print_statements=True): + statements = _get_import_statements(globals(), was_imported=True) + if print_statements: + print("\n".join(statements)) + return statements diff --git a/src/pyforest/auto_import.py b/src/pyforest/auto_import.py index 8083040..b792548 100644 --- a/src/pyforest/auto_import.py +++ b/src/pyforest/auto_import.py @@ -25,7 +25,7 @@ def _write_into_startup_file(): # 1) if you never want to auto-import pyforest again, you can delete this file try: - from pyforest import * # uncomment this line if you temporarily dont want to auto-import pyforest + import pyforest # uncomment this line if you temporarily dont want to auto-import pyforest pass except: pass diff --git a/src/pyforest/package-lock.json 
b/src/pyforest/package-lock.json new file mode 100644 index 0000000..b89f69a --- /dev/null +++ b/src/pyforest/package-lock.json @@ -0,0 +1,5 @@ +{ + "name": "pyforest", + "version": "0.0.1", + "lockfileVersion": 1 +} diff --git a/src/pyforest/package.json b/src/pyforest/package.json new file mode 100644 index 0000000..36d4273 --- /dev/null +++ b/src/pyforest/package.json @@ -0,0 +1,19 @@ +{ + "private": true, + "name": "pyforest", + "version": "0.0.1", + "description": "Automatically adds pyforest lazy imports to first cell of Jupyter Notebook or Jupyter Lab because explicit is better than implicit", + "author": "Florian Wetschoreck, Tobias Krabel", + "main": "static/labextension.js", + "keywords": [ + "jupyter", + "jupyterlab", + "jupyterlab-extension" + ], + "jupyterlab": { + "extension": true + }, + "scripts": {}, + "dependencies": {}, + "devDependencies": {} +} diff --git a/src/pyforest/static/labextension.js b/src/pyforest/static/labextension.js new file mode 100644 index 0000000..2831909 --- /dev/null +++ b/src/pyforest/static/labextension.js @@ -0,0 +1,15 @@ +const utils = require('./utils.js'); +const notebook = require('@jupyterlab/notebook'); + +module.exports = [{ + id: 'pyforest', + autoStart: true, + requires: [notebook.INotebookTracker], + activate: function (app, notebookTracker) { + // Need to create NotebookPanel and connect it with NotebookTracker first + notebookTracker.widgetAdded.connect(async (tracker, notebookPanel) => { + await notebookPanel.revealed; + utils.setup_lab(notebookPanel); + }); + } +}]; diff --git a/src/pyforest/static/nbextension.js b/src/pyforest/static/nbextension.js new file mode 100644 index 0000000..854bf6d --- /dev/null +++ b/src/pyforest/static/nbextension.js @@ -0,0 +1,9 @@ +define(['base/js/namespace', './utils'], function (Jupyter, utils) { + function load_ipython_extension() { + utils.setup_notebook(Jupyter); + } + + return { + load_ipython_extension: load_ipython_extension + }; +}); diff --git 
a/src/pyforest/static/utils.js b/src/pyforest/static/utils.js new file mode 100644 index 0000000..07fdd57 --- /dev/null +++ b/src/pyforest/static/utils.js @@ -0,0 +1,60 @@ +define([], function () { + function first_code_cell_in_lab(notebookPanel) { + var cells = notebookPanel.content.widgets + for (let index in cells) { + if (cells[index].model.type == "code") { + return index; + } + } + // This should never happen because this function is called when the user + // executes a code cell. + throw new Error("No single code cell found"); + } + + function setup_lab(notebookPanel) { + window._pyforest_update_imports_cell = function (imports_string) { + var first_code_cell_index = first_code_cell_in_lab(notebookPanel) + var cell = notebookPanel.model.cells.get(first_code_cell_index).value; + cell.text = get_new_cell_content(imports_string, cell.text); + }; + } + + function first_code_cell_in_notebook(Jupyter) { + var cells = Jupyter.notebook.get_cells(); + for (let index in cells) { + if (cells[index]["cell_type"] == "code") { + return index; + } + } + // This should never happen because this function is called when the user + // executes a code cell. + throw new Error("No single code cell found"); + } + + function setup_notebook(Jupyter) { + window._pyforest_update_imports_cell = function (imports_string) { + var first_code_cell_index = first_code_cell_in_notebook(Jupyter) + var cell_doc = Jupyter.notebook.get_cell(first_code_cell_index).code_mirror.getDoc(); + cell_doc.setValue(get_new_cell_content(imports_string, cell_doc.getValue())); + }; + } + + function get_new_cell_content(imports_string, current_content) { + var separator = `# ^^^ pyforest auto-imports - don't write above this line`; + var parts = current_content.split(separator); + var user_content = "" + if (parts.length > 1) { + // User content is everything after the first separator. + // If the user adds another separator, pyforest only updates the content above the first separator. 
+ user_content = parts.slice(1) + } else { + user_content = parts + } + return imports_string + '\n' + separator + '\n' + user_content.join('\n').trim('\n'); + } + + return { + setup_lab: setup_lab, + setup_notebook: setup_notebook + }; +}); diff --git a/src/pyforest/user_specific_imports.py b/src/pyforest/user_specific_imports.py index a1d22ed..6350b20 100644 --- a/src/pyforest/user_specific_imports.py +++ b/src/pyforest/user_specific_imports.py @@ -3,13 +3,12 @@ USER_IMPORTS_PATH = Path.home() / ".pyforest" / "user_imports.py" -TEMPLATE_TEXT = ( - "# Add your imports here, line by line\n" - "# e.g\n" - "# import pandas as pd\n" - "# from pathlib import Path\n" - "# import re\n" -) +TEMPLATE_TEXT = """# Add your imports here, line by line +# e.g +# import pandas as pd +# from pathlib import Path +# import re +""" def _clean_line(x: str) -> str: @@ -24,26 +23,20 @@ def _is_empty_line(x: str) -> bool: return x == "" -def _is_real_import(x: str) -> bool: +def _is_import_statement(x: str) -> bool: return not (_is_comment(x) or _is_empty_line(x)) -def _keep_real_imports(import_statements: list) -> list: - return [ - import_statement - for import_statement in import_statements - if _is_real_import(import_statement) - ] +def _find_imports(file_lines: list) -> list: + return [file_line for file_line in file_lines if _is_import_statement(file_line)] -def _clean_import_statements(import_statements: list) -> list: - cleaned_import_statements = [ - _clean_line(import_statement) for import_statement in import_statements - ] - return _keep_real_imports(cleaned_import_statements) +def _get_imports(file_lines: list) -> list: + cleaned_lines = [_clean_line(line) for line in file_lines] + return _find_imports(cleaned_lines) -def _read_import_statetments_from_user_settings(user_settings_path: str) -> list: +def _read_file_lines_from_user_settings(user_settings_path: str) -> list: file_in = open(user_settings_path, "r") return file_in.readlines() @@ -60,24 +53,22 @@ def 
_maybe_init_user_imports_file(user_imports_path: Path) -> None: user_imports_path.write_text(TEMPLATE_TEXT) -def _get_import_statetments_from_user_settings(user_imports_path) -> list: +def _get_imports_from_user_settings(user_imports_path) -> list: _maybe_init_user_imports_file(user_imports_path) - import_statements = _read_import_statetments_from_user_settings(user_imports_path) - return _clean_import_statements(import_statements) + file_lines = _read_file_lines_from_user_settings(user_imports_path) + return _get_imports(file_lines) -def _assign_imports_to_global_space(import_statements: list, globals_) -> None: +def _assign_imports_to_globals(import_statements: list, globals_) -> None: symbols = [import_statement.split()[-1] for import_statement in import_statements] for symbol, import_statement in zip(symbols, import_statements): exec(f"{symbol} = LazyImport('{import_statement}')", globals_) -# add user_imports_path as argument so that we can run tests on that function +# user_imports_path exists as argument so that we can run tests on the function def _load_user_specific_imports( globals_: dict, user_imports_path=USER_IMPORTS_PATH ) -> None: - user_import_statements = _get_import_statetments_from_user_settings( - user_imports_path - ) - _assign_imports_to_global_space(user_import_statements, globals_) + import_statements = _get_imports_from_user_settings(user_imports_path) + _assign_imports_to_globals(import_statements, globals_) diff --git a/src/pyforest/utils.py b/src/pyforest/utils.py new file mode 100644 index 0000000..4fe41c5 --- /dev/null +++ b/src/pyforest/utils.py @@ -0,0 +1,70 @@ +def get_user_symbols(): + import inspect + + for index, item in enumerate(inspect.stack()): + try: + name = item[0].f_globals["__name__"] + if name == "__main__": + return item[0].f_globals + except: # __name__ attribute does not exist + pass + return {} + + +def install_extensions(): + print( + "Starting to install pyforest extensions for Jupyter Notebook and Jupyter Lab" + 
) + print("") + install_nbextension() + print("") + install_labextension() + print("") + print("Finished installing the pyforest Jupyter extensions") + print("Please reload your Jupyter notebook and/or Jupyter lab browser windows") + + +def install_nbextension(): + print("Trying to install pyforest nbextension...") + + try: + from notebook import nbextensions + except ImportError: + print( + "Could not install pyforest Jupyter Notebook extension because Jupyter Notebook is not available" + ) + return + + nbextensions.install_nbextension_python("pyforest", user=True) + nbextensions.enable_nbextension_python("pyforest") + print("") + print("Finished installing the pyforest Jupyter Notebook nbextension") + print("Please reload your Jupyter notebook browser window") + + +def install_labextension(): + print("Trying to install pyforest labextension...") + + try: + from jupyterlab import commands + except ImportError: + print( + "Could not install pyforest Jupyter Lab extension because Jupyter Lab is not available" + ) + return + + from pathlib import Path + + dir = Path(__file__).parent + + should_build = commands.install_extension(str(dir)) + print("Successfully installed pyforest Jupyter Lab labextension") + + if should_build: + print("") + print("Starting JupyterLab build") + commands.build() + print("Successfully built JupyterLab") + + print("") + print("Please reload your Jupyter Lab browser window") diff --git a/tests/test_install.sh b/tests/test_install.sh new file mode 100755 index 0000000..49452ac --- /dev/null +++ b/tests/test_install.sh @@ -0,0 +1,17 @@ +#!/bin/bash + +# RUN THIS SCRIPT FROM ROOT DIR (i.e. as ./tests/test_install.sh) + +conda deactivate +conda env remove --name pyforest_venv + +conda create -n pyforest_venv python=3.7 -y +conda activate pyforest_venv + +conda install pip -y +conda install ipykernel -y + +pip install jupyterlab + +pip install -e . 
# breaks if this script is not run from ROOT +python -m pyforest install_extensions # current error when script run from terminal: module pyforest not found \ No newline at end of file diff --git a/upload_to_pypi.sh b/upload_to_pypi.sh index 2f1daad..021e345 100755 --- a/upload_to_pypi.sh +++ b/upload_to_pypi.sh @@ -5,5 +5,5 @@ then rm -rf dist/* fi -python3 setup.py sdist +python3 setup.py sdist # use miniconda3/bin/python3 twine upload dist/* \ No newline at end of file
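Note on the registry changes in `__init__.py` (the new sklearn, xgboost, lightgbm, torch, fastai, textblob and `random` entries): every entry is a `LazyImport` proxy. `lazy_imports()` and `active_imports()` then list import statements depending on whether each proxy has been used yet. `LazyImport` itself is not part of this diff, so the following is only a minimal sketch of the idea under stated assumptions. `LazyModule`, `get_import_statements` and the `namespace` dict are hypothetical names. The sketch handles only `import X` / `import X as Y`, whereas the real registry also uses `from ... import ...` statements. It also mirrors the new `print_statements` flag on `active_imports`.

```python
import importlib


class LazyModule:
    """Hypothetical stand-in for pyforest's LazyImport (not its real code).

    Supports only `import X` and `import X as Y` statements.
    """

    def __init__(self, import_statement: str):
        self._statement = import_statement
        # "import random as rnd" -> "random"
        self._module_name = import_statement.split()[1]
        self._module = None  # stays None until the first attribute access

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself,
        # i.e. attributes of the wrapped module: import it now.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, name)

    def was_imported(self) -> bool:
        return self._module is not None


def get_import_statements(symbols: dict, was_imported: bool) -> list:
    # Same split as lazy_imports() / active_imports(): filter proxies by import state.
    return [
        obj._statement
        for obj in symbols.values()
        if isinstance(obj, LazyModule) and obj.was_imported() == was_imported
    ]


def active_imports(symbols: dict, print_statements: bool = True) -> list:
    # Mirrors the new flag: print by default, always return the list.
    statements = get_import_statements(symbols, was_imported=True)
    if print_statements:
        print("\n".join(statements))
    return statements


namespace = {
    "json": LazyModule("import json"),
    "rnd": LazyModule("import random as rnd"),
}
print(active_imports(namespace, print_statements=False))  # []
namespace["json"].dumps({"a": 1})  # first attribute access triggers the real import
active_imports(namespace)  # prints: import json
```

Returning the list even when printing lets callers (for example, the code that writes the first notebook cell) reuse the statements without parsing stdout.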
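Note on `static/utils.js`: `get_new_cell_content` rewrites the first code cell so that pyforest's import statements sit above a fixed separator line and the user's code follows it. Below is a line-by-line Python port, written only to make the string handling easy to check. The separator text is copied from the diff. JavaScript's `String.prototype.trim()` ignores its `'\n'` argument, so the port uses `str.strip()` (roughly equivalent whitespace trimming).

```python
SEPARATOR = "# ^^^ pyforest auto-imports - don't write above this line"


def get_new_cell_content(imports_string: str, current_content: str) -> str:
    parts = current_content.split(SEPARATOR)
    # Everything after the first separator is user code; without a separator,
    # the whole existing cell is treated as user code.
    user_parts = parts[1:] if len(parts) > 1 else parts
    # JS trim() ignores its argument, so this strips all surrounding whitespace.
    user_content = "\n".join(user_parts).strip()
    return imports_string + "\n" + SEPARATOR + "\n" + user_content


cell = "df = pd.read_csv('data.csv')"
first = get_new_cell_content("import pandas as pd", cell)
# Running it again only replaces the block above the separator.
second = get_new_cell_content("import pandas as pd\nimport numpy as np", first)
print(second)
```

One consequence visible in the port: because the remaining parts are re-joined with `"\n"` rather than with the separator, a second separator written by the user is dropped from the cell on the next update. This differs from what the inline comment in the diff seems to promise.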
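Note on the `user_specific_imports.py` refactor: the renamed helpers keep the same pipeline. They read `~/.pyforest/user_imports.py`, clean each line, and keep everything that is neither a comment nor empty. Each statement is then bound to the last token of the statement. The bodies of `_clean_line` and `_is_comment` are not shown in the diff. The sketch below assumes they strip whitespace and test for a leading `#`; the function names are hypothetical lowercase counterparts, not pyforest's API.

```python
TEMPLATE_TEXT = """# Add your imports here, line by line
# e.g
# import pandas as pd
# from pathlib import Path
# import re
"""


def clean_line(line: str) -> str:
    # Assumption: _clean_line (body not in the diff) strips surrounding whitespace.
    return line.strip()


def is_import_statement(line: str) -> bool:
    # Same rule as _is_import_statement: neither a comment nor an empty line.
    # Assumption: _is_comment checks for a leading "#".
    return not (line.startswith("#") or line == "")


def get_imports(file_lines: list) -> list:
    # Same order as _get_imports: clean every line first, then filter.
    return [line for line in map(clean_line, file_lines) if is_import_statement(line)]


def symbol_for(import_statement: str) -> str:
    # Same heuristic as _assign_imports_to_globals: the bound name is the last token.
    return import_statement.split()[-1]


user_file = TEMPLATE_TEXT + "import pandas as pd\nfrom pathlib import Path\n\nimport re\n"
imports = get_imports(user_file.splitlines(keepends=True))  # like file.readlines()
for statement in imports:
    print(symbol_for(statement), "<-", statement)
```

The last-token rule covers `import X`, `import X as Y` and `from X import Y`. For `from X import a, b` it would bind only `b`, and `import os.path` would yield `os.path`, which is not a plain name. Separately, building code with `exec(f"{symbol} = LazyImport('{import_statement}')", globals_)` breaks if a statement contains a single quote. Assigning `globals_[symbol] = LazyImport(import_statement)` directly would sidestep that quoting issue.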
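Note on the new `utils.get_user_symbols()`: it walks `inspect.stack()` outward and returns the globals of the first frame whose `__name__` is `"__main__"`, which in a Jupyter kernel is the notebook's namespace. The sketch below keeps that logic but makes two changes, both assumptions for illustration. It adds a `target_name` parameter so the lookup can be exercised outside a notebook. It also replaces the bare `try/except` (which also swallows `KeyboardInterrupt`) with `dict.get`. The `exec` call stands in for a notebook cell, because code run via `exec(code, namespace)` gets `namespace` as its frame's `f_globals`.

```python
import inspect


def get_user_symbols(target_name: str = "__main__") -> dict:
    """Return the globals of the nearest frame on the stack whose __name__ is target_name.

    target_name is an added parameter (the diff hard-codes "__main__").
    """
    for frame_info in inspect.stack():
        frame_globals = frame_info.frame.f_globals
        # .get() covers frames without __name__ instead of a bare except.
        if frame_globals.get("__name__") == target_name:
            return frame_globals
    return {}


# Simulate a notebook namespace; the exec'd code runs with `namespace` as its globals.
namespace = {"__name__": "fake_main", "get_user_symbols": get_user_symbols, "df": "data"}
exec("found = get_user_symbols('fake_main')", namespace)
print(namespace["found"] is namespace)  # True
```

The returned object is the live globals dict, not a copy, so anything pyforest reads from it reflects the user's current variables.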