| 1 | +========================================================= |
| 2 | + Why virtualenvwrapper is (Mostly) Not Written In Python |
| 3 | +========================================================= |
| 4 | + |
| 5 | +If you look at the source code for virtualenvwrapper you will see that |
| 6 | +most of the interesting parts are implemented as shell functions in |
| 7 | +``virtualenvwrapper.sh``. The hook loader is a Python application, but
| 8 | +it does not do much to manage the virtualenvs. Some of the most frequently asked
| 9 | +questions about virtualenvwrapper are "Why didn't you write this as a |
| 10 | +set of Python programs?" or "Have you thought about rewriting it in |
| 11 | +Python?" For a long time these questions baffled me, because it was |
| 12 | +always obvious to me that it had to be implemented as it is. But they |
| 13 | +come up frequently enough that I feel the need to explain. |
| 14 | + |
| 15 | +tl;dr: POSIX Made Me Do It |
| 16 | +========================== |
| 17 | + |
| 18 | +The choice of implementation language for virtualenvwrapper was made |
| 19 | +for pragmatic, rather than philosophical, reasons. The wrapper |
| 20 | +commands need to modify the state and environment of the user's |
| 21 | +*current shell process*, and the only way to do that is to have the |
| 22 | +commands run *inside that shell*. That requirement led me to write
| 23 | +virtualenvwrapper as a set of shell functions, rather than as separate
| 24 | +shell scripts or even Python programs. |
| 25 | + |
| 26 | +Where Do POSIX Processes Come From? |
| 27 | +=================================== |
| 28 | + |
| 29 | +New POSIX processes are created when an existing process invokes the |
| 30 | +``fork()`` system call. The invoking process becomes the "parent" of |
| 31 | +the new "child" process, and the child is a full clone of the |
| 32 | +parent. The *semantic* result of ``fork()`` is that an entire new copy |
| 33 | +of the parent process is created. In practice, optimizations are |
| 34 | +normally made to avoid copying more memory than is absolutely |
| 35 | +necessary (frequently via a copy-on-write system). But for the |
| 36 | +purposes of this explanation it is sufficient to think of the child as |
| 37 | +a full replica of the parent. |
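|   | +
|   | +A shell subshell makes the "full replica" idea easy to see. Commands in
|   | +parentheses run in a subshell. This is a duplicate of the current shell,
|   | +and shells typically create it with ``fork()``. In this small sketch the
|   | +variable is deliberately not exported. The copy can still read it,
|   | +because it received a copy of the parent's memory and not just the
|   | +parent's environment.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Deliberately NOT exported: this variable lives only in the shell's
|   | +   # memory, not in its environment.
|   | +   greeting="hello from the parent"
|   | +
|   | +   # The parenthesized commands run in a subshell, a duplicate of this
|   | +   # shell, so the copy can read the unexported variable.
|   | +   ( echo "subshell sees: $greeting" )   # prints "subshell sees: hello from the parent"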
| 38 | + |
| 39 | +The important parts of the parent process that are copied include |
| 40 | +dynamic memory (the stack and heap), static stuff (the program code), |
| 41 | +resources like open file descriptors, and the *environment variables* |
| 42 | +exported from the parent process. Inheriting environment variables is |
| 43 | +a fundamental aspect of the way POSIX programs pass state and |
| 44 | +configuration information to one another. A parent can establish a |
| 45 | +series of ``name=value`` pairs, which are then given to the child |
| 46 | +process. The child can read and change its own copies through functions
| 47 | +like ``getenv()`` and ``setenv()`` (and in Python through ``os.environ``).
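|   | +
|   | +The difference between a shell's private variables and its exported
|   | +environment shows up as soon as the shell starts a separate program. In
|   | +this sketch, ``sh -c`` runs a new shell in a child process. After
|   | +``exec`` replaces the copied memory with the new program, only the
|   | +exported ``name=value`` pairs survive. The variable names are made up
|   | +for the example.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Exported: copied into the environment of every child process.
|   | +   export EXPORTED_VAR="visible to children"
|   | +   # Not exported: private to this shell.
|   | +   PLAIN_VAR="private to this shell"
|   | +
|   | +   # sh -c runs a new shell program in a child process. It sees only the
|   | +   # environment, so PLAIN_VAR expands to an empty string there.
|   | +   sh -c 'echo "EXPORTED_VAR=[$EXPORTED_VAR] PLAIN_VAR=[$PLAIN_VAR]"'
|   | +   # prints "EXPORTED_VAR=[visible to children] PLAIN_VAR=[]"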
| 48 | + |
| 49 | +The choice of the term *inherit* to describe the way the variables and |
| 50 | +their contents are passed from parent to child is significant.
| 51 | +Although a child can change its own environment, it cannot directly
| 52 | +change the environment of its parent, because there is no system call
| 53 | +for modifying another process's environment.
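|   | +
|   | +A few lines of shell show a child process changing its own copy of a
|   | +variable. The child shell in this sketch reassigns and re-exports
|   | +``PROJECT``, an arbitrary name chosen for the example. The change
|   | +disappears when the child exits.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   export PROJECT="original"
|   | +
|   | +   # The child changes and re-exports its own copy of PROJECT...
|   | +   sh -c 'PROJECT="changed by child"; export PROJECT; echo "child: $PROJECT"'
|   | +   # prints "child: changed by child"
|   | +
|   | +   # ...but nothing flows back, so the parent still has its old value.
|   | +   echo "parent: $PROJECT"   # prints "parent: original"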
| 55 | + |
| 56 | +How the Shell Runs a Program |
| 57 | +============================ |
| 58 | + |
| 59 | +When a shell receives a command to be executed, either interactively |
| 60 | +or by parsing a script file, and determines that the command is |
| 61 | +implemented in a separate program file, it uses ``fork()`` to create a
| 62 | +new process and then inside that process it uses one of the ``exec`` |
| 63 | +functions to start the specified program. The language that program is |
| 64 | +written in doesn't make any difference in the decision about whether |
| 65 | +or not to ``fork()``, so even if the "program" is a shell script |
| 66 | +written in the language understood by the current shell, a new process |
| 67 | +is created. |
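|   | +
|   | +To see this in practice, the sketch below writes a tiny script that
|   | +changes directory and sets a variable. It then runs the script with
|   | +``sh``, which starts a new shell in a child process. Invoking ``sh``
|   | +explicitly avoids depending on the temporary file being executable.
|   | +The sketch assumes ``mktemp`` is available, as it is on Linux, macOS,
|   | +and the BSDs. Neither change is visible afterwards.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # A throwaway script that tries to change its caller's state.
|   | +   script=$(mktemp)
|   | +   printf '%s\n' 'cd /' 'MESSAGE="set by the script"' > "$script"
|   | +
|   | +   start_dir=$(pwd)
|   | +   unset MESSAGE
|   | +
|   | +   # Run the script as a separate program: a child shell interprets it.
|   | +   sh "$script"
|   | +   rm -f "$script"
|   | +
|   | +   # The child's cd and assignment ended with the child.
|   | +   echo "pwd: $(pwd)"                    # still the starting directory
|   | +   echo "MESSAGE: ${MESSAGE:-<unset>}"   # prints "MESSAGE: <unset>"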
| 68 | + |
| 69 | +On the other hand, if the shell decides that the command is a |
| 70 | +*function*, then it looks at the definition and invokes it |
| 71 | +directly. Shell functions are made up of other commands, some of which |
| 72 | +may result in child processes being created, but the function itself |
| 73 | +runs in the original shell process and can therefore modify its state, |
| 74 | +for example by changing the working directory or the values of |
| 75 | +variables. |
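|   | +
|   | +A shell function that changes directory and sets a variable shows the
|   | +difference. ``change_state`` is a made-up name. Because the function
|   | +body runs in the current shell, both the new working directory and the
|   | +variable are still in effect after it returns.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Hypothetical function that changes the caller's state.
|   | +   change_state() {
|   | +       cd /
|   | +       MESSAGE="set by the function"
|   | +   }
|   | +
|   | +   unset MESSAGE
|   | +   change_state
|   | +
|   | +   # The function ran in this shell, so its changes persist here.
|   | +   echo "pwd: $(pwd)"           # prints "pwd: /"
|   | +   echo "MESSAGE: $MESSAGE"     # prints "MESSAGE: set by the function"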
| 76 | + |
| 77 | +It is possible to force the shell to run a script directly, and not in |
| 78 | +a child process, by *sourcing* it. The ``source`` command causes the |
| 79 | +shell to read the file and interpret it in the current process. Again, |
| 80 | +as with functions, the contents of the file may cause child processes |
| 81 | +to be spawned, but there is not a second shell process interpreting |
| 82 | +the series of commands. |
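|   | +
|   | +Sourcing a script file that changes directory and sets a variable
|   | +makes both changes stick. In this sketch the file is read with ``.``,
|   | +the POSIX spelling of the command. bash and zsh also accept ``source``
|   | +as a synonym. The sketch assumes ``mktemp`` is available.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # A throwaway script file that changes directory and sets a variable.
|   | +   script=$(mktemp)
|   | +   printf '%s\n' 'cd /' 'MESSAGE="set by the sourced file"' > "$script"
|   | +
|   | +   unset MESSAGE
|   | +
|   | +   # "." makes the current shell read and execute the file itself,
|   | +   # so no second shell process is involved.
|   | +   . "$script"
|   | +   rm -f "$script"
|   | +
|   | +   echo "pwd: $(pwd)"          # prints "pwd: /"
|   | +   echo "MESSAGE: $MESSAGE"    # prints "MESSAGE: set by the sourced file"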
| 83 | + |
| 84 | +What Does This Mean for virtualenvwrapper? |
| 85 | +========================================== |
| 86 | + |
| 87 | +The original and most important features of virtualenvwrapper are |
| 88 | +automatically activating a virtualenv when it is created by |
| 89 | +``mkvirtualenv`` and using ``workon`` to deactivate one environment |
| 90 | +and activate another. Making these features work drove the |
| 91 | +implementation decisions for the other parts of virtualenvwrapper, |
| 92 | +too. |
| 93 | + |
| 94 | +Environments are activated interactively by sourcing ``bin/activate`` |
| 95 | +inside the virtualenv. The ``activate`` script does a few things, but |
| 96 | +the important parts are setting the ``VIRTUAL_ENV`` variable and |
| 97 | +modifying the shell's search path through the ``PATH`` variable to put |
| 98 | +the ``bin`` directory for the environment on the front of the |
| 99 | +path. Changing the path means that the programs installed in the |
| 100 | +environment, especially its Python interpreter, are found before
| 101 | +other programs with the same name. |
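|   | +
|   | +The two essential steps can be sketched in a few lines of shell. This is
|   | +not the real ``activate`` script. That script also saves the old values
|   | +so that ``deactivate`` can restore them, and it usually changes the
|   | +prompt. The environment path is hypothetical, and nothing needs to
|   | +exist there for the sketch to run. Like ``activate`` itself, these
|   | +lines only affect the shell that executes them.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Hypothetical location of an environment.
|   | +   VIRTUAL_ENV="/opt/virtualenvs/example"
|   | +   export VIRTUAL_ENV
|   | +
|   | +   # Put the environment's bin directory at the front of the search path,
|   | +   # so its programs are found before any others with the same name.
|   | +   PATH="$VIRTUAL_ENV/bin:$PATH"
|   | +   export PATH
|   | +
|   | +   # The first directory searched is now the environment's bin directory.
|   | +   echo "${PATH%%:*}"   # prints "/opt/virtualenvs/example/bin"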
| 102 | + |
| 103 | +Simply running ``bin/activate`` without ``source`` doesn't work,
| 104 | +because it sets up the environment of a *child* process without
| 105 | +affecting the parent. In order to source the activate script in the |
| 106 | +interactive shell, both ``mkvirtualenv`` and ``workon`` also need to |
| 107 | +be run in that shell process. |
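|   | +
|   | +A minimal, hypothetical ``workon``-like function shows why the function
|   | +form matters. The ``.`` inside it runs in the user's shell, so whatever
|   | +the activate script sets stays set. The real ``workon`` does
|   | +considerably more, such as deactivating the current environment and
|   | +running hooks. The sketch builds a throwaway ``WORKON_HOME`` (the
|   | +variable virtualenvwrapper uses for the directory holding environments).
|   | +It contains one fake environment whose ``activate`` only sets
|   | +``VIRTUAL_ENV`` and ``PATH``.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Throwaway WORKON_HOME with one fake environment named "demo".
|   | +   WORKON_HOME=$(mktemp -d)
|   | +   mkdir -p "$WORKON_HOME/demo/bin"
|   | +   printf '%s\n' \
|   | +       'VIRTUAL_ENV="$WORKON_HOME/demo"' \
|   | +       'export VIRTUAL_ENV' \
|   | +       'PATH="$VIRTUAL_ENV/bin:$PATH"' \
|   | +       'export PATH' > "$WORKON_HOME/demo/bin/activate"
|   | +
|   | +   # Hypothetical, stripped-down workon: a function, so the "." below
|   | +   # runs in the caller's shell instead of in a child process.
|   | +   my_workon() {
|   | +       if [ ! -f "$WORKON_HOME/$1/bin/activate" ]; then
|   | +           echo "my_workon: no environment named '$1'" >&2
|   | +           return 1
|   | +       fi
|   | +       . "$WORKON_HOME/$1/bin/activate"
|   | +   }
|   | +
|   | +   my_workon demo
|   | +   echo "VIRTUAL_ENV=$VIRTUAL_ENV"   # the demo environment under WORKON_HOME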
| 108 | + |
| 109 | +Why Choose One When You Can Have Both? |
| 110 | +====================================== |
| 111 | + |
| 112 | +The hook loader is one part of virtualenvwrapper that *is* written in |
| 113 | +Python. Why? Again, because it was easier. Hooks are discovered using |
| 114 | +setuptools entry points, because after an entry point is installed the |
| 115 | +user doesn't have to take any other action to allow the loader to |
| 116 | +discover and use it. It's easy to imagine writing a hook to create new |
| 117 | +files on the filesystem (by installing a package, instantiating a |
| 118 | +template, etc.). |
| 119 | + |
| 120 | +How, then, do hooks running in a separate process (the Python |
| 121 | +interpreter) modify the shell environment to set variables or change |
| 122 | +the working directory? They cheat, of course. |
| 123 | + |
| 124 | +Each hook point defined by virtualenvwrapper actually represents two |
| 125 | +hooks. First, the hooks meant to be run in Python are executed. Then |
| 126 | +the "source" hooks are run, and they *print out* a series of shell |
| 127 | +commands. All of those commands are collected, saved to a temporary |
| 128 | +file, and then the shell is told to source the file. |
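|   | +
|   | +The trick can be sketched without Python. In this sketch an ``sh -c``
|   | +child stands in for the hook loader. This is a hypothetical substitute,
|   | +not virtualenvwrapper's actual code. The child cannot change the parent
|   | +shell, so it only prints commands. The parent saves that output to a
|   | +temporary file and sources it, which applies the commands to the
|   | +current shell.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   hook_file=$(mktemp)
|   | +
|   | +   # The child process (standing in for the hook loader) only writes
|   | +   # shell commands to stdout; the parent saves them to a file.
|   | +   sh -c 'printf "%s\n" "cd /" "HOOK_MESSAGE=\"set by a source hook\""' > "$hook_file"
|   | +
|   | +   # Sourcing the saved commands applies them to *this* shell.
|   | +   . "$hook_file"
|   | +   rm -f "$hook_file"
|   | +
|   | +   echo "pwd: $(pwd)"                  # prints "pwd: /"
|   | +   echo "HOOK_MESSAGE: $HOOK_MESSAGE"  # prints "HOOK_MESSAGE: set by a source hook"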
| 129 | + |
| 130 | +Starting up the hook loader turns out to be much more expensive than
| 131 | +most of the other actions virtualenvwrapper takes, though, so I am |
| 132 | +considering making its use optional. Most users customize the hooks by |
| 133 | +using shell scripts (either globally or in the virtualenv). Finding |
| 134 | +and running those can be handled by the shell quite easily. |
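|   | +
|   | +Finding and running a hook script from the shell takes only a few lines.
|   | +The hypothetical ``run_hook_script`` helper in this sketch sources a file
|   | +if it exists and quietly skips it otherwise. The file names mirror
|   | +virtualenvwrapper hooks such as ``postactivate``. The directory is a
|   | +throwaway one created for the example.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # Throwaway hook directory containing a single user hook.
|   | +   hook_dir=$(mktemp -d)
|   | +   printf '%s\n' 'POSTACTIVATE_RAN=yes' > "$hook_dir/postactivate"
|   | +
|   | +   # Hypothetical helper: source the hook (in this shell) if it exists.
|   | +   run_hook_script() {
|   | +       if [ -f "$1" ]; then
|   | +           . "$1"
|   | +       fi
|   | +   }
|   | +
|   | +   run_hook_script "$hook_dir/postactivate"    # present: sourced
|   | +   run_hook_script "$hook_dir/predeactivate"   # absent: quietly skipped
|   | +   echo "POSTACTIVATE_RAN=${POSTACTIVATE_RAN:-no}"   # prints "POSTACTIVATE_RAN=yes"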
| 135 | + |
| 136 | +Implications for Cross-Shell Compatibility |
| 137 | +========================================== |
| 138 | + |
| 139 | +Apart from requests for a full-Python implementation, the most
| 140 | +common request is to support additional shells. fish_ comes up a lot, |
| 141 | +as do various Windows-only shells. The officially |
| 142 | +:ref:`supported-shells` all have a common enough syntax that the same |
| 143 | +implementation works for each. Supporting other shells would require |
| 144 | +rewriting much, if not all, of the logic using an alternate syntax -- |
| 145 | +those other shells are basically different programming languages. So |
| 146 | +far I have dealt with the ports by encouraging other developers to |
| 147 | +handle them, and then trying to link to and otherwise promote the |
| 148 | +results. |
| 149 | + |
| 150 | +.. _fish: http://ridiculousfish.com/shell/ |
| 151 | + |
| 152 | +Not As Bad As It Seems |
| 153 | +====================== |
| 154 | + |
| 155 | +Although there are some special challenges created by the
| 156 | +requirement that the commands run in a user's interactive shell (see |
| 157 | +the many bugs reported by users who alias common commands like ``rm`` |
| 158 | +and ``cd``), using the shell as a programming language holds up quite |
| 159 | +well. The shells are designed to make finding and executing other |
| 160 | +programs easy, and especially to make it easy to combine a series of |
| 161 | +smaller programs to perform more complicated operations. As that's |
| 162 | +what virtualenvwrapper is doing, it's a natural fit. |
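|   | +
|   | +Code that must run in a customized interactive shell has one defensive
|   | +option: call commands through the POSIX ``command`` builtin. It skips
|   | +shell functions, and it also skips aliases, because alias expansion only
|   | +applies to the first word of a command. The sketch is illustrative
|   | +rather than a description of how virtualenvwrapper handles these bugs.
|   | +It defines a hypothetical ``cd`` wrapper that counts its calls.
|   | +
|   | +.. code-block:: sh
|   | +
|   | +   # A hypothetical user customization that shadows the cd builtin.
|   | +   cd() {
|   | +       CUSTOM_CD_CALLS=$((CUSTOM_CD_CALLS + 1))
|   | +       command cd "$@"
|   | +   }
|   | +   CUSTOM_CD_CALLS=0
|   | +
|   | +   cd /           # goes through the user's function
|   | +   command cd /   # bypasses the function and runs the builtin directly
|   | +
|   | +   echo "custom cd calls: $CUSTOM_CD_CALLS"   # prints "custom cd calls: 1"
|   | +   unset -f cd    # remove the wrapper again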
| 163 | + |
| 164 | +.. seealso:: |
| 165 | + |
| 166 | +   * `Advanced Programming in the UNIX Environment`_ by W. Richard
| 167 | +     Stevens & Stephen A. Rago
| 168 | +   * `Fork (operating system)`_ on Wikipedia
| 169 | +   * `Environment variable`_ on Wikipedia
| 170 | +   * `Linux implementation of fork()`_
| 171 | + |
| 172 | +.. _Advanced Programming in the UNIX Environment: http://www.amazon.com/gp/product/0321637739/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0321637739&linkCode=as2&tag=hellflynet-20 |
| 173 | + |
| 174 | +.. _Fork (operating system): http://en.wikipedia.org/wiki/Fork_(operating_system) |
| 175 | + |
| 176 | +.. _Environment variable: http://en.wikipedia.org/wiki/Environment_variable |
| 177 | + |
| 178 | +.. _Linux implementation of fork(): https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c?id=refs/tags/v3.9-rc8#n1558 |