Commit 0076310

add doc explaining implementation choices
1 parent 0e85c6f commit 0076310

2 files changed (+179, -0 lines)

docs/source/design.rst

Lines changed: 178 additions & 0 deletions

=========================================================
Why virtualenvwrapper is (Mostly) Not Written In Python
=========================================================

If you look at the source code for virtualenvwrapper you will see that
most of the interesting parts are implemented as shell functions in
``virtualenvwrapper.sh``. The hook loader is a Python app, but doesn't
do much to manage the virtualenvs. Some of the most frequently asked
questions about virtualenvwrapper are "Why didn't you write this as a
set of Python programs?" or "Have you thought about rewriting it in
Python?" For a long time these questions baffled me, because it was
always obvious to me that it had to be implemented as it is. But they
come up frequently enough that I feel the need to explain.

tl;dr: POSIX Made Me Do It
==========================

The choice of implementation language for virtualenvwrapper was made
for pragmatic, rather than philosophical, reasons. The wrapper
commands need to modify the state and environment of the user's
*current shell process*, and the only way to do that is to have the
commands run *inside that shell*. That is why I wrote
virtualenvwrapper as a set of shell functions, rather than as separate
shell scripts or even as Python programs.

Where Do POSIX Processes Come From?
===================================

New POSIX processes are created when an existing process invokes the
``fork()`` system call. The invoking process becomes the "parent" of
the new "child" process, and the child is a full clone of the
parent. The *semantic* result of ``fork()`` is that an entire new copy
of the parent process is created. In practice, optimizations are
normally made to avoid copying more memory than is absolutely
necessary (frequently via a copy-on-write system). But for the
purposes of this explanation it is sufficient to think of the child as
a full replica of the parent.
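
A minimal Python sketch of these clone semantics, assuming a POSIX
system (``os.fork()`` is not available on Windows). The child starts
with its own copy of the parent's memory, so it sees the value the
parent set before forking, and its later change to that value never
reaches the parent. The function name ``child_sees_copy`` is
illustrative, not part of virtualenvwrapper.

.. code-block:: python

    import os


    def child_sees_copy():
        """Fork once; return (value the child saw, value the parent still has).

        POSIX only: os.fork() does not exist on Windows.
        """
        state = {"counter": 1}  # set in the parent before fork()
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: works on its own copy of the parent's memory.
            try:
                os.close(read_fd)
                seen = state["counter"]
                state["counter"] = 99  # changes only the child's copy
                os.write(write_fd, str(seen).encode())
            finally:
                os._exit(0)  # leave at once; never fall through into parent code
        # Parent: wait for the child, then read what it reported.
        os.close(write_fd)
        os.waitpid(pid, 0)
        with os.fdopen(read_fd) as reader:
            child_saw = int(reader.read())
        return child_saw, state["counter"]


    if __name__ == "__main__":
        print(child_sees_copy())  # prints (1, 1)
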

The important parts of the parent process that are copied include
dynamic memory (the stack and heap), static stuff (the program code),
resources like open file descriptors, and the *environment variables*
exported from the parent process. Inheriting environment variables is
a fundamental aspect of the way POSIX programs pass state and
configuration information to one another. A parent can establish a
series of ``name=value`` pairs, which are then given to the child
process. The child can read and change its own copy of them through
functions like ``getenv()`` and ``setenv()`` (and in Python through
``os.environ``).
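
The ``name=value`` hand-off can be observed from Python with
``subprocess``, which on POSIX systems starts the child through a
fork/exec (or equivalent spawn) sequence. This sketch copies the
parent's environment, adds one extra pair, and asks a child Python
process to read a variable back through ``os.environ``. The helper
``read_in_child`` and the variable names are illustrative.

.. code-block:: python

    import os
    import subprocess
    import sys


    def read_in_child(name, extra=None):
        """Return the value of environment variable `name` as seen by a
        child Python process ('' if it is unset there).

        The child receives a copy of this process's environment plus any
        name=value pairs in `extra`.
        """
        child_env = dict(os.environ)   # the pairs the child inherits
        child_env.update(extra or {})  # pairs added for this child only
        code = f"import os; print(os.environ.get({name!r}, ''))"
        result = subprocess.run(
            [sys.executable, "-c", code],
            env=child_env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()

Passing ``env=`` spells the list out explicitly; omitting it gives the
child an unmodified copy of the parent's environment.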

The choice of the term *inherit* to describe the way the variables and
their contents are passed from parent to child is
significant. Although a child can change its own environment, it
cannot directly change the environment settings of its parent,
because there is no system call that modifies another process's
environment.
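
Inheritance only flows one way. In this sketch (the helper name is
illustrative) a child Python process sets a variable in its own
environment and prints it; afterwards the parent's ``os.environ`` still
does not contain it. The only thing that comes back is the child's
output, which the parent has to read explicitly.

.. code-block:: python

    import os
    import subprocess
    import sys


    def child_sets_var(name, value):
        """Have a child process set name=value in *its own* environment
        and return the value the child printed afterwards."""
        code = (
            "import os; "
            f"os.environ[{name!r}] = {value!r}; "
            f"print(os.environ[{name!r}])"
        )
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


    if __name__ == "__main__":
        print(child_sets_var("DEMO_FROM_CHILD", "set-by-child"))  # child's view
        print("DEMO_FROM_CHILD" in os.environ)  # parent's view: unchanged
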

How the Shell Runs a Program
============================

When a shell receives a command to be executed, either interactively
or by parsing a script file, and determines that the command is
implemented in a separate program file, it uses ``fork()`` to create a
new process and then, inside that process, uses one of the ``exec``
functions to start the specified program. The language the program is
written in makes no difference to the decision about whether
to ``fork()``, so even if the "program" is a shell script
written in the language understood by the current shell, a new process
is created.
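
The fork-then-exec sequence can be sketched in Python with the same
system calls. This is a simplified stand-in for a shell's command
execution (no job control, redirection, or signal handling), POSIX
only, and ``run_like_a_shell`` is an illustrative name. The exit
status 127 mimics the shell convention for a command that cannot be
found.

.. code-block:: python

    import os


    def run_like_a_shell(argv):
        """Run argv[0] as a separate program: fork(), exec in the child,
        wait in the parent.  Returns the child's exit status."""
        pid = os.fork()
        if pid == 0:
            # Child: replace this process image with the requested program.
            try:
                os.execvp(argv[0], argv)  # searches PATH, as a shell would
            finally:
                # Only reached if exec failed, e.g. the command was not found.
                os._exit(127)
        # Parent (the "shell"): block until the child exits.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        return -1  # killed by a signal; a real shell reports a status above 128
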

On the other hand, if the shell decides that the command is a
*function*, then it looks at the definition and invokes it
directly. Shell functions are made up of other commands, some of which
may result in child processes being created, but the function itself
runs in the original shell process and can therefore modify its state,
for example by changing the working directory or the values of
variables.
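
The difference is easy to observe. This sketch assumes a POSIX ``sh``
on the ``PATH``; the function and file names are illustrative. It
defines a shell function that changes directory and a script file that
does the same, runs both from one ``sh`` process, and reports that
process's working directory after each. Python only drives ``sh`` and
collects its output.

.. code-block:: python

    import os
    import subprocess
    import tempfile

    # Runs inside one sh process; "$1" is the path of the script file.
    SHELL_CODE = r'''
    start=$(pwd)
    go_root() { cd /; }   # function: runs in this shell process
    go_root
    after_function=$(pwd)
    cd "$start"
    sh "$1"               # script: runs in a child sh process
    after_script=$(pwd)
    printf '%s\n' "$start" "$after_function" "$after_script"
    '''


    def compare_function_and_script():
        """Return (start dir, dir after the function, dir after the script)."""
        with tempfile.TemporaryDirectory() as tmp:
            script = os.path.join(tmp, "go_root.sh")
            with open(script, "w") as f:
                f.write("cd /\n")
            result = subprocess.run(
                ["sh", "-c", SHELL_CODE, "sh", script],
                cwd=tmp,
                capture_output=True,
                text=True,
                check=True,
            )
        start, after_function, after_script = result.stdout.splitlines()
        return start, after_function, after_script

Only the function moves the calling shell to ``/``; after the script
finishes, the shell is still in its starting directory.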

It is possible to force the shell to run a script directly, and not in
a child process, by *sourcing* it. The ``source`` command (spelled ``.``
in POSIX ``sh``) causes the shell to read the file and interpret it in
the current process. Again, as with functions, the contents of the file
may cause child processes to be spawned, but there is not a second
shell process interpreting the series of commands.
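
A similar experiment shows the effect of sourcing. A file that assigns
a variable is first executed with ``sh`` (a child process) and then
sourced with ``.``, the POSIX spelling of ``source``; only the second
form leaves the variable set in the calling shell. The file and
variable names are illustrative, and a POSIX ``sh`` is assumed.

.. code-block:: python

    import os
    import subprocess
    import tempfile

    # "$1" is the path of a file containing: DEMO_GREETING=hello
    SHELL_CODE = r'''
    sh "$1"    # executed: a child shell reads the file, then exits
    printf '%s\n' "${DEMO_GREETING:-unset}"
    . "$1"     # sourced: this shell reads the file itself
    printf '%s\n' "${DEMO_GREETING:-unset}"
    '''


    def run_then_source():
        """Return the variable's value after executing, then after sourcing."""
        # Start from an environment where the demo variable is not set.
        env = {k: v for k, v in os.environ.items() if k != "DEMO_GREETING"}
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "settings.sh")
            with open(path, "w") as f:
                f.write("DEMO_GREETING=hello\n")
            result = subprocess.run(
                ["sh", "-c", SHELL_CODE, "sh", path],
                env=env,
                capture_output=True,
                text=True,
                check=True,
            )
        return result.stdout.splitlines()


    if __name__ == "__main__":
        print(run_then_source())  # prints ['unset', 'hello']
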

What Does This Mean for virtualenvwrapper?
==========================================

The original and most important features of virtualenvwrapper are
automatically activating a virtualenv when it is created by
``mkvirtualenv`` and using ``workon`` to deactivate one environment
and activate another. Making these features work drove the
implementation decisions for the other parts of virtualenvwrapper,
too.

Environments are activated interactively by sourcing ``bin/activate``
inside the virtualenv. The ``activate`` script does a few things, but
the important parts are setting the ``VIRTUAL_ENV`` variable and
modifying the shell's search path through the ``PATH`` variable to put
the ``bin`` directory for the environment on the front of the
path. Changing the path means that the programs installed in the
environment, especially the Python interpreter there, are found before
other programs with the same name.
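
As a rough model, the two changes described above can be written as a
pure function over an environment mapping. This is a simplified
sketch, not the real ``activate`` script, which among other things
also remembers the old values so that ``deactivate`` can restore them;
``activated_environment`` is an illustrative name.

.. code-block:: python

    import os


    def activated_environment(env, venv_dir):
        """Return a copy of `env` with VIRTUAL_ENV set and the
        virtualenv's bin directory placed first on PATH."""
        new_env = dict(env)  # leave the caller's mapping untouched
        new_env["VIRTUAL_ENV"] = venv_dir
        bin_dir = os.path.join(venv_dir, "bin")
        old_path = env.get("PATH", "")
        # Prepending means PATH lookups find bin_dir's programs first.
        new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
        return new_env


    if __name__ == "__main__":
        env = activated_environment({"PATH": "/usr/bin:/bin"}, "/home/me/.virtualenvs/demo")
        print(env["PATH"])  # /home/me/.virtualenvs/demo/bin:/usr/bin:/bin on POSIX

Passing the resulting ``PATH`` to ``shutil.which("python", path=...)``
would return the environment's own interpreter whenever one exists in
its ``bin`` directory.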

Simply running ``bin/activate`` without ``source`` doesn't work,
because it sets up the environment of the *child* process without
affecting the parent. In order to source the activate script in the
interactive shell, ``mkvirtualenv`` and ``workon`` also need to
be run in that shell process.

Why Choose One When You Can Have Both?
======================================

The hook loader is one part of virtualenvwrapper that *is* written in
Python. Why? Again, because it was easier. Hooks are discovered using
setuptools entry points, because after an entry point is installed the
user doesn't have to take any other action to allow the loader to
discover and use it. It's easy to imagine writing a hook to create new
files on the filesystem (by installing a package, instantiating a
template, etc.).
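
Entry-point discovery can be sketched with the standard library's
``importlib.metadata`` (Python 3.8 or later). This is not necessarily
how virtualenvwrapper's own loader is written; it only illustrates why
no extra user action is needed, since any installed distribution that
declares an entry point in the named group is found automatically. The
group name below is a placeholder, and ``discover_hooks`` and
``run_hooks`` are illustrative names.

.. code-block:: python

    from importlib.metadata import entry_points


    def discover_hooks(group):
        """Return the entry points that installed distributions register
        under `group`."""
        eps = entry_points()
        if hasattr(eps, "select"):  # Python 3.10+
            return list(eps.select(group=group))
        return list(eps.get(group, []))  # Python 3.8/3.9: a dict of lists


    def run_hooks(group, *args):
        """Import each discovered hook and call it with `args`."""
        return [ep.load()(*args) for ep in discover_hooks(group)]


    if __name__ == "__main__":
        # Placeholder group name; normally nothing is registered under it.
        print(run_hooks("example.hooks"))
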

How, then, do hooks running in a separate process (the Python
interpreter) modify the shell environment to set variables or change
the working directory? They cheat, of course.

Each hook point defined by virtualenvwrapper actually represents two
hooks. First, the hooks meant to be run in Python are executed. Then
the "source" hooks are run, and they *print out* a series of shell
commands. All of those commands are collected, saved to a temporary
file, and then the shell is told to source the file.
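
Here is a minimal sketch of that trick with hypothetical hook
functions and a hypothetical ``write_source_file`` helper; it does not
reproduce virtualenvwrapper's actual loader. Each "source" hook
returns shell text instead of acting on anything itself; the text is
concatenated into a temporary file whose name is handed back for the
calling shell to source (for example with ``. "$file"``) and then
delete.

.. code-block:: python

    import os
    import shlex
    import tempfile


    # Hypothetical "source" hooks: they describe shell changes as text.
    def export_project_name(venv_name):
        return f"export PROJECT_NAME={shlex.quote(venv_name)}\n"


    def cd_to_root(venv_name):
        return "cd /\n"


    def write_source_file(hooks, *args):
        """Collect the shell commands produced by every hook into a
        temporary file and return its name.  The calling shell is
        expected to source the file and then remove it."""
        commands = "".join(hook(*args) for hook in hooks)
        fd, path = tempfile.mkstemp(suffix=".sh")
        with os.fdopen(fd, "w") as f:
            f.write(commands)
        return path

Because the file is sourced rather than executed, the ``export`` and
``cd`` it contains take effect in the user's interactive shell.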

Starting up the hook loader turns out to be way more expensive than
most of the other actions virtualenvwrapper takes, though, so I am
considering making its use optional. Most users customize the hooks by
using shell scripts (either globally or in the virtualenv). Finding
and running those can be handled by the shell quite easily.

Implications for Cross-Shell Compatibility
==========================================

Apart from requests for a full-Python implementation, the most
common request is to support additional shells. fish_ comes up a lot,
as do various Windows-only shells. The officially
:ref:`supported-shells` all have a common enough syntax that the same
implementation works for each. Supporting other shells would require
rewriting much, if not all, of the logic using an alternate syntax --
those other shells are basically different programming languages. So
far I have dealt with the ports by encouraging other developers to
handle them, and then trying to link to and otherwise promote the
results.

.. _fish: http://ridiculousfish.com/shell/

Not As Bad As It Seems
======================

Although there are some special challenges created by the
requirement that the commands run in a user's interactive shell (see
the many bugs reported by users who alias common commands like ``rm``
and ``cd``), using the shell as a programming language holds up quite
well. The shells are designed to make finding and executing other
programs easy, and especially to make it easy to combine a series of
smaller programs to perform more complicated operations. As that's
what virtualenvwrapper is doing, it's a natural fit.

.. seealso::

   * `Advanced Programming in the UNIX Environment`_ by W. Richard
     Stevens & Stephen A. Rago
   * `Fork (operating system)`_ on Wikipedia
   * `Environment variable`_ on Wikipedia
   * `Linux implementation of fork()`_

.. _Advanced Programming in the UNIX Environment: http://www.amazon.com/gp/product/0321637739/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0321637739&linkCode=as2&tag=hellflynet-20

.. _Fork (operating system): http://en.wikipedia.org/wiki/Fork_(operating_system)

.. _Environment variable: http://en.wikipedia.org/wiki/Environment_variable

.. _Linux implementation of fork(): https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c?id=refs/tags/v3.9-rc8#n1558

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -179,6 +179,7 @@ Details
 tips
 developers
 extensions
+design
 history

 .. _references:
