Commit 5cf119e
flesh out continuum history (#1)
1 parent d881fe5 commit 5cf119e

1 file changed: +24 -8 lines changed

community/history.md
---
title: History
---

# Context of binary packaging for Python

conda-forge's origins are best understood in the context of Python packaging in the early 2010s. Back then, installing Python packages across operating systems was very challenging, especially on Windows, where it often meant compiling dependencies from source.

Python 2.x was the norm. To install it, you'd get the official installers from Python.org, use the system-provided interpreter on Linux, or resort to options like Python(x,y) [^pythonxy], ActiveState ActivePython [^activepython], or Enthought's distributions (EPD, later Canopy) [^enthought] on macOS and Windows [^legacy-python-downloads].

If you wanted to install additional packages, the community was transitioning from `easy_install` to `pip`, and there was no easy way to ship or install pre-compiled Python packages. An alternative to Python eggs [^eggs] wouldn't emerge until 2013 with the formalization of wheels [^wheels]. Pre-built binaries were especially valuable on Windows, where Christoph Gohlke's executable installers and wheels [^cgohlke]<sup>,</sup>[^cgohlke-shutdown] were often your only choice.

However, for Linux, you would have to wait until 2016, when [`manylinux` wheels were introduced](https://peps.python.org/pep-0513/). Before then, PyPI wouldn't even allow compiled Linux wheels, and your only option was to compile every package from source.

As an example, take a look at the [PyPI download page for `numpy` 1.7.0](https://pypi.org/project/numpy/1.7.0/#files), released in Feb 2013. The "Built Distributions" section only shows a few `.exe` files for Windows (!), and some `manylinux1` wheels. However, the `manylinux1` wheels were not uploaded until April 2016. There was no mention whatsoever of macOS. Now compare it to [`numpy` 1.11.0](https://pypi.org/project/numpy/1.11.0/#files), released in March 2016: wheels for all platforms!

The reason it was hard to find pre-built packages for a specific system, and why compiling from source was the default for many, is binary compatibility. A compiled artifact is only usable within a compatibility window: the compiler version, core libraries such as glibc, and dependency libraries present on the build machine must all be matched (or exceeded) on the destination system. Linux distributions achieve this by freezing compiler and library versions for a particular release cycle. Windows achieves it relatively easily because Python standardized on a particular Visual Studio compiler version for each Python release. Where a Windows package was reliably redistributable across versions of Windows, so long as the Python version was the same, Linux presented a more difficult target because it was (and is) much harder to account for all of the little details that must line up.
## The origins of `conda`

In 2012, Continuum Analytics announced Anaconda 0.8 at the SciPy conference [^anaconda-history]. Anaconda was a distribution of scientifically-oriented packages, but did not yet have tools for managing individual packages. Later that year, in September, Continuum released `conda` 1.0, the cross-platform, language-agnostic package manager for pre-compiled artifacts [^conda-changelog-1.0]. The motivation behind these efforts was to provide an easy way to ship all the compiled libraries and Python packages that users of the SciPy and NumPy stacks needed [^packaging-and-deployment-with-conda]<sup>,</sup>[^lex-fridman-podcast].

Travis Oliphant, on [Why I promote conda](https://technicaldiscovery.blogspot.com/2013/12/why-i-promote-conda.html) (2013):

> [...] at the first PyData meetup at Google HQ, where several of us asked Guido what we can do to fix Python packaging for the NumPy stack. Guido's answer was to "solve the problem ourselves". We at Continuum took him at his word. We looked at dpkg, rpm, pip/virtualenv, brew, nixos, and 0installer, and used our past experience with EPD [Enthought Python Distribution]. We thought hard about the fundamental issues, and created the conda package manager and conda environments.

Conda packages could not only ship pre-compiled Python packages across platforms but were also agnostic enough to ship Python itself, as well as the underlying shared libraries, without having to statically vendor them. This was particularly convenient for projects that relied on both compiled dependencies (e.g. C++ or Fortran libraries) and Python "glue code".

By June 2013, conda was using a SAT solver and included the `conda build` tool [^new-advances-in-conda], which let community users outside of Continuum build their own conda packages. This is also when the first Miniconda release and Binstar.org [^binstar], a site for hosting arbitrary user-built conda packages, were announced. Miniconda provided a minimal base environment that users could populate themselves, and Binstar.org gave any user an easy platform for redistributing their packages. All of the conda tools have been free (as in beer), as has Binstar/Anaconda.org, which offered some paid options for more storage.

With `conda build` came along the concept of recipes [^early-conda-build-docs]. The [`ContinuumIO/conda-recipes`](https://github.com/conda-archive/conda-recipes) repository became _the_ central place where people would contribute their conda recipes. This was separate from Anaconda's package recipes, which were private at this point. While successful, the recipes varied in quality and typically only worked on one or two platforms. There was no CI for any of the recipes to help keep them working, so it was common to find recipes that would no longer build, and you had to tweak them to get them to work.
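For context, a recipe in that repository was a directory containing a `meta.yaml` (plus optional `build.sh`/`bld.bat` build scripts). A minimal sketch of what one looked like, with a made-up package name and URL:

```yaml
# meta.yaml -- hypothetical minimal recipe; the package name and URL are
# illustrative, not taken from the conda-recipes repository
package:
  name: mypackage
  version: "1.0.0"

source:
  url: https://example.com/mypackage-1.0.0.tar.gz

requirements:
  build:
    - python
    - numpy
  run:
    - python
    - numpy

test:
  imports:
    - mypackage
```

Running `conda build` against such a directory produces a relocatable binary package that can then be uploaded to any Binstar/Anaconda.org channel.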

In 2015, Binstar.org became Anaconda.org, and in 2017 Continuum Analytics rebranded as Anaconda Inc [^anaconda-rebrand].

## How conda-forge came to be

By 2015, several institutes and groups were using Binstar/Anaconda.org to distribute software packages they used daily: the [Omnia Molecular Dynamics](https://github.com/omnia-md) project started as early as March 2014 [^binstar-omnia]; the UK Met Office-supported [SciTools project](https://scitools.org.uk/) joined in June 2014 [^binstar-scitools]; and the [US Integrated Ocean Observing System (IOOS)](http://www.ioos.noaa.gov/) started using it in July 2014 [^binstar-ioos]. Although each channel was building conda packages, the binary compatibility between channels was unpredictable.

In 2014, Filipe Fernandes ([@ocefpaf](https://github.com/ocefpaf)) and Phil Elson ([@pelson](https://github.com/pelson)) were maintaining the Binstar channels for IOOS and SciTools, respectively. Phil had [implemented CI pipelines](https://github.com/SciTools/conda-recipes-scitools/blob/995fc231967719db0dd6321ba8a502390a2f192c/.travis.yml) and [special tooling](https://github.com/conda-tools/conda-build-all) to build conda packages for SciTools efficiently, and Filipe borrowed it for IOOS. There was also a healthy exchange of recipes between the two groups, often assisted by members of other communities. For example, Christoph Gohlke and David Cournapeau were instrumental in getting Windows builds of the whole SciPy stack to work on AppVeyor.

There was a lot of cross-pollination between projects and channels, but the work was spread across separate repos, with duplicated recipes and differing build toolchains. Given the success of the `ContinuumIO/conda-recipes` repository, it became clear there was a demand for high-quality conda recipes and more efficient collaboration under a single umbrella. On April 11th, 2015, `conda-forge` was registered as a GitHub organization [^github-api-conda-forge] and an Anaconda.org channel [^binstar-conda-forge].

## Meanwhile at Continuum
It's a little strange to describe Continuum/Anaconda's history here, but the company's history is so deeply intertwined with conda-forge's that it is essential for a complete story. During this time, Continuum (especially Ilan Schnell) was developing its own internal recipes for packages. Continuum's Linux toolchain at the time was based on CentOS 5 and GCC 4.8. These details matter, because they effectively set the compatibility bounds of the entire conda package ecosystem. The packages made from these internal recipes were available on the "free" channel, which in turn was part of a metachannel named `defaults`. The `defaults` channel made up the initial channel configuration for the Miniconda and Anaconda installers.

Concurrently, Aaron Meurer led the conda and conda-build projects, contributed many recipes to the conda-recipes repository, and built many packages on his "asmeurer" Binstar.org channel. Aaron left Continuum in late 2015, leaving the community side of the projects in need of new leadership. Continuum hired Kale Franz to fill this role. Kale had huge ambitions for conda, but conda-build was not as much of a priority for him, so Michael Sarahan stepped in to maintain conda-build.
In 2016, Rich Signell at USGS connected Filipe and Phil with Travis Oliphant at Continuum, who assigned Michael Sarahan to be Continuum's representative in conda-forge. Ray Donnelly joined the team at Continuum soon afterwards, bringing extensive experience in package managers and toolchains from his involvement in the MSYS2 project. For a period of time, conda-forge and Continuum worked together closely, with conda-forge relying on Continuum to supply several core libraries. This reliance was partly to lower conda-forge's maintenance burden and reduce duplicate work, but it also helped keep mixtures of conda-forge and `defaults` packages working by reducing the possibility of divergence. Just as there were binary compatibility issues with mixing packages from among the many Binstar channels, mixing packages from `defaults` with `conda-forge` could be fragile and frustrating.
Around this time, GCC 5 arrived with a breaking ABI change in libstdc++. These changes, among other compiler updates, began to make the CentOS 5 toolchain troublesome. Cutting-edge packages, such as the nascent TensorFlow project, required cumbersome patching to work with the older toolchain, if they worked at all. There was strong pressure from the community to update the ecosystem (i.e. the toolchain, and implicitly everything built with it). There were two prevailing options. One was Red Hat's devtoolset, which used an older GCC version but statically linked the newer libstdc++ parts into binaries, so that libstdc++ updates were not necessary on end-user systems. The other was to build GCC ourselves and ship the newer libstdc++ library as a conda package. This was a community decision, and opinion was split roughly down the middle. In the end, the community took the latter route, for the sake of greater control over updating to the latest toolchains instead of having to rely on Red Hat. One major advantage of providing our own toolchain was that it could be shipped as a conda package instead of a system dependency, so toolchain requirements could be expressed in recipes, with better control over compiler flags and behavior.
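That control is visible in how recipes declare compilers today. The fragment below is a sketch using the `{{ compiler('c') }}` Jinja function that eventually shipped with conda-build 3 (the `zlib` dependency is just an example): instead of assuming a system compiler, the recipe requests a toolchain package that conda installs like any other dependency.

```yaml
# Recipe fragment (conda-build 3 syntax). {{ compiler('c') }} resolves to a
# platform-specific toolchain package such as gcc_linux-64, pinned centrally.
requirements:
  build:
    - "{{ compiler('c') }}"
    - "{{ compiler('cxx') }}"
  host:
    - zlib    # example dependency built against the same toolchain
```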
As more and more conflicts with `free` channel packages occurred, conda-forge gradually added more and more of its own core dependency packages to avoid those breakages. At the same time, Continuum was working on two contracts that would prove revolutionary. Samsung wanted to use conda packages to manage their internal toolchains, and Ray suggested that this was complementary to Continuum's own need to update its toolchain. Samsung's contract funded conda-build development that greatly expanded its ability to support explicit variants of recipes. Intel was developing their own Python distribution at the time, which they based on Anaconda, adding their accelerated math libraries and patches. Part of the Intel contract was that Continuum would move all of its internal recipes into public-facing GitHub repositories. Rather than putting another set of repositories (another set of changes to merge) between internal and external sources such as conda-forge, Michael and Ray pushed for a design where conda-forge would be the reference source of recipes. Continuum would only carry local changes if they could not be incorporated into the conda-forge recipe for social, licensing, or technical reasons. The combination of these conda-forge-based recipes and the new toolchain made up the `main` channel, which was also part of `defaults`. The `main` channel represented a major step forward in keeping conda-forge and Continuum aligned, which translated into smooth operation and happy users.
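The variant machinery that came out of that work is driven by a `conda_build_config.yaml` file placed next to the recipe. A hypothetical matrix (the versions are chosen purely for illustration):

```yaml
# conda_build_config.yaml -- conda-build expands the recipe into one build
# per variant; zip_keys makes python and numpy vary together (2 builds, not 4)
python:
  - "3.9"
  - "3.10"
numpy:
  - "1.21"
  - "1.23"
zip_keys:
  - [python, numpy]
```

Without `zip_keys`, conda-build would take the full cross-product of the listed values; zipping keys is how coupled pins (such as a NumPy version per Python version) are expressed.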
<!-- miniforge -->
<!-- autotick bot -->
<!-- to be continued -->
