Start article about the history of conda-forge #2298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Great article @jaimergp! It felt like it ended with a cliffhanger and made me want more. Are you planning on part 2/∞?
Yes! This is just the beginning, and not ready for publication yet. Was hoping to gather some interest here and get comments from the "old guard" while I cover the very beginnings. Then I'll need a looot of help with the 2016-2021 period, and after that I think I can recollect a few things. Even basic bullet items with a rough chronology would help so I can research git histories, archive.org, etc.
Here are some big events to track / mention. I don't have all of the details:

There were several bots.

Some other things to note.
I remember being in a birds of a feather session at SciPy around 2013 or 2014 where the momentum to make conda-forge real seemed to solidify, and it was very soon after that conference that it took material form.
The content is accurate as far as I can remember (which doesn't necessarily mean much). I would suggest doing a full checkup for grammar, and in particular, being consistent across the post with tense.
It was in 2015 that the BoF happened, and the soft launch was in 2016, if I'm not mistaken.
I would like it if there were a paragraph that mentions the deep collaborative period between Anaconda's default channel and conda-forge, where we would often trade recipes and collaboration was intricately linked. I really look back fondly at the times where I was learning a lot from msarahan, mingwandroid, jcrist, mrocklin (not sure if he worked for Anaconda at the time). For me, the availability of Qt, Pillow, and OpenCV on Windows/OSX/Linux was what brought me to Anaconda/conda/conda-forge.
My recollection might be off, but I recall a meeting at SciPy 2015 which included @pelson, @ocefpaf, @scopatz, myself and likely others, where some initial details of what became conda-forge were discussed. This timeline aligns with the first commit in conda-forge/staged-recipes from the Fall of 2015 and the history recorded in the talks documentation page.
Another possible highlight to include in the history is SciPy 2013, when Travis Oliphant announced binstar.org (which became anaconda.org) during the Thursday lightning talks.
Some additional thoughts. Would be happy to add more if helpful.
community/history.md
In 2012, Continuum Analytics announces Anaconda 0.8 at the SciPy conference [^anaconda-history]. Later that year, in September, Continuum would release `conda` 1.0, the cross-platform, language-agnostic package manager for pre-compiled artifacts [^conda-changelog-1.0]. The motivation behind these efforts was to provide an easy way to ship all the compiled libraries and Python packages that users of the SciPy and numpy stacks needed [^packaging-and-deployment-with-conda] [^lex-fridman-podcast].
In contrast with Python eggs and wheels, conda packages were agnostic enough to ship Python itself, as well as the underlying shared libraries, without having to statically vendor them under each Python package. This was particularly convenient for projects that relied on both compiled dependencies (e.g. C++ or Fortran libraries) and Python "glue code".
As above - this is jumping the gun. There was no binary shipping mechanism for Python - conda was the first environment manager to do so, AFAIK.
Before this, on Windows you used to go to Christoph Gohlke's website. On Linux, I think you had to build from source.
Howdy @pelson! Long time.
Back around say 2005-2010 I remember it working out OK on platforms where you had strong package manager interfaces at the OS tier — like Debian/Ubuntu based distributions, MacPorts on OS X, or ports on FreeBSD. But of course trying to do so on Windows, or anywhere where you didn't have a well maintained compiler stack trivially installable was still a nightmare, especially if you needed to work with a broad collection of packages. In terms of Python specific package managers which did deal with the scientific Python distribution problem, I believe that Enthought had something even before Canopy, and definitely ActiveState did, but both of these were licensed products which hampered their use, particularly in the academic community.
I added some more context here, hopefully it reads better now?
## How conda-forge came to be
In 2014, Filipe Fernandes ([@ocefpaf](https://github.com/ocefpaf)) and Phil Elson ([@pelson](https://github.com/pelson)) get in touch [^chatting-ocefpaf]. They are maintaining the Binstar channels for IOOS and SciTools, respectively.
Think there is a lot Filipe and I can add here 😉
We should give some context of the geospatial world in particular (hi `gdal`).
I would also get to the point of acknowledging both Christoph Gohlke and David Cournapeau, who especially helped me with the Windows builds of the whole SciPy stack (a topic on which I had no knowledge at all, yet needed to get building in a CI context on AppVeyor; in those days you had to pick particular compilers for particular Python versions, and this was a bit of a dark art to me).
Added the acknowledgements, but would love to hear more about the geospatial world if you want to add something!
community/history.md
By June 2013, conda is using a SAT solver and includes the `conda build` subcommand [^new-advances-in-conda], along with the concept of recipes [^conda-recipes-repo] [^early-conda-build-docs]. This is also when the first Miniconda release is announced. By the end of the year, Continuum Analytics announces Binstar.org, the predecessor of the Anaconda.org channels. This meant that now any user could build their software stack as conda packages and redistribute them online at no cost. |
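As an aside for readers unfamiliar with the recipe concept mentioned in that paragraph: a `conda build` recipe is essentially a directory containing a `meta.yaml` that declares the package, its source, and its dependencies. A minimal sketch follows; the package name, URL, and checksum are placeholders, not a real project:

```yaml
# meta.yaml -- minimal illustrative recipe
# (examplepkg, its URL, and its sha256 are hypothetical placeholders)
package:
  name: examplepkg
  version: "1.0.0"

source:
  url: https://example.com/examplepkg-1.0.0.tar.gz
  sha256: "0000000000000000000000000000000000000000000000000000000000000000"

requirements:
  build:
    - python
    - pip
  run:
    - python
    - numpy

about:
  license: BSD-3-Clause
  summary: Placeholder package to illustrate the recipe format
```

Running `conda build` against such a directory produces a relocatable package archive that could then be uploaded to a Binstar/Anaconda.org channel, which is exactly the workflow the paragraph above describes.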
I think `conda-recipes` is worth more detail - it was the place that people would contribute their recipes.
It was really successful, but the recipes were of various quality, and typically only worked on one or two platforms. There was a high chance that a recipe you found there would no longer build, and you had to tweak it to get it to work.
To my knowledge, SciTools was the first repo to do CI-based builds. Filipe borrowed the technical infra for IOOS to do the same. I borrowed some of the harder-to-build recipes back. It was a successful collaborative effort, but it was inefficient since we were working in separate repos. We often had duplicate recipes, etc.
conda-forge came about because I could see that the conda-recipes repo was popular, there was demand for having high-quality recipes, and we needed a way to build them in a consistent way. I even built a tool (`conda-build-all`) to try to do this from our repositories in an efficient way. In the end, it got to the point where we wanted to decentralise the responsibilities, and the one-repo-per-recipe concept fell out.
If @pelson is OK I think we should write this part in the blog in some form:
> It was really successful, but the recipes were of various quality, and typically only worked on one or two platforms. There was a high chance that a recipe you found there would no longer build, and you had to tweak it to get it to work.
>
> To my knowledge, SciTools was the first repo to do CI based builds. Filipe borrowed the technical infra for IOOS to do the same. I borrowed some of the harder to build recipes back. It was a successful collaborative effort, but it was inefficient since we were working in separate repos. We often had duplicate recipes etc.
>
> conda-forge came about because I could see that the conda-recipes repo was popular, there was demand for having high-quality recipes, and we needed a way to build them in a consistent way. I even built a tool (conda-build-all) to try to do this from our repositories in an efficient way. In the end, it got to the point where we wanted to de-centralise the responsibilities, and the one-repo-per-recipe concept fell out.
What do you think @jaimergp?
Added almost everything. I need more details on the monorepo -> recipe-per-repo transition. Do I remember correctly that conda-forge was initially set up as a monorepo?
One thing in that story that I'm curious about, and have a hazy understanding of at best, is the relationship to Anaconda's `free` channel. It's possible that it's completely unrelated, but I have the vague recollection that conda-forge grew out of (or at least into) the same space as the `free` channel.
I can explain this, but I am afk until Sunday.
> Added almost everything. I need more details on the monorepo -> recipe-per-repo transition. Do I remember correctly that conda-forge was initially set up as a monorepo?

Nope. The "proto" conda-forge, scitools and ioos channels were, but conda-forge started as distributed repos due to the monorepo limitations we had.
Added some Continuum/Anaconda context jaimergp#1
Thank you so much @msarahan, that adds so much context!
Thanks so much for the extra context @msarahan, it's great to have some long-standing blanks in my understanding filled in :)
community/history.md
Around 2017, Continuum renamed itself to Anaconda, so let's switch those names from here on out.
As more and more conflicts with `free` channel packages occurred, conda-forge gradually added more and more of their own core dependency packages to avoid those breakages. At the same time, Anaconda was working on two contracts that would prove revolutionary.

Samsung wanted to use conda packages to manage their internal toolchains, and Ray suggested that this was complementary to our own internal needs to update our toolchain. Samsung's contract supported development in conda-build that greatly expanded its ability to support explicit variants of recipes. Intel was working on developing their own Python distribution at the time, which they based on Anaconda, adding their accelerated math libraries and patches. Part of the Intel contract was that Anaconda would move all of their internal recipes into public-facing GitHub repositories.

Rather than putting another set of repositories (another set of changes to merge) between internal and external sources such as conda-forge, Michael and Ray pushed for a design where conda-forge would be the reference source of recipes. Anaconda would only carry local changes if they could not be incorporated into the conda-forge recipe for social, licensing, or technical reasons. The combination of these conda-forge-based recipes and the new toolchain is what made up the `main` channel, which was also part of `defaults`. The `main` channel represented a major step forward in keeping conda-forge and Anaconda aligned, which equates to smooth operation and happy users. The joint recipe base and toolchain have sometimes been contentious, with conda-forge wanting to move faster than Anaconda or vice versa. The end result has been a compromise between cutting-edge development and slower enterprise-focused development.
AFAIR, the compiler stack between defaults and conda-forge wasn't actually shared until GCC 7, I think? I realize though that that was comparatively smaller in work & impact than the decision to start building the compilers ourselves (kudos in retrospect 👏)
(also, could we break these really long lines a bit?)
GCC 7 is what the first compiler-in-a-package was. Ray knew that we had to keep on the cutting edge if we were going to ship our own libstdc++. If load order led to an older, conda-based libstdc++, then some later library being loaded may not find the (system) symbols that it needs. In practice, the compiler stack hasn't been kept on the cutting edge consistently. It hasn't been a huge issue, since conda packages shouldn't be reaching out to the system libraries, but the risk is there.
> In practice, the compiler stack hasn't been kept on the cutting edge consistently.
It has in recent years 😊
(we intentionally keep the default compiler version in the pinning roughly 9-15 months behind their respective newest releases, but new compiler versions are published on the order of weeks after release for GCC, and less than that for clang; in particular, since the major runtime libraries are only constrained to satisfy >= compiler_version, this means our libgcc/libstdcxx/libcxx etc. stay on the cutting edge consistently.)
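The `>= compiler_version` constraint described above is expressed through conda-build's `run_exports` mechanism in the compiler package's metadata. A simplified, illustrative sketch follows; the package name and version are assumptions for the example, not the actual conda-forge pinning files:

```yaml
# Simplified sketch of a compiler package's build metadata.
# Names and versions below are illustrative, not the real conda-forge pinning.
build:
  run_exports:
    strong:
      # Every package built with this compiler picks up a lower-bound-only
      # runtime dependency, so newer libstdc++ builds keep satisfying
      # binaries compiled against older compilers.
      - libstdcxx-ng >=13
```

This is why, as the comment notes, the runtime libraries can stay on the cutting edge even while the default compiler version in the pinning lags behind: the exported constraint is only a lower bound.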
Indeed, and I have been very happy to see that. I imagine that it might have caused some friction with Anaconda if they have trouble keeping up, but Ray would be thrilled to see how well the toolchains are maintained these days.
> GCC 7 is what the first compiler-in-a-package was.

Hm, I can find packages for 5.4. Is it then true that GCC 7 was the first?
I honestly have no idea what that GCC5 package is from. Too many beers between then and now.
What I can say is that the timestamps on repo.anaconda.com indicate that the GCC 7 packages were available a few months before the single GCC 5 one.

Ray was absolutely insistent that GCC 7 (newest at the time we started the work) be our toolchain.
This is a really wonderful collaboration and trip down memory lane! I have one more PR with line breaks and a couple of other links at jaimergp#2
PR Checklist:
- If the post is in `docs/` or `community/`, you have added it to the sidebar in the corresponding `_sidebar.json` file

Still work in progress, but I wanted to capture the momentum started by Wolf and Filipe's podcast episode.
Tagging some folks for awareness, visibility, and hopefully a review, comments or even contributions if they are feeling generous 🙏 @ocefpaf @jakirkham @pelson @dholth @bryevdv @msarahan @asmeurer @ilanschnell. Feel free to tag others as well if you feel they can add more context into the early days!
🔍 Preview article link 🔍