Start article about the history of conda-forge #2298

Draft
wants to merge 35 commits into
base: main

jaimergp
Member

@jaimergp jaimergp commented Sep 14, 2024

PR Checklist:

  • note any issues closed by this PR with closing keywords
  • if you are adding a new page under docs/ or community/, you have added it to the sidebar in the corresponding _sidebar.json file
  • put any other relevant information below

Still work in progress, but I wanted to capture the momentum started by Wolf and Filipe's podcast episode.

Tagging some folks for awareness, visibility, and hopefully a review, comments or even contributions if they are feeling generous 🙏 @ocefpaf @jakirkham @pelson @dholth @bryevdv @msarahan @asmeurer @ilanschnell. Feel free to tag others as well if you feel they can add more context into the early days!

🔍 Preview article link 🔍

netlify bot commented Sep 14, 2024

Deploy Preview for conda-forge-previews ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 25cde72 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/conda-forge-previews/deploys/67fe3e44e422a500082bcbdb |
| 😎 Deploy Preview | https://deploy-preview-2298--conda-forge-previews.netlify.app |
Lighthouse
1 paths audited
Performance: 63
Accessibility: 96
Best Practices: 100
SEO: 89
PWA: -

@ocefpaf
Member

ocefpaf commented Sep 16, 2024

Great article @jaimergp! It felt like it ended with a cliffhanger and made me want more. Are you planning on part 2/∞?

@jaimergp
Member Author

> Are you planning on part 2/∞?

Yes! This is just the beginning, and not ready for publication yet. Was hoping to gather some interest here and get comments from the "old guard" while I cover the very beginnings. Then I'll need a looot of help with the 2016-2021 period, and after that I think I can recollect a few things.

Even basic bullet items with a rough chronology would help so I can research git histories, archive.org, etc.

@beckermr
Member

Here are some big events to track / mention. I don't have all of the details:

  • bot creation
  • great compiler migration
  • adding of special ecosystems (cuda, pypi, etc)
  • growth in terms of packages, downloads, etc.
  • rise of non-anaconda tooling (mamba, boa, rattler)
  • addition of major supporters (azure for ci, gpu ci, etc)

@isuruf
Member

isuruf commented Sep 17, 2024

There were several bots

  • Linting bot - conda-forge-webservices
  • First bot that sent pinning updates - I don't remember where it was, but can figure it out.
  • Commenting bot (conda-forge-admin, rerender). Also conda-forge-webservices
  • regro bot that superseded the first bot that sent pinning updates. - regro/cf-scripts

Some other things to note

  • Packaging of compiler runtimes to become independent of defaults
  • Cross compilation
  • Overwhelmed CI, where CI took a couple of days.
  • azure and its donation

@moorepants
Contributor

I remember being in a birds of a feather session at SciPy around 2013 or 2014 where the momentum to make conda-forge real seemed to solidify, and it was very soon after that conference that it took material form.

@asmeurer
Member

The content is accurate as far as I can remember (which doesn't necessarily mean much). I would suggest doing a full checkup for grammar, and in particular, being consistent across the post with tense.

@ocefpaf
Member

ocefpaf commented Sep 23, 2024

> I remember being in a birds of a feather session at SciPy around 2013 or 2014 where the momentum to make conda-forge real seemed to solidify, and it was very soon after that conference that it took material form.

The BoF happened in 2015 and the soft launch in 2016, if I'm not mistaken.

@hmaarrfk
Contributor

I would like it if there were a paragraph that mentions the deep collaborative period between Anaconda's default channel and conda-forge, when we would often trade recipes and our collaboration was intricately linked.

I really look back fondly at the times when I was learning a lot from msarahan, mingwandroid, jcrist, mrocklin (not sure if he worked for Anaconda at the time).

For me, the availability of Qt, Pillow, and OpenCV on windows/osx/linux were what brought me to Anaconda/conda/conda-forge.

@jaimergp jaimergp added the Docs label Nov 26, 2024
@jjhelmus
Contributor

jjhelmus commented Feb 7, 2025

My recollection might be off but I recall a meeting at SciPy 2015 which included @pelson, @ocefpaf, @scopatz, myself and likely others where some initial details of what became conda-forge were discussed.

This timeline aligns with the first commit in conda-forge/staged-recipes from the fall of 2015 and the history recorded in the talks documentation page.

I recall conda-forge being a highlight of SciPy 2016 where Phillip Elson gave a talk on the project. I was involved in the project so my view of the excitement in the conference is biased 😄.

@jjhelmus
Contributor

jjhelmus commented Feb 7, 2025

Another possible highlight to include in the history is SciPy 2013, when Travis Oliphant announced binstar.org (which later became anaconda.org) during the Thursday lightning talks.

Member

@pelson pelson left a comment

Some additional thoughts. Would be happy to add more if helpful.


In 2012, Continuum Analytics announces Anaconda 0.8 at the SciPy conference [^anaconda-history]. Later that year, in September, Continuum would release `conda` 1.0, the cross-platform, language-agnostic package manager for pre-compiled artifacts [^conda-changelog-1.0]. The motivation behind these efforts was to provide an easy way to ship all the compiled libraries and Python packages that users of the SciPy and NumPy stacks needed [^packaging-and-deployment-with-conda] [^lex-fridman-podcast].

In contrast with Python eggs and wheels, conda packages were agnostic enough to ship Python itself, as well as the underlying shared libraries without having to statically vendor them under each Python package. This was particularly convenient for projects that relied on both compiled dependencies (e.g. C++ or Fortran libraries) and Python "glue code".
Member

As above - this is jumping the gun. There was no binary shipping mechanism for Python - conda was the first environment manager to do so AFAIK.

Before this, on Windows you used to go to Christoph Gohlke's website. On Linux, I think you had to build from source.

Howdy @pelson! Long time.

Back around say 2005-2010 I remember it working out OK on platforms where you had strong package manager interfaces at the OS tier — like Debian/Ubuntu based distributions, MacPorts on OS X, or ports on FreeBSD. But of course trying to do so on Windows, or anywhere where you didn't have a well maintained compiler stack trivially installable was still a nightmare, especially if you needed to work with a broad collection of packages. In terms of Python specific package managers which did deal with the scientific Python distribution problem, I believe that Enthought had something even before Canopy, and definitely ActiveState did, but both of these were licensed products which hampered their use, particularly in the academic community.

Member Author

I added some more context here, hopefully it reads better now?

## How conda-forge came to be

In 2014, Filipe Fernandes ([@ocefpaf](https://github.com/ocefpaf)) and Phil Elson ([@pelson](https://github.com/pelson)) get in touch [^chatting-ocefpaf]. They are maintaining the Binstar channels for IOOS and SciTools, respectively.

Member

Think there is a lot Filipe and I can add here 😉

We should give some context of the geospatial world in particular (hi gdal).

I would also get to the point of acknowledging both Christoph Gohlke and David Cournapeau, who especially helped me with the Windows builds of the whole SciPy stack (a topic on which I had no knowledge at all, yet needed to get building in a CI context on AppVeyor. In those days you had to pick particular compilers for particular Python versions, and this was a bit of a dark art to me).

Member Author

Added the acknowledgements, but would love to hear more about the geospatial world if you want to add something!


In contrast with Python eggs and wheels, conda packages were agnostic enough to ship Python itself, as well as the underlying shared libraries without having to statically vendor them under each Python package. This was particularly convenient for projects that relied on both compiled dependencies (e.g. C++ or Fortran libraries) and Python "glue code".

By June 2013, conda is using a SAT solver and includes the `conda build` subcommand [^new-advances-in-conda], along with the concept of recipes [^conda-recipes-repo] [^early-conda-build-docs]. This is also when the first Miniconda release is announced. By the end of the year, Continuum Analytics announces Binstar.org, the predecessor of the Anaconda.org channels. This meant that now any user could build their software stack as conda packages and redistribute them online at no cost.
Member

I think conda-recipes is worth more detail - it was the place that people would contribute their recipes.

It was really successful, but the recipes were of various quality, and typically only worked on one or two platforms. There was a high chance that a recipe you found there would no longer build, and you had to tweak it to get it to work.

To my knowledge, SciTools was the first repo to do CI based builds. Filipe borrowed the technical infra for IOOS to do the same. I borrowed some of the harder to build recipes back. It was a successful collaborative effort, but it was inefficient since we were working in separate repos. We often had duplicate recipes etc.

conda-forge came about because I could see that the conda-recipes repo was popular, there was demand for having high-quality recipes, and we needed a way to build them in a consistent way. I even built a tool (conda-build-all) to try to do this from our repositories in an efficient way. In the end, it got to the point where we wanted to de-centralise the responsibilities, and the one-repo-per-recipe concept fell out.

Member

If @pelson is OK I think we should write this part in the blog in some form:

> It was really successful, but the recipes were of various quality, and typically only worked on one or two platforms. There was a high chance that a recipe you found there would no longer build, and you had to tweak it to get it to work.
>
> To my knowledge, SciTools was the first repo to do CI based builds. Filipe borrowed the technical infra for IOOS to do the same. I borrowed some of the harder to build recipes back. It was a successful collaborative effort, but it was inefficient since we were working in separate repos. We often had duplicate recipes etc.
>
> conda-forge came about because I could see that the conda-recipes repo was popular, there was demand for having high-quality recipes, and we needed a way to build them in a consistent way. I even built a tool (conda-build-all) to try to do this from our repositories in an efficient way. In the end, it got to the point where we wanted to de-centralise the responsibilities, and the one-repo-per-recipe concept fell out.

What do you think @jaimergp?

Member Author

Added almost everything. I need more details on the monorepo -> recipe-per-repo transition. Do I remember correctly that conda-forge was initially set up as a monorepo?

Member

One thing in that story that I'm curious about, and have a hazy understanding of at best, is the relationship to Anaconda's `free` channel. It's possible that it's completely unrelated, but I have the vague recollection that conda-forge grew out of (or at least into) the same space as the `free` channel.

Member

I can explain this, but I am afk until Sunday.

Member

> Added almost everything. I need more details on the monorepo -> recipe-per-repo transition. Do I remember correctly that conda-forge was initially set up as a monorepo?

Nope. The "proto" conda-forge, scitools and ioos channels were, but conda-forge started as distributed repos due to the monorepo limitations we had.

@msarahan
Member

Added some Continuum/Anaconda context jaimergp#1

@jaimergp
Member Author

Thank you so much @msarahan, that adds so much context!

@msarahan msarahan force-pushed the history-conda-forge branch from 8a9e5b9 to 9930e5b Compare April 14, 2025 16:22
Member

@h-vetinari h-vetinari left a comment

Thanks so much for the extra context @msarahan, it's great to have some long-standing blanks in my understanding filled in :)


Here around 2017, Continuum renamed itself to Anaconda, so let's switch those names from here out.

As more and more conflicts with `free` channel packages occurred, conda-forge gradually added more and more of their own core dependency packages to avoid those breakages. At the same time, Anaconda was working on two contracts that would prove revolutionary. Samsung wanted to use conda packages to manage their internal toolchains, and Ray suggested that this was complementary to our own internal needs to update our toolchain. Samsung's contract supported conda-build development that greatly expanded its ability to support explicit variants of recipes. Intel was working on developing their own Python distribution at the time, which they based on Anaconda and added their accelerated math libraries and patches to. Part of the Intel contract was that Anaconda would move all of their internal recipes into public-facing GitHub repositories. Rather than putting another set of repositories (another set of changes to merge) in between internal and external sources, such as conda-forge, Michael and Ray pushed for a design where conda-forge would be the reference source of recipes. Anaconda would only carry local changes if they were not able to be incorporated into the conda-forge recipe for social, licensing, or technical reasons. The combination of these conda-forge based recipes and the new toolchain is what made up the `main` channel, which was also part of `defaults`. The `main` channel represented a major step forward in keeping conda-forge and Anaconda aligned, which equates to smooth operation and happy users. The joined recipe base and toolchain have sometimes been contentious, with conda-forge wanting to move faster than Anaconda or vice versa. The end result has been a compromise between cutting-edge development and slower enterprise-focused development.
Member

AFAIR, the compiler stack between defaults and conda-forge wasn't actually shared until GCC 7, I think? I realize, though, that that was comparatively smaller work and impact than the decision to start building the compilers ourselves (kudos in retrospect 👏)

(also, could we break these really long lines a bit?)

Member

GCC 7 is what the first compiler-in-a-package was. Ray knew that we had to keep on the cutting edge if we were going to ship our own libstdc++. If load order led to an older, conda-based libstdc++, then some later library being loaded may not find the (system) symbols that it needs. In practice, the compiler stack hasn't been kept on the cutting edge consistently. It hasn't been a huge issue, since conda packages shouldn't be reaching out to the system libraries, but the risk is there.

Member

@h-vetinari h-vetinari Apr 14, 2025

> In practice, the compiler stack hasn't been kept on the cutting edge consistently.

It has in recent years 😊

(we intentionally keep the default compiler version in the pinning roughly 9-15 months behind their respective newest releases, but new compiler versions are published on the order of weeks after release for GCC, and less than that for clang; in particular, since the major runtime libraries are only constrained to satisfy >= compiler_version, this means our libgcc/libstdcxx/libcxx etc. stay on the cutting edge consistently.)

Member

Indeed, and I have been very happy to see that. I imagine that it might have caused some friction with Anaconda if they have trouble keeping up, but Ray would be thrilled to see how well the toolchains are maintained these days.

Member Author

> GCC 7 is what the first compiler-in-a-package was.

Hm, I can find packages for 5.4. Is it then true that GCC 7 was the first?

Member

I honestly have no idea what that GCC5 package is from. Too many beers between then and now.

What I can say is that the timestamps on repo.anaconda.com indicate that the GCC 7 packages were available a few months before the single GCC 5 one.

Screenshot 2025-04-14 at 10 17 08 PM

Ray was absolutely insistent that GCC 7 (newest at the time we started the work) be our toolchain.

@msarahan
Member

This is a really wonderful collaboration and trip down memory lane! I have one more PR with line breaks and a couple of other links at jaimergp#2
