add blog/2020-07-02-op-risk.md

jaimergp · CJ-Wright · jaimergp · commit b24c975c6032 · 2023-11-04T13:23:34.000+01:00
Co-authored-by: cj-wright &lt;cj-wright@users.noreply.github.com&gt;
diff --git a/blog/2020-07-02-op-risk.md b/blog/2020-07-02-op-risk.md
@@ -0,0 +1,115 @@
+---
+authors:
+  - cj-wright
+tags: [conda-forge]
+---
+
+# Conda-Forge Operational Risk
+
+Recently I've been thinking about operational risk (op. risk).
+Operational risks arise from failures of processes, for instance a
+missing email, or an automated software system not running properly.
+Many commercial institutions are interested in minimizing op. risk,
+since it is risk that produces no value, as opposed to risks associated
+with investing. This is also something I think about in my job at
+[Lab49](https://www.lab49.com/), where I'm a software engineering
+consultant focusing on financial institutions. I think there is also a
+good analogy for Conda-Forge, even though we are not a commercial
+outfit. In this case the risk we incur isn't the potential for lost
+earnings but frustration for our users and maintainers in the form of
+bugs and lackluster user experience. In this post I explore three main
+sources of operational risk for Conda-Forge: Automation, Top-Down
+Control, and Self-Service Structure.
+
+<!--truncate-->
+
+## A brief conda-forge primer
+
+Conda-Forge is an ecosystem and community that grew around building
+packages for the conda package manager. Conda-Forge uses continuous
+integration services to build packages from GitHub repos called
+feedstocks. This structure enables teams of contributors to maintain
+packages via a pull request based workflow. At time of writing
+Conda-Forge has over 10000 feedstocks and ships more than 120 million
+packages a month.
+
+## Self-Service Structure
+
+Conda-Forge is built around a self-service structure for each stage in a
+feedstock's lifecyle. The creation of new feedstocks relies on would be
+maintainers to submit PRs to staged-recipes. Although language specific
+help teams and staged-recipes reviewers provide some assistance and
+oversight, the PR submitter plays the most important role in proposing
+the package and shepherding it to acceptance. Once the feedstock is
+accepted the maintenance is federated with most upkeep being performed
+by the maintainers, who have extensive permissions and control over the
+feedstock. If fixes or updates are needed for a package, maintainers and
+users are encouraged to open their own pull requests.
+
+This structure can present a few challenges for minimizing op. risk. The
+most important challenge is the disconnect between feedstock maintainers
+and users. While most maintainers are package users, most of our users
+are not maintainers, and are unlikely to become maintainers. The
+disparity between maintainers and users can come from a few sources,
+some under our control and others not. For instance we can write better
+documentation, lowering the barrier to entry, but we don't have control
+over how our user's incentive structures value Conda-Forge
+contributions. This produces a gap in representation in the Conda-Forge
+organizational structure, where non-maintainer users' issues and
+desires are not communicated to maintainers and Core.
+
+For instance, are we servicing the needs of developers using our
+binaries as dependencies to code they are compiling locally. As another
+example, are there support gaps for developers and scientists using
+Conda-Forge in academic and government laboratories, who might not have
+the skills or capacity to fix feedstocks. Our reliance on the public
+GitHub platform may prevent some users without access from raising their
+concerns. Since these users may be under-represented we don't even know
+if we are meeting their needs and how best to help.
+
+## Top-Down Control
+
+While the majority of Conda-Forge's permissions structure is federated,
+certain important parts are centralized, with the Core developers making
+key decisions. Often these decisions are focused on stability of the
+ecosystem, for instance what versions of languages to support.
+Additionally, maintenance and enhancements to the Conda-Forge
+infrastructure are mostly performed by Core developers.
+
+However, the Core developers are usually experienced feedstock
+maintainers, expert conda users, and have bought into the Conda-Forge
+ecosystem and mission. This means that decisions can be made without the
+perspective of new users or maintainers, or from potential users that
+are skeptical of the Conda-Forge approach.
+
+For instance, decisions about application binary interface pins are
+usually made by core, although these changes have impacts on downstream
+maintainers. It is possible that most maintainers don't know about what
+these pins are, how they are changed and how that affects their
+feedstocks.
+
+## Automation
+
+Automation has been used to great effect to make Conda-Forge possible.
+The various bots and web services enable Conda-Forge's current scale,
+providing help and support from running builds, bumping versions, and
+checking feedstock quality. However, this automation presents its own
+operational risks and magnifies existing operational risks.
+
+Automation has a tendency to fail when we least expect it and often we
+lack the ability to fix it. The January 2018 Travis-CI outage is a great
+example of this, where the CI service we were using for macOS builds
+experienced reduced capacity and then a complete outage, causing builds
+to queue for days. Recently there was a sudden decrease in the number of
+parallel builds on Azure causing a similar queue of builds. Automation
+can cause issues by enabling users to make decisions without all the
+needed information. While many feedstocks have effective smoke tests for
+their packages the autotick bot doesn't currently check for new
+dependencies, potentially leading to missing or incorrect package
+metadata.
+
+## Conclusion
+
+Overall Conda-Forge has managed its operational risk well. Most
+importantly Conda-Forge's transparent open source nature allows us to
+address these issues head on by engaging with the community.