% !TEX root = thesis.tex
\section{Open Source Code}
\label{sec:open-source}

{\it Open Source} is a complex concept which can refer to any code that is permissively licensed, not just code related to computational linguistics. Here, I will define what I mean by Open Source. This will largely inform the next section where I talk about its use for low resource languages.

\subsection{Defining {\it open source}}
\label{subsec:defining-open-source}

At its core, {\it open source} refers to code which has a license which allows it to be freely inspected, used, or modified by anyone, without restriction. The concept was introduced in 1998 by Linux programmers such as Eric Raymond, author of {\it The Cathedral and the Bazaar}\footnote{\href{http://www.catb.org/~esr/writings/cathedral-bazaar/}{http://www.catb.org/~esr/writings/cathedral-bazaar/}. \last{May~2}} \citep{raymond1999cathedral}; Linus Torvalds, author of the Linux kernel\footnote{\href{https://www.kernel.org/}{https://www.kernel.org/}. \last{May~2}} and Git\footnote{\href{https://git-scm.com/}{https://git-scm.com/}. \last{May~2}}; Richard Stallman, founder of the GNU project\footnote{\href{https://www.gnu.org/}{https://www.gnu.org/}. \last{May~2}} and the Free Software Foundation\footnote{\href{https://www.fsf.org/}{https://www.fsf.org/}. \last{May~2}}; and others in response to the Netscape browser's code being openly licensed and made available.

{\it Open source} is one of many terms which can be used to differentiate code which is either available or licensed permissively for re-use; other terms include {\it free} and {\it libre} software. There is no standard definition of open source that is universally accepted.

Nor will universal acceptance be forthcoming. The difficulty of reconciling open source, free software, and the rest of the terms stems largely from a difference of opinion over what constitutes open software, and over what `free' and `open' mean. An oft-used expression is ``free as in beer'' as opposed to ``free as in speech'', where the first is used for gratis software which has no monetary price set on it, and the second is used to refer to software which may be used without restriction. The term {\it libre} is most often used for this second definition, to differentiate the two meanings in English. Occasionally, the acronym FLOSS is used in open source parlance to refer to Free Libre Open Source Software, which is both gratis and libre software.

For some adherents, software ought to be free (gratis), as it is a result of human labour and because opening it up without cost maximises the potential usefulness of that code, and minimises duplicated effort. This idea harks back to the idea of a digital commons: like the commons in philosophical and economic literature (cf. \citepos{hardin2009tragedy} seminal article on the subject), code can be viewed as a resource that belongs to humanity as a whole, and not to the creators who initially fashioned it. In this sense, open source is more of a philosophical theme than a technical term.

\begin{quote}
Open source is a development methodology; free software is a social movement. For the free software movement, free software is an ethical imperative, essential respect for the users' freedom. By contrast, the philosophy of open source considers issues in terms of how to make software ``better'' - in a practical sense only. It says that nonfree software is an inferior solution to the practical problem at hand.\footnote{\href{https://www.gnu.org/philosophy/open-source-misses-the-point.html}{https://www.gnu.org/philosophy/open-source-misses-the-point.html}. \last{May~2}}
\signed Richard Stallman (Founder of GNU/Linux)
\end{quote}

However, for the most part, open source is not disambiguated as a term, because authority for this task is delegated to the license put on a piece of software, which determines the legality and potential use. Licenses determine the legal rights to sharing code. A piece of code which is taken from a proprietary server and published on the internet is not necessarily open source. In this instance, the code may have been illegally copied and shared, but it is not licensed for free usage. Under no definition is this considered open source. Indeed, this touches upon issues of digital copytheft and piracy; the latter is a term used frequently in the media and in legal proceedings to suggest that copying code is akin to larceny or theft on the high seas. Leaving aside the question of the validity of this viewpoint, it is important to focus on the license as the factor that determines whether or not code has been released legally under an open definition. The term open source under most definitions does not pertain to ethical concerns about the software's usage, but rather simply refers to whether or not it is permissively licensed and available for users.

There are many licenses which are considered to be open source, and there are several arbiters available which judge the validity of open source licensing. The Open Source Initiative (OSI) maintains a list of approved licenses on their website.\footnote{\href{https://opensource.org/licenses}{https://opensource.org/licenses}. \last{May~2}}

The OSI, whose founders were among the original coiners of the term {\it open source}, has several parameters by which software can be judged as being `open' or `closed' (that is, proprietary, non-permissively licensed, non-reusable, limited in usage to a set number of people, and so on). It may be useful to list these terms directly below, as they are instructive about how open source can be a nuanced term. These terms and their definitions are from the OSI's website,\footnote{\href{https://opensource.org/osd}{https://opensource.org/osd}. \last{May~2}} and are repeated below verbatim.

\begin{enumerate}
\item{Free Redistribution}. The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
\item{Source Code}. The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
\item{Derived Works}. The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
\item{Integrity of The Author's Source Code}.
  The license may restrict source-code from being distributed in modified form only if the license allows the distribution of ``patch files'' with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
\item{No Discrimination Against Persons or Groups}.
  The license must not discriminate against any person or group of persons.
\item{No Discrimination Against Fields of Endeavor}.
  The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
\item{Distribution of License}.
  The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
\item{License Must Not Be Specific to a Product}.
  The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
\item{License Must Not Restrict Other Software}.
  The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
\item{License Must Be Technology-Neutral}.
  No provision of the license may be predicated on any individual technology or style of interface.
\end{enumerate}

\subsection{Open source licenses}
\label{subsec:licenses}

The different terms and conditions listed above are often conflated, and a legally valid license which satisfies all of them is difficult to write on an {\it ad hoc} basis. For this reason, most open source programming relies on existing licenses, which are copied for specific projects. There are tools today to help make licensing clearer to na\"ive users, such as \href{https://choosealicense.com}{choosealicense.com} and \href{https://tldrlegal.com}{tldrlegal.com}.

Some of the main licenses used in the wild are as follows:

\begin{itemize}
\item The X11 license, developed at MIT and more commonly called the MIT license,\footnote{\href{https://www.gnu.org/licenses/license-list.en.html}{https://www.gnu.org/licenses/license-list.en.html}. \last{May~3}} is the most popular license on Git\-Hub,\footnote{\href{https://github.com}{https://github.com}. \last{May~2}} the world's largest repository of code. It is used in over 40\% of the projects licensed there as of March 2015\footnote{\href{https://blog.github.com/2015-03-09-open-source-license-usage-on-github-com/}{https://blog.github.com/2015-03-09-open-source-license-usage-on-github-com/}. \last{May~2}}, and in almost a million projects indexed by the package indexer at \href{https://libraries.io}{\nolinkurl{libraries.io}}.\footnote{\href{https://libraries.io/licenses}{https://libraries.io/licenses}. \last{May~3}} It is a very permissive license, which allows commercial use, modification, distribution, sublicensing, and private use of any code so licensed. It also waives liability for the authors of the code, saving them from needing to worry about lawsuits in cases where they would otherwise be liable: the code is provided as is, and what the user does with it is not the author's responsibility. The only condition is that the license and its copyright notice be included in any software which uses the code.
\item The Apache License 2.0, developed by the Apache Software Foundation,\footnote{\href{https://www.apache.org/licenses/}{https://www.apache.org/licenses/}. \last{May~2}} is similar, but does not grant users any rights to the licensor's trademarks, adds a few smaller requirements, such as stating code changes and carrying over a NOTICE file, if one exists, to derivative works, and also adds a patents clause for contributors.
\item The BSD licenses were developed for use with the Berkeley Software Distribution, a Unix-like OS. There have been multiple iterations: the first, a 4-clause license, required every subsequent license to reference and acknowledge the original, resulting in large lists of acknowledgements; a subsequent 3-clause license (often called the ``New'' BSD license) removed this requirement, but kept a clause stating that usage does not imply endorsement by the original contributors; this clause was in turn removed in a 2-clause version, often called the ``Simplified'' or ``FreeBSD'' license.
\item The GNU General Public License (GPL)\footnote{\href{https://www.gnu.org/licenses/}{https://www.gnu.org/licenses/}. \last{May~2}} is the main example of copyleft licensing, where any derivative works that use GPL-licensed code must also use a GPL license. This causes major issues when users want to combine code from multiple sources, some of whose licenses may conflict. For this reason, the GNU Library or ``Lesser'' General Public License (LGPL) was created, which requires only the code under the LGPL to be openly accessible and modifiable, while all other code does not have to be. The GPL also demands that users include installation instructions.
\item Creative Commons licenses,\footnote{\href{https://creativecommons.org/licenses/}{https://creativecommons.org/licenses/}. \last{May~2}} mostly used for sharing non-code material such as images and documents openly, were created by Lawrence Lessig, the founder of the Creative Commons organisation,\footnote{\href{https://creativecommons.org/}{https://creativecommons.org/}. \last{May~2}} and may also be used for code projects. Some Creative Commons licenses are copyleft licenses; in particular, ``share-alike'' clauses are an example of copyleft.
\item The Unlicense,\footnote{\href{https://unlicense.org/}{https://unlicense.org/}. \last{May~2}} created in 2010, is another option, which explicitly states that code is unlicensed, with no restrictions, and also with no liability for the authors (unlike code which is not licensed, which has stricter protections under US copyright law than code which specifically excludes a license). The Creative Commons Zero license\footnote{\href{https://creativecommons.org/publicdomain/zero/1.0/}{https://creativecommons.org/publicdomain/zero/1.0/}. \last{May~2}} is similar, as is the WTFPL (``Do What The Fuck You Want Public License''),\footnote{\href{http://www.wtfpl.net}{http://www.wtfpl.net}. \last{May~2}} which, although intentionally comically profane, is non-trivial in that it is used in 11,714 different software projects on GitHub as of this writing.\footnote{\href{https://github.com/search?q=license\%3AWTFPL}{https://github.com/search?q=license\%3AWTFPL}. \last{May~2}}
\end{itemize}

As is clear from these short descriptions, licenses are not easily interchangeable and they come with a range of suppositions about how the code ought to be used. Copyleft licenses (mostly GPL) require any derivative works to also be open source, which means that they cannot be used in proprietary codebases, leading to fragmentation of the code space and to legality issues in the long run. However, the effects of copyleft may be more insidious, in that funders or developers may avoid a project altogether if they find that it has (or does not have) a copyleft license. The same could be said for liability waivers, or more especially the lack thereof. This is backed up in studies: for instance, two thirds of respondents to GitHub's open source survey in 2017 said that they value licensing as a major factor when contributing to a project.\footnote{\href{http://opensourcesurvey.org/2017/}{http://opensourcesurvey.org/2017/}. \last{May~2}} Ultimately, licenses are complicated legal documents with various repercussions for how code is accessible.

\subsection{Where is open source code?}
\label{subsec:where-is-open-source-code}

For closed source or proprietary software, the code itself often is not stored in the open or accessible to third parties. However, for open source software to be defined as open source according to OSI's definitions, it needs to be publicly accessible and well-publicised. This means that storing code on a server where it could technically be accessed via some protocol, or less ideally through a mail-order CD as \citet{krauwer2006strengthening} suggested, is not enough; instead, it ought to be linked to elsewhere and available for everyone to access. This raises the question: where is most open source code stored?

Unequivocally, GitHub\footnote{\href{https://github.com}{https://github.com}. \last{May~2}} is the largest source of shared, open code on the internet, with 27 million users and 80 million repositories\footnote{Not all of the projects included in these numbers are public.} as of March 2018.\footnote{\href{https://github.com/about}{https://github.com/about}. \last{May~2}} There have been several large-scale studies of its codebase by researchers \citep{gousios2012ghtorrent, allamanis2013mining, gousios2014lean, kalliamvakou2014promises, beller2016analyzing} which confirm this. Other large repositories for code of a similar nature include SourceForge, with 430k projects and 3.7m users,\footnote{\href{https://sourceforge.net/}{https://sourceforge.net}. \last{April~18}} Bitbucket\footnote{\href{https://bitbucket.org/}{https://bitbucket.org/}. \last{May~2}} with 5m users,\footnote{\href{https://blog.bitbucket.org/2016/09/07/bitbucket-cloud-5-million-developers-900000-teams/}{https://blog.bitbucket.org/2016/09/07/bitbucket-cloud-5-million-developers-900000-teams/}. \last{May~2}} Launchpad\footnote{\href{https://launchpad.net/}{https://launchpad.net/}. \last{May~2}} with 4.2m users,\footnote{\href{https://launchpad.net/people}{https://launchpad.net/people}. \last{May~2}} and GitLab,\footnote{\href{https://about.gitlab.com/}{https://about.gitlab.com/}. \last{May~2}} which holds the majority share of self-hosted Git platforms.\footnote{\href{https://about.gitlab.com/is-it-any-good/}{https://about.gitlab.com/is-it-any-good/}. \last{May~2}} All of these platforms support Git, the versioning software developed by Linus Torvalds, used to store different versions of code for developers and teams, which lends itself particularly to shared code that can be updated easily by outside and community developers. (`Repository' is a term for a single Git instance, roughly equivalent to a single project.)

Self-hosted Git instances are a common way of storing proprietary code; one sets up a versioning system within a company, using the tools and set of social standards that developers are used to from working on open source code, but limits access to employees. This is what is meant by GitLab's statement that they host most self-hosted Git platforms. Git is not the only possible versioning software for this; Google has their own versioning tool, Piper, which hosts the over two billion lines of code used by the majority of the company in a single repository.\footnote{\href{https://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/}{https://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/}. \last{May~2}} Self-hosted Git instances are generally not open source. Generally, if someone wants to use a shared Git repository, they are limited to paying a fee for a hosting service, or using sites that have a freemium model where public repositories are free, but private or enterprise instances are not.

There are alternatives to cloud storage (the `cloud' here being a common metaphor for hosting on someone else's servers) with a hosting provider; one would be storing the code on your own website, and running your own server or building the user interface yourself. This is largely uncommon due to setup costs, but occasionally happens with academics and smaller teams who are not used to larger hosts or who are worried about the longevity of providers. This latter worry is well founded; for instance, Google Code\footnote{\href{https://code.google.com/archive/}{https://code.google.com/archive/}. \last{May~3}} was closed in 2016 after ten years of running, forcing many projects to move to another service such as GitHub. For academics, a common solution to offset setup and hosting costs is to use university websites and archives as a suitable place to store open source code. For instance, Giellatekno, a language-technology research group, and Divvun, a linked product development group, both work primarily on S\'ami languages, and both use the same Subversion (another versioning system) database for storing their code \citep{moshagenopen}, which is hosted by UiT The Arctic University of Norway.\footnote{\href{http://giellatekno.uit.no/}{http://giellatekno.uit.no/}. \last{May~2}}

In large part, the question of where to store information, especially academic information regarding languages, is one which the large archival sites mentioned in Section~\ref{subsec:finding-resources} were created to solve. In particular, this is true for non-code resources, such as audio and video corpora, which historically have been prioritised for storage over code due to the size and relative importance of the corpora, and due to the older industry standards of keeping all code related to research private, especially when that code was funded by enterprise. Many of these sites are repositories of metadata which point to individually hosted content, which makes the links susceptible to link rot, but offloads the issue of storage altogether.

Today, however, there is a sea change towards putting computational work in the open. Occasionally, this means that academics point to the open source code for their papers on GitHub or elsewhere, or publish their software itself as a research object. For example, \citet{makela2016integrated} and \citet{kleinberg2017web} were published in the Journal of Open Source Software (JOSS)\footnote{\href{http://joss.theoj.org/}{http://joss.theoj.org/}. \last{May~2}} \citep{smith2018journal}, which peer-reviews, publishes, and assigns digital object identifiers (DOIs) to software as a way of recognising important academic work. The code for these papers is publicly available on GitHub. Incentivising academics to publish their code openly is difficult, as software is not weighted in job reviews the same way as research papers; however, there are other benefits such as reproducibility and transparency. There are efforts to align these incentives; for instance, The Austin Principles of Data Citation in Linguistics \citep{AustinPrinciples2017} was created to emphasise the importance of citing, using, and storing linguistic data properly. Standardising open source paradigms in academia is ongoing work.

\subsection{Digital permanence and storage}
\label{subsec:digital-permanence}

Focusing a bit closer on the academic use case, we can easily imagine a case where a professor puts code related to research on a university server, only to see that server change hands, go offline, or become defunct if the professor leaves the university for a position elsewhere or if their focus changes. This is even more true of graduate students, who do not have the same locational longevity as staff. As mentioned briefly above, this can lead to link rot; links which formerly pointed to workable software may then point nowhere or to the wrong resource. Links can also be improperly shared; for instance, some websites may have improper redirect settings, which means that typing {\tt http://example.com} may go to a different location or fail to resolve, if {\tt https://example.com} was the correct address.\footnote{For example, {\tt resourcebook.eu/searchll.php} does not resolve, but {\tt http://www.resourcebook.eu/searchll.php} does. This led me to mistakenly believe that \citepos{calzolari2012lre} website was down for several weeks.}

These are artefacts of systemic defects; in a location-based protocol (such as the Hypertext Transfer Protocol (HTTP) used by most websites today), consistency of location is prioritised over consistency of content. If the content were pointed to using some more permanent reference, such as a DOI, then the object could be moved without issues, and the problem of link rot would be largely solved.

Digital permanence is a larger issue than code placed in locations by individual actors, however. Large organisations may lose their funding, come to the end of their expected lifecycle, or decide to shutter or obfuscate projects upon which research or language communities may depend. A good example would be Google Code, mentioned above in Section~\ref{subsec:where-is-open-source-code}. Another example might be listserv.heanet.ie,\footnote{\href{https://listserv.heanet.ie}{https://listserv.heanet.ie}. \last{May~2}} which probably held the largest corpus of Irish data at one point, but which was unavailable to crawlers and depends upon the hosting of heanet.ie for continued service \citep{scannell2007crubadan}. A final example might be the linguistic vitality database by Kornai's group mentioned earlier,\footnote{\href{https://hlt.bme.hu/en/projects/lingvit}{https://hlt.bme.hu/en/projects/lingvit}. \last{May~2}} which is currently in stasis pending funding (Kornai, personal communication, 2018).

Aside from the problem of code actively being stored, there is another issue with code rot. Over time, the ecosystem around which code is built changes, and it becomes harder to reproduce the original environment where code was installed and executed, leading to the code itself becoming less useful \citep{eide2010toward}. Some solutions to this problem involve using containers like Docker to emulate the original environment \citep{boettiger2015introduction}. While this research has largely been driven by a need to replicate scientific results \citep{schwab2000making, barnes2010publish, ince2012case}, it is also relevant outside of academic research to enterprise and community solutions to difficult coding problems, such as natural language processing.

As computational languages naturally evolve, it is important to take into account that the code must also be maintained if it is going to find consistent usage. Maintenance is a difficult task that has few immediate incentives, and which generally involves long timelines. It involves not only solving bugs as they appear with general usage, but also ensuring that the code stays relevant in a changing ecosystem. No package or application exists by itself; each depends upon other code to run. This is especially true for software built using the Unix methodology of piecing together many small pieces of software that each do one function well.\footnote{\href{http://www.catb.org/esr/writings/taoup/html/ch01s06.html}{http://www.catb.org/esr/writings/taoup/html/ch01s06.html}. \last{May~3}} Applications also depend upon other code, though; as operating systems (OS) update, legacy maintenance is needed to ensure forward compatibility, or the code will become defunct as no one will be able to run it on current OSs. However, providing funding for maintenance at the OS timescale is exceedingly difficult.

% Note: I moved Data and Privacy to section 4, because I want to talk about LRLs specifically there, and this is more of an issue for LRLs. I deleted the old Licensing and Liability section, as I already mentioned it.

% Covered briefly below
% \subsection{Military and enterprise solutions}
% \label{subsec:military-and-enterprise}
% In this section, I will talk about how open source meshes with military and enterprise development.

\subsection{Funding}
\label{subsec:oss-funding}

Although open source licenses do not forbid selling software, the source code must remain freely available to all users, so open source code is difficult to sell directly for a profit. This means that funding for open source development rarely comes in the form of immediate fiscal returns, and other funding models need to be pursued. The obvious, most common solution is to sell services on top of open source code, and give away the code itself for free. There are benefits to doing this. Giving away code can be seen as a marketing tactic that draws in other developers. It may serve to develop a community of active developers who are interested in giving back to the original project without being employed by the core developer's company. It may serve as a retention device, keeping in-house developers who prefer to work in the open happy. Finally, it may serve as a way of verifying a level of security for the code itself, by allowing other participants to point out flaws in the system and fix them without needing to rely upon expensive and possibly ineffectual internal security audits.

For researchers, open sourcing code can be seen as a major time investment \citep{fitzjohn2014reproducible, lowndes2017our}, and although it can help reproducibility, it is not normally the primary means of sharing research (which would be the scientific article). For researchers, funding needs to come from salaries, from the researcher's free time, or from grants from larger institutions (not counting enterprise and interdisciplinary cross-overs). This is a serious barrier to open source work in the sciences.

For militaries and governments, there is little incentive to open source unless there is a direct mandate from their political constituents or legal process. Even when there are open challenges run by military branches, such as the DARPA-sponsored LORELEI challenge,\footnote{\href{https://www.nist.gov/itl/iad/mig/lorehlt-evaluations}{https://www.nist.gov/itl/iad/mig/lorehlt-evaluations}. \last{May~2}} there are often no demands that any resulting work be open sourced (although the initial challenge is open sourced as a way of inviting participation). Often, this is because the code itself has security concerns; for example, open sourcing speech recognition software for languages spoken by military targets in lossy situations (such as over cell networks) would reveal that such software exists. This example of security through closed source methodologies extends to enterprise; for Google to open all of their MT data would cause them to lose a competitive edge in the translation market.

For software developers outside of academia, militaries, governments, and large enterprises that have business advantages, however, open sourcing code can be a significant way to gain prestige, to improve and market developer relations, to market themselves to prospective clients and companies, and to contribute to their coding communities. There are a variety of ways of funding work within the open source model.

One direct way is to add payment schemes directly to source code or to a website, asking for donations. Another would be to use a collective community to allocate donations and funds; Open Collective\footnote{\href{https://opencollective.com/}{https://opencollective.com/}. \last{May~2}} is an example of a company that helps do this for developers, some of whom are paid entirely through funds on the site.\footnote{\href{https://medium.com/open-collective/a-new-way-to-fund-open-source-projects-91a51b1b7aac}{https://medium.com/open-collective/a-new-way-to-fund-open-source-projects-91a51b1b7aac}. \last{May~2}} Crowdfunding sites can also be useful for some developers. Patreon is a good example where makers can earn money directly through fan donations, while Kickstarter has been used many times to fund projects. For example, Dave Gandy, the developer for Font Awesome, an open source font resource, raised over a million dollars in a month from 35,550 backers for the next version of his product.\footnote{\href{https://www.kickstarter.com/projects/232193852/font-awesome-5}{https://www.kickstarter.com/projects/232193852/font-awesome-5}. \last{May~2}} Code bounties, funds set by community members hoping to have other developers solve bugs, are another limited way of making money.\footnote{\href{https://www.bountysource.com/}{https://www.bountysource.com/}. \last{May~2}} Cryptocurrencies may eventually present other ways of funding open source, either directly,\footnote{\href{https://utopian.io/}{https://utopian.io/}. \last{May~2}}\footnote{\href{https://gitcoin.co/}{https://gitcoin.co/}. \last{May~2}} or through other avenues like initial coin offerings.
Already, some companies are using initial coin offerings (similar to IPOs in the business world, but instead marking the launch of a new cryptocurrency) to fund development on open source. One example is Filecoin, which raised over 200 million US dollars for its coin development, much of which will go directly to open source projects run by the company Protocol Labs, such as IPFS \citep{benet2014ipfs} on GitHub.\footnote{\href{https://coinlist.co/filecoin}{https://coinlist.co/filecoin}. \last{May~2}}

There are several guides online that outline other ways of funding open source.\footnote{\href{https://github.com/nayafia/lemonade-stand}{https://github.com/nayafia/lemonade-stand}. \last{May~2}}\footnote{\href{https://medium.com/open-source-life/money-and-open-source-d44a1953749c}{https://medium.com/open-source-life/money-and-open-source-d44a1953749c}. \last{May~2}}\footnote{\href{https://opensource.guide/getting-paid/}{https://opensource.guide/getting-paid/}. \last{May~2}} In the end, the majority of open source developers are not remunerated for their work directly. Most open source work is unpaid, and maintenance of open source software can be demanding and costly for developers who do not set expectations around levels of support for users. This is especially difficult for developers who do not have total control of their projects, such as is often the case with developers doing open source within a company.

More specifically, the problem of funds being directed to low resource languages is unlikely to be solved by any of the proposed solutions above. However, by banding together and sharing tools openly \citep{streiter2006implementing}, computational linguists working on low resource languages can expedite their work. This methodology will be explored in Chapter~\ref{sec:lrl-code}.

% Note: I moved the entire section on ethics in open source to the Discussions chapter

\subsection{Summary}

Above, I have defined what `open source' means and explored some of the intricacies behind licensing software appropriately. Open source is a complex term which needs to be unpacked; it is important to take into account free distribution, license placement, derivations, and some of the other restrictions placed on software by the OSI to ensure a FLOSS label for one's code. Choosing a license, and understanding the licenses of available work, is a mandatory part of using or depending on open source work. After a brief outline of these considerations, I explored where most open source code in the world can be found, and discussed issues involving code rot, digital permanence, and storage of code by academics. Keeping this context in mind will be relevant for the next chapter, where I go into depth about how open source can be used to help low resource languages. As mentioned in Chapter~\ref{sec:resources}, funding is another consideration to remember, as open source does not mean free of cost, and someone eventually needs to foot the bill for all computational work.

At this point, the groundwork has been laid for the main body of the thesis, which is Chapter~\ref{sec:lrl-code} on FLOSS code for LRLs, and Chapter~\ref{sec:case-studies} on case studies.