Commit bb410ec

Add files via upload
JOSS paper somewhat redrafted.
1 parent 02994b0 commit bb410ec

paper/paper.md

Lines changed: 65 additions & 65 deletions
@@ -23,53 +23,52 @@ bibliography: paper.bib
 ## Scope
 
 This software is relevant to a situation in which:
-* the value of some dependent variable depends on the values of some
-collection of zero or more independent variables;
+
+* the value of some dependent variable depends on the values of zero
+or more independent variables;
 * there exists a data set of empirical measurements of the value of
 the dependent variable, each accompanied by empirical measurements
-of the corresponding values of all the independent variables;
+of corresponding values of all the independent variables;
 * there exists at least one theoretical model for predicting the value
 of the dependent variable from the values of the independent
 variables;
 * each model contains zero or more adjustable parameters,
-i.e. quantities asserted in the model to be constant, and which
-affect the mapping from independent variable values to dependent
-variable value, but whose exact values are not known a priori; and
+i.e. constant quantities which affect the mapping from independent
+variable values to dependent variable value, whose exact values are
+not known a priori; and
 * one wishes to infer from the data the values of the parameters
 in each model, and, if there is more than one model, which model is
 most probably true.
 
 ## Action
 
-In Bayesian parameter estimation, beliefs about the values of
-parameters, in light of data, can [@Jeffreys:1931:SI;
-@Jeffreys:1932:TEL] be summarized in a posterior expectation value and
-standard error. If the model has more than one parameter, the
-posterior standard error comes [@Jeffreys:1931:SI; @Jeffreys:1932:TEL]
-in two versions: a conditional standard error, based on the assumption
-that the other parameters take their posterior expectation values; and
-a marginal standard error, which is a probability-weighted average
-over all values of the other parameters. In Bayesian model
-comparison, the goodness of fit of a model to data is
-[@Jeffreys:1935:STS] measured by a quantity known as the marginal
-likelihood, which automatically embodies Occam's razor and is
-therefore suitable for direct comparison with other models.
-
-However, in software procurement terms, it is much easier to obtain
-software packages and libraries which, rather than Bayesian parameter
-estimation and model comparison, instead perform least squares fitting
-[qv. @Legendre:1805:NMD], which defines a badness of fit measure
-called chi-squared for a model _with specific values of its
-parameters_. Software typically outputs the minimum value of
-chi-squared with respect to the parameters, estimators of the
-parameters computed as the point in parameter space that achieves that
-minimum chi-squared, and some or all of the elements of the Hessian of
-chi-squared with respect to the parameters at that point in parameter
-space, intended for use in estimating the standard errors of the
-parameters by heuristic methods. Software documentation also often
-suggests heuristic ways of attempting model comparison with these
-outputs. This is convenient, but lacks clear epistemological
-underpinning.
+In Bayesian parameter estimation, beliefs about parameter values, in
+light of data, can [@Jeffreys:1931:SI; @Jeffreys:1932:TEL] be
+summarized in a posterior expectation value and standard error. If
+the model has more than one parameter, the posterior standard error
+has [@Jeffreys:1931:SI; @Jeffreys:1932:TEL] two variants: a
+conditional standard error, based on the assumption that the other
+parameters take their posterior expectation values; and a marginal
+standard error, which involves a probability-weighted average over all
+values of the other parameters. In Bayesian model comparison, the
+goodness of fit of a model to data is [@Jeffreys:1935:STS] measured by
+a quantity known as the marginal likelihood, which automatically
+embodies [@MacKay:1992:BI] Occam's razor and is therefore suitable for
+direct comparison with other models.
+
+However, it is much easier to obtain software packages and libraries
+which, rather than Bayesian parameter estimation and model comparison,
+instead perform least squares fitting [qv. @Legendre:1805:NMD], which
+defines a badness of fit measure called chi-squared for a model _with
+specific values of its parameters_. Software typically outputs the
+minimum value of chi-squared with respect to the parameters; the
+parameter values that achieve that minimum chi-squared; and some or
+all of the elements of the Hessian of chi-squared with respect to the
+parameters at that point in parameter space, intended for use in
+estimating the standard errors of the parameters by heuristic methods.
+Software documentation also often suggests heuristic ways of
+attempting model comparison with these outputs. This is convenient,
+but lacks clear epistemological underpinning.
 
 By a happy coincidence, however, these typical outputs of
 least-squares fitting software contain enough information to obtain,
@@ -79,11 +78,12 @@ versions) of the parameters, and to the marginal likelihood. This
 calculation is not entirely straightforward, and that is where the
 present package, `leastsqtobayes`, comes in. It is written in a
 combination of three languages: Gnuplot [@Williams:2015:GRM], which
-has powerful built-in capabilities for least-squares fitting; Octave
+has powerful (indeed uniquely suitable in its handling of measurement
+uncertainties) built-in capabilities for least-squares fitting; Octave
 [@Eaton:2012:GOR], which has the matrix and scalar arithmetic
-capabilities for the conversion; and Perl [@Wall:2022:P], with the
-text-processing capabilities to create bespoke sections Octave code
-containing results produced by Gnuplot and vice versa.
+capabilities for the calculations; and Perl [@Wall:2022:P], with the
+text-processing capabilities to facilitate intercommunication between
+Gnuplot and Octave code.
 
 `leastsqtobayes` takes as input an empirical data set; a formula
 representing a model; and specifications of the prior
@@ -96,21 +96,22 @@ parameters; and the marginal likelihood.
 ## Parameter estimation step
 
 Several recent workers [e.g. @Fenton:2022:UVA; @Albert:2022:BAE;
-@Gerster:2022:EBC] have, in attempting to infer the posterior
-expectations and standard errors of parameters in a model from data,
-particularly in cases where the distinction between conditional
-standard errors and marginal standard errors matters, found themselves
-unable to obtain those values with "off the peg" least-squares fitting
-processes, and have had to resort to bespoke computational approaches.
-Indeed, the present author has [@Hatton:2003:SPE; @Sammonds:2017:MSI]
-found himself in the same position.
+@Gerster:2022:EBC], in common with the present author
+[@Hatton:2003:SPE; @Sammonds:2017:MSI], have, in attempting to infer
+posterior expectations and standard errors of parameters in a model
+from data, found themselves unable to obtain those values with "off
+the peg" least-squares fitting processes, and have had to resort to
+bespoke computational approaches, with all the risks and duplication
+of effort in software quality control this implies. This has been
+particularly prevalent in cases where the distinction between
+conditional and marginal standard errors matters.
 
 ## Model comparison step
 
 @Dunstan:2022:ECB argue that all users of least-squares fitting should
 supply the value of the marginal likelihood for each model they fit,
 and note the current ubiety of applications of least-squares fitting
-in which such a value is not supplied. They further point out that a
+in which no such value is supplied. They further point out that a
 reason for general non-reporting of marginal likelihood values is the
 perceived computational complexity of obtaining those values.

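The conditional/marginal distinction the paper leans on can be made concrete with a toy calculation. Under the usual Gaussian approximation the likelihood is proportional to exp(-chi2/2), so the posterior covariance of the parameters is twice the inverse of the Hessian H of chi-squared at its minimum: the marginal standard error of a parameter is the square root of the corresponding diagonal element of 2*inv(H), while its conditional standard error is sqrt(2/H_ii). A minimal sketch, in Python rather than the package's own Octave, with an invented two-parameter Hessian:

```python
from math import sqrt

# Hypothetical Hessian of chi-squared with respect to two parameters,
# evaluated at the chi-squared minimum (values invented for illustration;
# real values would come from the least-squares fitting software).
H = [[8.0, 2.0],
     [2.0, 4.0]]

det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Inverse of a 2x2 matrix, written out explicitly.
inv_H = [[ H[1][1] / det_H, -H[0][1] / det_H],
         [-H[1][0] / det_H,  H[0][0] / det_H]]

for i in range(2):
    conditional_se = sqrt(2.0 / H[i][i])      # other parameter held at its expectation
    marginal_se = sqrt(2.0 * inv_H[i][i])     # other parameter averaged over
    print(i, conditional_se, marginal_se)
```

For any positive-definite H the marginal standard error is at least as large as the conditional one, and the gap grows with the correlation between parameters, which is exactly when the distinction noted above matters most.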
@@ -123,20 +124,19 @@ formulae having been in the open literature for decades,
 @Dunstan:2022:ECB attribute the perceived complexity of computing the
 marginal likelihood, which they believe leads to the absence of
 marginal likelihood computations in most applications of least-squares
-fitting, to a failure to use these formulae.
-
-However, @Dunstan:2022:ECB leave as an exercise for the reader the
-implementation issues of how to extract, from the outputs of standard
-least-squares fitting software, the information required as input to
-the @Lindley:1980:ABM and @Kass:1995:BF formulae, and how to perform
-the actual computation. The primary challenge in that computation is
-that it involves the determinant of the Hessian of the chi-squared
-statistic with respect to the parameters, at the location in parameter
-space that minimizes chi-squared. That is where the present software,
-`leastsqtobayes`, comes in: it resolves these issues by having the
-Gnuplot least-squares fitting system output its results in a format
-suitable for direct import to the Octave scientific programming
-language, with its inbuilt determinant-finding capability, then have
-Octave compute the determinant, by way of using Perl's text-processing
-capabilities to generate bespoke Gnuplot and Octave code for any given
-inference problem.
+fitting, to a widespread failure to use these formulae.
+
+# From the need to the software
+
+However, @Dunstan:2022:ECB leave as an exercise for the reader finding
+how to extract, from the outputs of standard least-squares fitting
+software, the information required as input to the @Lindley:1980:ABM
+and @Kass:1995:BF formulae, and how to perform the actual computation.
+Both are somewhat challenging, the former because Gnuplot outputs its
+results in a format that is not very readily interoperable with other
+systems, and the latter because of the need to compute the determinant
+of the Hessian. That is where the present software, `leastsqtobayes`,
+comes in: it resolves the former issue using the text-processing
+capabilities of Perl, and the latter using the matrix algebra
+capabilities of Octave. The final output to the user includes all the
+quantities for which an inferential need is identified above.
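To illustrate why the determinant of the Hessian is the crux of the computation, here is a sketch of the Laplace-approximation route to the log marginal likelihood, the route the @Lindley:1980:ABM and @Kass:1995:BF formulae formalize. It is written in Python rather than the package's Octave, all input numbers are invented, and constants common to every model (such as the likelihood normalization from the measurement errors) are dropped:

```python
from math import log, pi

# Invented illustrative inputs; in practice they come from the fit.
chi2_min = 12.3                      # minimum chi-squared
H = [[8.0, 2.0],                     # Hessian of chi-squared at the minimum
     [2.0, 4.0]]
k = 2                                # number of adjustable parameters
log_prior_at_min = log(1.0 / 100.0)  # hypothetical flat prior density at the best fit

det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Laplace approximation: the posterior bump is treated as Gaussian with
# precision matrix H/2 (since the likelihood is proportional to
# exp(-chi2/2)), so its volume is (2*pi)^(k/2) / sqrt(det(H/2)),
# and det(H/2) = det(H) / 2^k.
log_marginal_likelihood = (
    -chi2_min / 2.0                  # log of the maximized likelihood
    + log_prior_at_min               # prior density at the best fit
    + (k / 2.0) * log(2.0 * pi)
    - 0.5 * log(det_H / 2.0 ** k)
)
print(log_marginal_likelihood)
```

Comparing models then amounts to comparing these numbers; the determinant term, together with the prior density, is the Occam factor that penalizes extra parameters which buy little reduction in chi-squared.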
