Commit c5fb4d9

Rik committed
doc: Explain more clearly uses and pitfalls of NA.
* data.txi: Update "Missing Data" section with description of applications of NA, code examples of NA, representation of NA, and possible conversion to NaN.
* data.cc (FNA): Add Programming Note to documentation explaining that NA may be converted to NaN on some platforms. Add @xref to "Missing Data" in manual.
1 parent 1e9ccca commit c5fb4d9

2 files changed: +50 -11 lines changed

doc/interpreter/data.txi

Lines changed: 41 additions & 8 deletions
@@ -102,21 +102,54 @@ IEEE floating point format, values in the range of approximately
 The exact values are given by the variables @code{realmin},
 @code{realmax}, and @code{eps}, respectively.

-Matrix objects can be of any size, and can be dynamically reshaped and
-resized. It is easy to extract individual rows, columns, or submatrices
-using a variety of powerful indexing features. @xref{Index Expressions}.
+Matrix objects can be of any size, and can be dynamically reshaped and resized.
+It is easy to extract individual rows, columns, or submatrices using a variety
+of powerful indexing features. @xref{Index Expressions}.

 @xref{Numeric Data Types}, for more information.

 @node Missing Data
 @subsection Missing Data
 @cindex missing data

-It is possible to represent missing data explicitly in Octave using
-@code{NA} (short for ``Not Available''). Missing data can only be
-represented when data is represented as floating point numbers. In this
-case missing data is represented as a special case of the representation
-of @code{NaN}.
+It is possible to represent missing data explicitly in Octave using NA (short
+for ``@w{Not} @w{Available}''). This is helpful in distinguishing between a
+property of the data (i.e., some of it was not recorded) and calculations on
+the data which generated an error (i.e., created NaN values). In short, if you
+do not get the result you expect is it your data or your algorithm?
+
+The missing data marker is a special case of the representation of NaN.
+Because of that, it can only be used with data represented by floating point
+numbers---no integer, logical, or char values.
+
+In general, use NA and the test @code{isna}, to describe the dataset or to
+reduce the dataset to only valid entries. Numerical calculations with NA will
+generally "poison" the results and conclude with an output NA. However, this
+can not be guaranteed on all platforms and NA may be replaced by NaN.
+
+Example 1 : Describing the dataset
+
+@example
+@group
+data = [1, NA, 3];
+percent_missing = 100 * sum (isna (data(:))) / numel (data);
+printf ('%2.0f%% of the dataset is missing\n', percent_missing);
+@print{} 33% of the dataset is missing
+@end group
+@end example
+
+Example 2 : Restrict calculations to valid data
+
+@example
+@group
+raw_data = [1, NA, 3];
+printf ('mean of raw data is %.1f\n', mean (raw_data));
+@print{} mean of raw data is NA
+valid_data = raw_data (! isna (raw_data));
+printf ('mean of valid data is %.1f\n', mean (valid_data));
+@print{} mean of valid data is 2.0
+@end group
+@end example

 @DOCSTRING(NA)

libinterp/corefcn/data.cc

Lines changed: 9 additions & 3 deletions
@@ -5313,10 +5313,10 @@ DEFUN (NA, args, ,
 @deftypefnx {} {@var{val} =} NA (@dots{}, "like", @var{var})
 @deftypefnx {} {@var{val} =} NA (@dots{}, @var{class})
 Return a scalar, matrix, or N-dimensional array whose elements are all equal
-to the special constant used to designate missing values.
+to the special constant NA (Not Available) used to designate missing values.

-Note that NA always compares not equal to NA (NA != NA).
-To find NA values, use the @code{isna} function.
+Note that NA always compares not equal to NA (NA != NA). To find NA values,
+use the @code{isna} function.

 When called with no arguments, return a scalar with the value @samp{NA}.

@@ -5332,6 +5332,12 @@ will have the same data type, complexity, and sparsity as @var{var}.

 The optional argument @var{class} specifies the return type and may be
 either @qcode{"double"} or @qcode{"single"}.
+
+Programming Note: The missing data marker NA is a special case of the
+representation of NaN. Numerical calculations with NA will generally "poison"
+the results and conclude with an output of NA. However, this can not be
+guaranteed on all platforms and NA may be replaced by NaN.
+@xref{Missing Data}.

 @seealso{isna}
 @end deftypefn */)
 {

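The Programming Note describes NA as a special case of the representation of NaN. The sketch below illustrates the general idea in C++, assuming IEEE 754 binary64 doubles: a missing-data marker can be a NaN carrying a particular payload, and a test for it has to compare bits because a NaN never compares equal to anything. The payload constant and all helper names here are hypothetical and chosen only for illustration; this is not Octave's actual NA bit pattern or its implementation of isna.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes 64-bit doubles");

// Hypothetical bit pattern: a quiet NaN with an arbitrary payload (0xFA5).
// This is NOT the pattern Octave uses for NA.
constexpr std::uint64_t kExampleNaBits = 0x7FF8000000000FA5ULL;

// Copy the raw bits of a double into an integer (avoids aliasing problems).
std::uint64_t to_bits(double x)
{
  std::uint64_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

// Build a double from raw bits.
double from_bits(std::uint64_t b)
{
  double x;
  std::memcpy(&x, &b, sizeof x);
  return x;
}

// Create the example missing-data marker.
double make_example_na() { return from_bits(kExampleNaBits); }

// The marker is a NaN, so it must be detected by bit pattern, not by ==.
bool is_example_na(double x) { return to_bits(x) == kExampleNaBits; }

// Any other NaN, such as one produced by an invalid calculation.
bool is_plain_nan(double x) { return std::isnan(x) && ! is_example_na(x); }
```

With these helpers, `std::isnan (make_example_na ())` is true, while `is_example_na` is false for an ordinary quiet NaN. Whether the payload survives arithmetic such as `make_example_na () + 1.0` depends on the hardware and math library, which is why a result may come back as a plain NaN rather than the marker.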