@@ -126,11 +126,11 @@ The atomic vector's group may also contain `**/names`, a 1-dimensional string da
126126This should use a datatype that can be represented by a UTF-8 encoded string.
127127If ` **/data ` is a scalar, ` **/names ` should have length 1.
128128
129- ### Representing missing values
129+ #### Representing missing values
130130
131131``` {r, echo=FALSE, results="asis"}
132132if (.version >= package_version("1.1")) {
133- cat('Each `**/data` dataset may optionally contain a `missing-value-placeholder` attribute.
133+ cat('The `**/data` dataset may optionally contain a `missing-value-placeholder` attribute.
134134If present, this should be a scalar dataset that specifies the placeholder for missing values.
135135Any value of `**/data` that is equal to this placeholder should be treated as missing.
136136If no such attribute is present, it can be assumed that there are no missing values.')
@@ -171,7 +171,7 @@ If no such attribute is present, it can be assumed that there are no missing val
171171
172172``` {r, echo=FALSE, results="asis"}
173173if (.version >= package_version("1.3")) {
174- cat("Check out the [HDF5 policy draft (v0.1.0)](https://github.com/ArtifactDB/Bioc-HDF5-policy/tree/v0.1.0). for more details.")
174+ cat("Check out the [HDF5 policy draft (v0.1.0)](https://github.com/ArtifactDB/Bioc-HDF5-policy/tree/v0.1.0) for more details.")
175175}
176176```
177177
@@ -191,20 +191,31 @@ if (.version == package_version("1.0")) {
191191 This should use a datatype that can be represented by a UTF-8 encoded string.
192192
193193The group should contain a 1-dimensional dataset at ` **/data `, containing 0-based indices into the levels.
194- This should use a HDF5 integer datatype that can be represented by a 32-bit signed integer.
195- (Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.)
196- Missing values are represented as described above for atomic vectors.
194+ Vectors of length 1 may also be represented as a scalar dataset.
195+ (While R makes no distinction between scalars and length-1 vectors, this may be useful for other frameworks where this difference is relevant.)
196+
+ The ` **/data ` dataset should use an HDF5 integer datatype that can be represented by a 32-bit signed integer.
198+ Admittedly, this should have been an unsigned integer, but we started with a signed integer and we'll just keep it so for back-compatibility.
197199
198200The group should contain ` **/levels ` , a 1-dimensional string dataset that contains the levels for the indices in ` **/data ` .
199201This should use a datatype that can be represented by a UTF-8 encoded string.
200202Values in ` **/levels ` should be unique.
201203
202- Values in ` **/data ` should be non-negative (missing values excepted) and less than the length of ` **/levels ` .
204+ Values in ` **/data ` should be non-negative (missing value placeholders excepted) and less than the length of ` **/levels ` .
203205Note that the datatype constraints on ` **/data ` suggest that there should not be more than 2147483647 levels,
204206as beyond that, the levels cannot be indexed by elements of ` **/data ` .
205207
+ Missing values in the factor are represented by a placeholder, as described [above](representing-missing-values) for atomic integer vectors.
209+ ``` {r, echo=FALSE, results="asis"}
210+ if (.version >= package_version("1.1")) {
211+ cat('Specifically, the `**/data` dataset may contain an optional `missing-value-placeholder` attribute,
212+ which contains the placeholder used to represent missing values inside `**/data`.')
213+ }
214+ ```
215+
206216The group may also contain ` **/names ` , a 1-dimensional string dataset of length equal to ` data ` .
207217This should use a datatype that can be represented by a UTF-8 encoded string.
218+ If ` **/data ` is a scalar, ` **/names ` should have length 1.
208219
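The factor rules in this hunk (0-based codes, unique levels, codes below the number of levels, and an optional missing-value placeholder) can be illustrated with a small sketch. This is not the package's implementation: `decode_factor` and its arguments are hypothetical names, the inputs are plain Python lists standing in for the already-loaded `**/data` and `**/levels` datasets, and the placeholder stands in for the `missing-value-placeholder` attribute (version 1.1 and later only).

```python
def decode_factor(codes, levels, placeholder=None):
    """Map 0-based integer codes to level strings.

    `placeholder` is a hypothetical stand-in for the value of the
    `missing-value-placeholder` attribute; None means no attribute is present,
    in which case no value is treated as missing.
    """
    if len(set(levels)) != len(levels):
        raise ValueError("values in levels should be unique")

    decoded = []
    for code in codes:
        # Placeholder codes are exempt from the range check.
        if placeholder is not None and code == placeholder:
            decoded.append(None)
            continue
        if not 0 <= code < len(levels):
            raise ValueError(f"code {code} is not a valid index into levels")
        decoded.append(levels[code])
    return decoded


# Example: -1 is used as the (assumed) placeholder for a missing value.
decoded = decode_factor([0, 2, -1, 1], ["a", "b", "c"], placeholder=-1)
```

A reader that sees no placeholder attribute would call this with `placeholder=None`, so any negative code is rejected as out of range.
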
209220``` {r, echo=FALSE, results="asis"}
210221if (.version == package_version("1.1")) {
@@ -226,9 +237,10 @@ This is represented as a HDF5 group (`**/`) with the following attributes:
226237
227238This group should contain the `pointers` and `heap` datasets.
228239
229- - The `**/data` dataset should be a 1-dimensional or scalar dataset of a compound datatype of 2 members, `"offset"` and `"length"`.
240+ - The `**/data` dataset should be a 1-dimensional dataset of a compound datatype of 2 members, `"offset"` and `"length"`.
230241 Each member should be of a datatype that can be represented by an unsigned 64-bit integer.
231- If the dataset is scalar, the length of the VLS array is defined as 1.
242+ Arrays of length 1 may also be represented as a scalar dataset.
243+ (While R makes no distinction between scalars and length-1 vectors, this may be useful for other frameworks where this difference is relevant.)
232244- The `**/heap` dataset should be a 1-dimensional dataset of unsigned 8-bit integers.
233245
234246Each entry of `**/data` refers to a slice `[offset, offset + length)` of the `**/heap` dataset.
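
As a rough illustration of the slicing rule above, the sketch below recovers strings from a heap given `(offset, length)` pairs. It makes several assumptions not stated in this hunk: the function name `decode_vls` is hypothetical, the pairs stand in for the compound offset/length dataset after it has been read into memory, the heap is a list of unsigned 8-bit integers, and the bytes in each slice are assumed to be UTF-8 encoded.

```python
def decode_vls(pointers, heap):
    """Recover one string per (offset, length) pair from a byte heap.

    Each pair refers to the half-open slice [offset, offset + length) of `heap`.
    Decoding the slice as UTF-8 is an assumption of this sketch.
    """
    strings = []
    for offset, length in pointers:
        if offset < 0 or length < 0 or offset + length > len(heap):
            raise ValueError(f"slice [{offset}, {offset + length}) is out of bounds")
        chunk = bytes(heap[offset:offset + length])
        strings.append(chunk.decode("utf-8"))
    return strings


# Example heap holding three strings back to back; the last pointer is an empty slice.
heap = list(b"foobarbaz")
pointers = [(0, 3), (3, 3), (6, 3), (0, 0)]
strings = decode_vls(pointers, heap)
```

A scalar dataset would correspond to a single pair, which this sketch handles as a list of length 1.
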