Address Pitfalls of Numerical Datatypes in RDF #82

@jmkeil

There are a couple of issues with numerical datatypes that make the accurate use of RDF for numerical data error-prone.

The use of xsd:float and xsd:double entails a risk

  • of value distortion in the mapping between lexical space and value space (e.g. "0.1"^^xsd:float is typically mapped to a value of approximately 0.1000000014901161), and
  • of numerical issues in the processing (e.g. calculations in SPARQL queries) of the represented values, i.e. underflow errors, overflow errors, rounding errors, cancellation, and error accumulation.
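Both risks can be reproduced outside of any RDF framework. The following is a minimal Python sketch, not taken from any RDF library: it assumes that Python's `float` (IEEE 754 binary64) stands in for xsd:double and that a round trip through `struct` stands in for xsd:float. The helper name `to_binary32` is made up for this illustration.

```python
import struct
from decimal import Decimal


def to_binary32(x: float) -> float:
    """Round a binary64 float to the nearest binary32 value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Exact values that the lexical form "0.1" ends up as in each binary datatype.
float32_value = Decimal(to_binary32(0.1))
float64_value = Decimal(0.1)
print(float32_value)  # 0.100000001490116119384765625
print(float64_value)  # 0.1000000000000000055511151231257827021181583404541015625

# Error accumulation: add 0.1 ten times, once in binary64 and once in decimal.
binary_total = 0.0
decimal_total = Decimal("0")
for _ in range(10):
    binary_total += 0.1
    decimal_total += Decimal("0.1")

print(binary_total == 1.0)            # False
print(decimal_total == Decimal("1"))  # True
```

The explicit loop is deliberate: recent Python versions use compensated summation in the built-in `sum()`, which would hide the accumulated error.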

In most cases, xsd:decimal would be a better choice:

  • In particular, I disagree with XML Schema Datatypes in RDF and OWL, W3C Working Group Note 14 March 2006 on the point that xsd:float and xsd:double are the appropriate datatypes for measurements. In my view, this only holds for measurements that originate from binary floating point sources (e.g. numeric calculations or outputs of analog-to-digital converters). Other measurements typically consist of a value and the measurement uncertainty of the measurement device used, i.e. two precise values, which should both be represented with xsd:decimal.
  • Another exception is the case where a representation of infinity is required, which is only provided by xsd:float and xsd:double.

The use of xsd:decimal for value representation does not considerably impede the use of floating point arithmetic for calculations (e.g. for performance reasons), as the conversion is trivial. The other direction is harder: if rounding of the lexical representation must be avoided, values stored as xsd:float or xsd:double require custom lexical mappings that do not conform to the standard, are (depending on the framework) probably cumbersome to implement, and are not always possible (e.g. inside SPARQL queries).
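To illustrate the asymmetry, here is a small Python sketch. It assumes `decimal.Decimal` as a stand-in for the xsd:decimal value space and Python's `float` as a stand-in for xsd:double. It is not how any particular RDF framework implements the mappings.

```python
from decimal import Decimal

# Value of "0.1"^^xsd:decimal, kept exactly.
stored = Decimal("0.1")

# Decimal -> binary floating point: one call, rounding happens only here,
# at the point where the user opts into floating point arithmetic.
as_float = float(stored)
assert as_float == 0.1

# Binary floating point -> decimal: the exact conversion exposes the
# binary expansion instead of the value that was originally written.
recovered = Decimal(as_float)
assert recovered != stored

# Going through the shortest round-trip string happens to recover "0.1" here,
# but this is a heuristic and not the standard lexical mapping.
guessed = Decimal(repr(as_float))
assert guessed == stored
```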

However, I don't see awareness of these issues in general, and especially not in teaching material.

Further, RDF unnecessarily inherits limitations from XSD: Exponential notation is only supported for xsd:float and xsd:double, but not for xsd:decimal (and derived datatypes). It was not included in xsd:decimal because the requirement was already met by the precisionDecimal datatype, which, however, did not become a built-in datatype in RDF. This tempts users to use xsd:double even where it is not appropriate. The shorthand syntax in Turtle, TriG and SPARQL amplifies this further, as xsd:double might be used even if not intended.
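The effect of the shorthand syntax can be sketched in Python as follows. The regular expressions mirror the INTEGER, DECIMAL and DOUBLE productions of the Turtle grammar as I read it. The function name `shorthand_datatype` is made up for this illustration and is not part of any parser.

```python
import re

# Numeric literal productions, transcribed from the Turtle grammar.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*[eE][+-]?[0-9]+"
    r"|\.[0-9]+[eE][+-]?[0-9]+"
    r"|[0-9]+[eE][+-]?[0-9]+)"
)


def shorthand_datatype(token: str):
    """Return the datatype a bare numeric token receives, or None if it is not numeric."""
    if DOUBLE.fullmatch(token):
        return "xsd:double"
    if DECIMAL.fullmatch(token):
        return "xsd:decimal"
    if INTEGER.fullmatch(token):
        return "xsd:integer"
    return None


# The same quantity, written three ways, ends up with three datatypes.
for token in ("1500", "1500.0", "1.5e3"):
    print(token, shorthand_datatype(token))
```

Writing `1e-1` instead of `0.1` is therefore enough to switch silently from an exact xsd:decimal to an xsd:double that needs rounding.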

(A more detailed discussion of the issues can be found in arXiv:2011.08077 and some reviewer comments on it.)

Possible Actions

I think the following actions would help to ease the accurate representation of numbers in RDF:

  1. Enable exponential notation for xsd:decimal (and derived datatypes) in RDF.
  2. Emphasize in teaching material the risk of numerical issues and the fact that the value space of xsd:float/xsd:double covers the lexical space only partially, resulting in rounded values after the lexical mapping.
  3. Enable tools to suggest the use of xsd:decimal instead of xsd:float and xsd:double, and to warn users if a lexical xsd:float or xsd:double value was entered that would require rounding during the lexical mapping.
  4. Maybe change Turtle, TriG and SPARQL syntax to use exponential notation as shorthand syntax for xsd:decimal instead of xsd:double.
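A check along the lines of action 3 could look roughly like the following Python sketch. It assumes Python's `float` as a stand-in for xsd:double and a `struct` round trip as a stand-in for xsd:float. The function `rounding_hint` and its message format are hypothetical and do not describe an existing tool.

```python
import math
import struct
from decimal import Decimal
from fractions import Fraction


def _to_binary32(x: float) -> float:
    """Round to binary32; values beyond its range become infinite."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def rounding_hint(lexical: str, datatype: str = "xsd:double"):
    """Return a warning if the lexical form needs rounding, else None."""
    if lexical in ("INF", "+INF", "-INF", "NaN"):
        return None  # special values have no decimal counterpart
    exact = Decimal(lexical)      # the value as written
    value = float(lexical)        # the value after binary64 rounding
    if datatype == "xsd:float":
        value = _to_binary32(value)
    # Compare as exact rationals, so no further rounding can sneak in.
    if math.isfinite(value) and Fraction(value) == Fraction(exact):
        return None
    return (f'"{lexical}"^^{datatype} is rounded by the lexical mapping; '
            f'consider "{exact:f}"^^xsd:decimal')


print(rounding_hint("0.5"))                    # None, exactly representable
print(rounding_hint("0.1"))                    # warning
print(rounding_hint("16777217", "xsd:float"))  # warning, 2**24 + 1 needs 25 bits
```

The suggested replacement is written without an exponent, since xsd:decimal currently does not allow exponential notation (see action 1).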

Actions one to three would not cause any backward compatibility problems. Action four, however, would obviously cause backward compatibility problems in software, but might at the same time increase the accuracy of value representations in existing RDF documents without changing them.

Further, one could think about adding mandatory support for precisionDecimal (to have an arbitrary precision datatype with a representation of infinity), but that is a new feature and goes beyond making RDF easier.
