Skip to content

Type mapping inconsistency & question #304

@mjgarton

Description

@mjgarton

I noticed recent improvements to some of the type mapping in #277

One example of what changed here is that uint8 now maps to INT2 which makes sense to avoid the overflow issue mentioned in the PR.

However, other code hasn't been updated to match. For example, in datafusion-pg-catalog/src/pg_catalog/pg_attribute.rs we have a mapping DataType::UInt8 => (18, 2, true, "s", "p"), // char which means the catalog says something different about the postgres type of the field.

My first thought was to simply update this to match, but it seems fragile if we have to always remember to update in two places.

Instead, I am exploring the idea of allowing datafusion-pg-catalog to depend on arrow-pg. Then we can use arrow_pg::datatypes::into_pg_type to do most of the work for us in a way that will still work if the arrow-pg mappings change again.

I have a question though about the existing mapping code in PgAttributeTable::datafusion_to_pg_type.

In the following code, I can understand some of the returned values, but some are not clear to me:

        match data_type {
            DataType::Boolean => (16, 1, true, "c", "p"),  // bool
            DataType::Int8 => (18, 1, true, "c", "p"),     // char
            DataType::Int16 => (21, 2, true, "s", "p"),    // int2
            DataType::Int32 => (23, 4, true, "i", "p"),    // int4
            DataType::Int64 => (20, 8, true, "d", "p"),    // int8
            DataType::UInt8 => (18, 2, true, "s", "p"),    // char
            DataType::UInt16 => (21, 4, true, "i", "p"),   // int2
            DataType::UInt32 => (23, 8, true, "d", "p"),   // int4
            DataType::UInt64 => (20, -1, false, "i", "m"), // int8
            DataType::Float32 => (700, 4, true, "i", "p"), // float4
            DataType::Float64 => (701, 8, true, "d", "p"), // float8
            DataType::Utf8 => (25, -1, false, "i", "x"),   // text
            DataType::LargeUtf8 => (25, -1, false, "i", "x"), // text
            DataType::Binary => (17, -1, false, "i", "x"), // bytea
            DataType::LargeBinary => (17, -1, false, "i", "x"), // bytea
            DataType::Date32 => (1082, 4, true, "i", "p"), // date
            DataType::Date64 => (1082, 4, true, "i", "p"), // date
            DataType::Time32(_) => (1083, 8, true, "d", "p"), // time
            DataType::Time64(_) => (1083, 8, true, "d", "p"), // time
            DataType::Timestamp(_, _) => (1114, 8, true, "d", "p"), // timestamp
            DataType::Decimal128(_, _) => (1700, -1, false, "i", "m"), // numeric
            DataType::Decimal256(_, _) => (1700, -1, false, "i", "m"), // numeric
            _ => (25, -1, false, "i", "x"),                // Default to text for unknown types
        }

For example, the DataType::UInt64 mapping becomes oid 20 (INT8) but the typelen is -1. Why is it -1?

What I would like to do is replace this block with something like:

        match arrow_pg::datatypes::into_pg_type(data_type) {
            Ok(t @ postgres_types::Type::BOOL) => (t.oid(), 1, true, "c", "p"),
            Ok(t @ postgres_types::Type::CHAR) => (t.oid(), 1, true, "c", "p"),
            // TODO: add all of the other mappings here.
            _ => (25, -1, false, "i", "x"), // Default to text for unknown types
        }

but I am reluctant to change anything until I understand the existing code better.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions