Type mapping inconsistency & question

I noticed recent improvements to some of the type mapping in https://github.com/datafusion-contrib/datafusion-postgres/pull/277

One example of what changed here is that `uint8` now maps to `INT2` which makes sense to avoid the overflow issue mentioned in the PR.

However, other code hasn't been updated to match.  For example, in `datafusion-pg-catalog/src/pg_catalog/pg_attribute.rs` we have a mapping `DataType::UInt8 => (18, 2, true, "s", "p"),    // char` which means the catalog says something different about the postgres type of the field.

My first thought was to simply update this to match, but it seems fragile if we have to always remember to update in two places.

Instead, I am exploring the idea of allowing `datafusion-pg-catalog` to depend on `arrow-pg`.  Then we can use `arrow_pg::datatypes::into_pg_type` to do most of the work for us in a way that will still work if the `arrow-pg` mappings change again.

I have a question though about the existing mapping code in  `PgAttributeTable::datafusion_to_pg_type`.

In the following code, I can understand some of the returned values, but some are not clear to me:

```
        match data_type {
            DataType::Boolean => (16, 1, true, "c", "p"),  // bool
            DataType::Int8 => (18, 1, true, "c", "p"),     // char
            DataType::Int16 => (21, 2, true, "s", "p"),    // int2
            DataType::Int32 => (23, 4, true, "i", "p"),    // int4
            DataType::Int64 => (20, 8, true, "d", "p"),    // int8
            DataType::UInt8 => (18, 2, true, "s", "p"),    // char
            DataType::UInt16 => (21, 4, true, "i", "p"),   // int2
            DataType::UInt32 => (23, 8, true, "d", "p"),   // int4
            DataType::UInt64 => (20, -1, false, "i", "m"), // int8
            DataType::Float32 => (700, 4, true, "i", "p"), // float4
            DataType::Float64 => (701, 8, true, "d", "p"), // float8
            DataType::Utf8 => (25, -1, false, "i", "x"),   // text
            DataType::LargeUtf8 => (25, -1, false, "i", "x"), // text
            DataType::Binary => (17, -1, false, "i", "x"), // bytea
            DataType::LargeBinary => (17, -1, false, "i", "x"), // bytea
            DataType::Date32 => (1082, 4, true, "i", "p"), // date
            DataType::Date64 => (1082, 4, true, "i", "p"), // date
            DataType::Time32(_) => (1083, 8, true, "d", "p"), // time
            DataType::Time64(_) => (1083, 8, true, "d", "p"), // time
            DataType::Timestamp(_, _) => (1114, 8, true, "d", "p"), // timestamp
            DataType::Decimal128(_, _) => (1700, -1, false, "i", "m"), // numeric
            DataType::Decimal256(_, _) => (1700, -1, false, "i", "m"), // numeric
            _ => (25, -1, false, "i", "x"),                // Default to text for unknown types
        }
```

For example, the `DataType::UInt64` mapping  becomes oid 20 (INT8) but  the typelen is -1.   Why is it -1?

What I would like to do is replace this block with  something like:

```
        match arrow_pg::datatypes::into_pg_type(data_type) {
            Ok(t @ postgres_types::Type::BOOL) => (t.oid(), 1, true, "c", "p"),
            Ok(t @ postgres_types::Type::CHAR) => (t.oid(), 1, true, "c", "p"),
            // TODO: add all of the other mappings here.
            _ => (25, -1, false, "i", "x"), // Default to text for unknown types
        }
```

but I am reluctant to change anything until I understand the existing code better.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type mapping inconsistency & question #304

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Type mapping inconsistency & question #304

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions