-
Notifications
You must be signed in to change notification settings - Fork 29
Description
I noticed recent improvements to some of the type mapping in #277
One example of what changed here is that uint8 now maps to INT2 which makes sense to avoid the overflow issue mentioned in the PR.
However, other code hasn't been updated to match. For example, in datafusion-pg-catalog/src/pg_catalog/pg_attribute.rs we have a mapping DataType::UInt8 => (18, 2, true, "s", "p"), // char which means the catalog says something different about the postgres type of the field.
My first thought was to simply update this to match, but it seems fragile if we have to always remember to update in two places.
Instead, I am exploring the idea of allowing datafusion-pg-catalog to depend on arrow-pg. Then we can use arrow_pg::datatypes::into_pg_type to do most of the work for us in a way that will still work if the arrow-pg mappings change again.
I have a question though about the existing mapping code in PgAttributeTable::datafusion_to_pg_type.
In the following code, I can understand some of the returned values, but some are not clear to me:
match data_type {
DataType::Boolean => (16, 1, true, "c", "p"), // bool
DataType::Int8 => (18, 1, true, "c", "p"), // char
DataType::Int16 => (21, 2, true, "s", "p"), // int2
DataType::Int32 => (23, 4, true, "i", "p"), // int4
DataType::Int64 => (20, 8, true, "d", "p"), // int8
DataType::UInt8 => (18, 2, true, "s", "p"), // char
DataType::UInt16 => (21, 4, true, "i", "p"), // int2
DataType::UInt32 => (23, 8, true, "d", "p"), // int4
DataType::UInt64 => (20, -1, false, "i", "m"), // int8
DataType::Float32 => (700, 4, true, "i", "p"), // float4
DataType::Float64 => (701, 8, true, "d", "p"), // float8
DataType::Utf8 => (25, -1, false, "i", "x"), // text
DataType::LargeUtf8 => (25, -1, false, "i", "x"), // text
DataType::Binary => (17, -1, false, "i", "x"), // bytea
DataType::LargeBinary => (17, -1, false, "i", "x"), // bytea
DataType::Date32 => (1082, 4, true, "i", "p"), // date
DataType::Date64 => (1082, 4, true, "i", "p"), // date
DataType::Time32(_) => (1083, 8, true, "d", "p"), // time
DataType::Time64(_) => (1083, 8, true, "d", "p"), // time
DataType::Timestamp(_, _) => (1114, 8, true, "d", "p"), // timestamp
DataType::Decimal128(_, _) => (1700, -1, false, "i", "m"), // numeric
DataType::Decimal256(_, _) => (1700, -1, false, "i", "m"), // numeric
_ => (25, -1, false, "i", "x"), // Default to text for unknown types
}
For example, the DataType::UInt64 mapping becomes oid 20 (INT8) but the typelen is -1. Why is it -1?
What I would like to do is replace this block with something like:
match arrow_pg::datatypes::into_pg_type(data_type) {
Ok(t @ postgres_types::Type::BOOL) => (t.oid(), 1, true, "c", "p"),
Ok(t @ postgres_types::Type::CHAR) => (t.oid(), 1, true, "c", "p"),
// TODO: add all of the other mappings here.
_ => (25, -1, false, "i", "x"), // Default to text for unknown types
}
but I am reluctant to change anything until I understand the existing code better.