feat: Add TIME_MICRO_UTC type #16468

Open
rui-mo wants to merge 1 commit into facebookincubator:main from rui-mo:wip_time

Conversation

@rui-mo
Collaborator

@rui-mo rui-mo commented Feb 20, 2026

Both Presto and Spark represent the TIME type using a bigint. The key
differences are:

  1. Spark’s TimeType uses microsecond precision while Presto's TimeType
    uses millisecond precision.
  2. Presto's TimeType represents a time in the session time zone, while
    Spark's TimeType is timezone-unaware.

In Velox, TimeType extends BigintType. To address the above differences, this
PR introduces the four types below by adding two template parameters,
kPrecision and kLocalTime.

TimeType(ms, local)  --- TIME (Presto compatible)
TimeType(ms, UTC) --- unused
TimeType(μs, local) --- unused
TimeType(μs, UTC) --- TIME_MICRO_UTC (Spark compatible)

Spark implementations need to support casting and functions based on the
TIME_MICRO_UTC type, and ensure that the time value is not subject to time
zone conversion.
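
The four combinations above can be pictured as a class template with two non-type parameters. This is only a sketch under assumptions: `TimeTypeSketch`, its helper methods, and the alias names are hypothetical and do not reproduce the Velox `TimeType` class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the precision enum described in this PR.
enum class TimePrecision { kMilliseconds, kMicroseconds };

// Illustrative only: a bigint-backed time type parameterized by precision and
// by whether values are interpreted in the session (local) time zone.
template <TimePrecision kPrecision, bool kLocalTime>
struct TimeTypeSketch {
  using NativeType = int64_t; // Both engines store the value as a bigint.

  static constexpr bool isLocalTime() {
    return kLocalTime;
  }

  static constexpr int64_t unitsPerSecond() {
    return kPrecision == TimePrecision::kMilliseconds ? 1000 : 1000000;
  }

  static std::string name() {
    std::string result = "TIME(";
    result += kPrecision == TimePrecision::kMilliseconds ? "ms" : "us";
    result += kLocalTime ? ", local)" : ", UTC)";
    return result;
  }
};

// The two combinations that the description marks as in use.
using PrestoTime = TimeTypeSketch<TimePrecision::kMilliseconds, true>;
using SparkTimeMicroUtc = TimeTypeSketch<TimePrecision::kMicroseconds, false>;

inline void checkNames() {
  assert(PrestoTime::name() == "TIME(ms, local)");
  assert(SparkTimeMicroUtc::name() == "TIME(us, UTC)");
}
```

Because the two parameters are part of the type, the unused combinations exist as distinct types without any extra code.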

Function registration tests for TimeMicroUtc were added.

Spark's implementation:
https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/TimeType.scala

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 20, 2026
@netlify

netlify bot commented Feb 20, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0dc2a5c
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69aabd743862ce0008cf502a

/// timezone aware.
class TimeType final : public BigintType {
TimeType() = default;
constexpr TimeType(TimePrecision precision = TimePrecision::kMilliseconds)
Contributor

@Yuhta Yuhta Feb 20, 2026

How do we tell the difference during function resolution that a function needs TIME(us) instead of TIME(ms)? I am afraid we need more thinking on how this should be implemented. Probably a separate time type instead of parameterizing the current one is a safer way.

@kgpai @mbasmanova What do you think?

Collaborator

Probably a separate time type instead of parameterizing the current one is a safer way.

Agreed. For milliseconds, inheriting from Bigint makes the Parquet reader more complicated, since Velox uses 64 bits but Parquet writes 32 bits.
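
To illustrate the width mismatch: Parquet's TIME logical type annotates INT32 for millisecond precision and INT64 for microsecond precision, so a reader feeding a bigint-backed millisecond type has to widen each value. A minimal sketch, where `widenTimeMillis` is a hypothetical helper and not a Velox API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Widens 32-bit millisecond TIME values, as stored by Parquet, into the
// 64-bit representation used by a bigint-backed time type.
inline std::vector<int64_t> widenTimeMillis(const std::vector<int32_t>& input) {
  std::vector<int64_t> output;
  output.reserve(input.size());
  for (int32_t value : input) {
    // Millisecond TIME values lie in [0, 86400000), which fits in 32 bits.
    assert(value >= 0 && value < 86400000);
    output.push_back(static_cast<int64_t>(value));
  }
  return output;
}
```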

Collaborator Author

@PingLiuPing Extending bigint would be compatible with both the Spark and Presto TimeType. Presto Java also extends LongType, so the behavior should remain consistent across the two.

https://github.com/prestodb/presto/blob/master/presto-common/src/main/java/com/facebook/presto/common/type/TimeType.java#L25-L26

Collaborator Author

@Yuhta Since Presto operates with millisecond precision and Spark with microsecond precision, the two precisions are not expected to coexist within the Presto or Spark function registrations. As a result, each engine’s functions only need to handle bigint according to their respective semantics.

How do we tell the difference during function resolution that a function needs TIME(us) instead of TIME(ms)?

If we wanted to enforce this more explicitly at the function registration level, another option would be to introduce a template precision parameter and define two separate types — MilliTimeType and MicroTimeType — similar to ShortDecimal and LongDecimal. However, the situation is slightly different here: ShortDecimal and LongDecimal coexist, whereas MilliTimeType and MicroTimeType would not be expected to coexist.

I’d be very happy to hear your insights on this, thanks.

Contributor

@Yuhta Yuhta Feb 21, 2026

It's very dangerous to assume TIME(ms) and TIME(us) will not coexist. Also, for each function, the semantics of the input types should be clear on their own, not implied by the engine in which the function is used. With Velox, it's possible to register and call a Spark function inside Presto.

It looks to me like a separate type (e.g. TimeMicroPrecision) is the safest way to go. An explicitly parameterized type also works, but it adds a little risk and complexity given that it will change the existing Time type. It has the advantage of extensibility though, for example if we ever want to support TIME(ns) in the future.
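
A small sketch of why distinct types help here, assuming hypothetical wrapper types `TimeMilli` and `TimeMicro` (illustrative only, not the Velox definitions). Both wrap the same `int64_t`, but overload resolution picks the implementation from the argument type alone, so the unit never depends on which engine registered the function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tag types: same physical storage, distinct C++ types.
struct TimeMilli {
  int64_t value; // Milliseconds since midnight.
};

struct TimeMicro {
  int64_t value; // Microseconds since midnight.
};

// Each overload states its own unit, independent of the calling engine.
inline int64_t hourOf(TimeMilli t) {
  assert(t.value >= 0);
  return t.value / (60LL * 60 * 1000);
}

inline int64_t hourOf(TimeMicro t) {
  assert(t.value >= 0);
  return t.value / (60LL * 60 * 1000 * 1000);
}
```

With a single shared type, the same raw value 7200000 would silently mean two hours or 7.2 seconds depending on the caller's assumption.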

Collaborator Author

@rui-mo rui-mo Feb 21, 2026

@Yuhta @PingLiuPing Thanks for your inputs. Just to confirm, would you prefer that I implement TimeMicroPrecision as a custom type in sparksql, or as a shared type in Type.h?
A custom type could follow the same approach I recently used for the Spark TimestampNTZ type (#16344), which is registered only for Spark.

Contributor

Yes, a separate type class is the safest way to go.

@Yuhta Yuhta requested review from kgpai and mbasmanova February 20, 2026 19:52
@rui-mo rui-mo changed the title feat(spark): Add microsecond precision support for TimeType feat: Add TimeMicroPrecisionType Feb 24, 2026
Collaborator

@PingLiuPing PingLiuPing left a comment

Do you think it makes sense to add a non template base class such as

class TimeTypeBase : public BigintType {
...
};

template <TimePrecision PRECISION>
class TimeType : public TimeTypeBase {
...
};

@rui-mo
Collaborator Author

rui-mo commented Feb 26, 2026

@PingLiuPing Thanks for your review. The structure of the current TimeType is shown below. From this, we can see that only getMin is not precision-aware, so introducing a TimeTypeBase without a template parameter offers little benefit. Additionally, from a design perspective, it seems more reasonable for TimeType<Precision> to serve as the common base for different time precisions. I’m glad to hear your insights on this. cc: @Yuhta

template <TimePrecision PRECISION>
class TimeType : public BigintType {
  // Precision-unaware function: getMin
  // Precision-aware simple functions: getMax, toString, getTimeToVarcharRowSize

  // Pure virtual methods: valueToString, valueToTime
};

class TimeMilliPrecisionType final
    : public TimeType<TimePrecision::kMilliseconds> {
};

class TimeMicroPrecisionType final
    : public TimeType<TimePrecision::kMicroseconds> {
};

@PingLiuPing
Collaborator

Yes, makes sense.

Collaborator

@PingLiuPing PingLiuPing left a comment

Thanks.

Contributor

@mbasmanova mbasmanova left a comment

Both Presto and Spark represent the TIME type using a bigint. The key
difference is that Spark’s TimeType uses microsecond precision and is not
timezone-aware.

The names of the types are TIME and TIME_MICRO. These clearly communicate the "Spark’s TimeType uses microsecond precision" difference. However, "is not timezone-aware" is not included. What does it mean for the TIME type to be timezone-aware? Is this a property of the type or of the functions / operators that use it?

@rui-mo rui-mo force-pushed the wip_time branch 3 times, most recently from b12f6c5 to db78bd6 Compare February 27, 2026 14:15
@rui-mo
Collaborator Author

rui-mo commented Feb 27, 2026

The names of the types are: TIME and TIME_MICRO. These clearly communicate "Spark’s TimeType uses microsecond precision" difference. However, "is not timezone-aware" is not included. What does it mean for the TIME type to be timezone-aware? Is this a property of the type or functions / operators that use it?

@mbasmanova For example, the TimeType API below converts local time to UTC, which makes it unsuitable for Spark. It is the responsibility of Spark's cast and functions to handle this properly and to ensure that no time zone conversion is applied to the bigint value.

velox/velox/type/Type.cpp

Lines 1552 to 1555 in f760d54

int64_t TimeType::valueToTime(
const StringView& timeStr,
const tz::TimeZone* timeZone,
int64_t sessionStartTimeMs) const {

@mbasmanova
Contributor

@rui-mo Thank you for clarifying.

The key difference is that Spark’s TimeType uses microsecond precision and is not timezone-aware.

Should this sentence from PR description be updated?

@mbasmanova mbasmanova changed the title feat: Add TimeMicroPrecisionType feat: Add TIME_MICRO type Feb 27, 2026
@mbasmanova
Contributor

/claude-review

@github-actions

✅ Claude Code Review

Requested by @mbasmanova


Positive Observations

  1. Well-structured Template Design: The template-based approach with TimePrecision enum is clean and extensible, making the code maintainable while avoiding code duplication.

  2. Consistent Singleton Pattern: The implementation follows Velox's existing patterns for type singletons, ensuring memory efficiency and proper type equality checking.

  3. Comprehensive Integration: The changes touch all necessary parts of the system - type system, expression evaluation, function registration, serialization, and documentation.

  4. Backward Compatibility: The changes maintain backward compatibility by keeping the existing TIME() function unchanged while adding TIME_MICRO().

  5. Proper Fuzzer Integration: The fuzzer utilities are correctly updated to use the new getMin()/getMax() methods instead of hardcoded constants.


Recommendation

DO NOT MERGE until the critical NYI methods in TimeMicroPrecisionType are properly implemented. The incomplete implementation makes the TIME_MICRO type unusable for core operations like string conversion and parsing, which would cause runtime crashes.

Once the critical issues are addressed, this is a solid architectural change that properly extends Velox's type system to support both Presto and Spark time semantics.


ℹ️ About this review

This review was generated by Claude Code.
It analyzed the PR diff and codebase to provide feedback.

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

To request another review, comment /claude-review on this PR.

@rui-mo
Collaborator Author

rui-mo commented Feb 27, 2026

@mbasmanova I updated the PR description. Please have another look, thanks!

Comment on lines +135 to +137
TIME type represents time in milliseconds from midnight UTC. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
TIME_MICRO type represents time in microseconds from midnight. Thus min/max value can range from 00:00:00.000000 to 23:59:59.999999 modulo 24 hours.
TIME and TIME_MICRO types are backed by BIGINT physical type.
Contributor

@rui-mo This is still confusing to me. If the only difference between TIME and TIME_MICRO is precision, then the value range should not be different? Am I missing something?

Collaborator Author

24 hours is kMillisInDay milliseconds, which is kMillisInDay * 1000 microseconds. So the value ranges are not the same.

  /// Maximum valid time value based on precision. For milliseconds:
  /// 23:59:59.999 (86,399,999 ms). For microseconds: 23:59:59.999999
  /// (86,399,999,999 μs).
  int64_t getMax() const {
    return PRECISION == TimePrecision::kMilliseconds ? kMillisInDay - 1
                                                     : kMillisInDay * 1000 - 1;
  }
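
The arithmetic can be checked at compile time. The sketch below uses hypothetical free functions (`maxTimeValue`, `unitsPerSecond`) rather than the `TimeType` members from the PR. It shows that the raw maxima differ by precision while both correspond to the same final second of the day, 23:59:59:

```cpp
#include <cstdint>

enum class TimePrecision { kMilliseconds, kMicroseconds };

constexpr int64_t kMillisInDay = 24LL * 60 * 60 * 1000; // 86,400,000

// Largest valid raw value for each precision.
template <TimePrecision kPrecision>
constexpr int64_t maxTimeValue() {
  return kPrecision == TimePrecision::kMilliseconds ? kMillisInDay - 1
                                                    : kMillisInDay * 1000 - 1;
}

// Number of stored units per wall-clock second.
template <TimePrecision kPrecision>
constexpr int64_t unitsPerSecond() {
  return kPrecision == TimePrecision::kMilliseconds ? 1000 : 1000000;
}

static_assert(maxTimeValue<TimePrecision::kMilliseconds>() == 86399999, "");
static_assert(
    maxTimeValue<TimePrecision::kMicroseconds>() == 86399999999LL,
    "");

// Both maxima fall within second 86399 of the day, i.e. 23:59:59.
static_assert(
    maxTimeValue<TimePrecision::kMilliseconds>() /
            unitsPerSecond<TimePrecision::kMilliseconds>() ==
        maxTimeValue<TimePrecision::kMicroseconds>() /
            unitsPerSecond<TimePrecision::kMicroseconds>(),
    "");
```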

Contributor

UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999
00:00:00.000000 to 23:59:59.999999

The doc highlights the difference using UTC vs. no UTC... is this right? I'm confused by that.

Collaborator Author

Presto's TimeType represents a time in the local session timezone, and the TimeType in Velox provides a timezone-adjustment API, so I assume that's why the doc highlights that it supports timezones ranging from UTC-14 to UTC+14.

https://github.com/prestodb/presto/blob/master/presto-common/src/main/java/com/facebook/presto/common/type/TimeType.java#L25-L26

I did not include a time zone–related description for TimeMicro because it currently does not support time zone adjustment. However, this difference is not due to the time type implementation itself, since neither type stores time zone information. The distinction lies in how the underlying bigint value is interpreted and whether time zone adjustment APIs are provided. Therefore, TimeMicro is not inherently time zone–unaware — that is simply Spark’s semantic choice. If TimeMicro were used by another operator that requires time zone handling, it could support adjustment through an API, while Spark’s casting and functions may choose not to use it.
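
To make the distinction concrete, here is a minimal sketch of the two interpretations. It assumes a fixed zone offset, which is a simplification (real zone offsets depend on the date), and both helper names are hypothetical rather than Velox APIs:

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kMillisInDay = 86400000;

// Converts a local time of day to UTC by subtracting the zone offset and
// wrapping the result into [0, kMillisInDay).
inline int64_t localToUtcMillis(int64_t localMillis, int64_t offsetMillis) {
  assert(localMillis >= 0 && localMillis < kMillisInDay);
  const int64_t utc = (localMillis - offsetMillis) % kMillisInDay;
  return utc < 0 ? utc + kMillisInDay : utc;
}

// A timezone-unaware interpretation uses the stored value unchanged.
inline int64_t keepAsIs(int64_t micros) {
  return micros;
}
```

For example, 01:00 local time at UTC+2 maps to 23:00 UTC under the first interpretation, while the second never touches the value. In both cases the stored bigint carries no time zone information.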

Does this make sense? Thanks.

Contributor

I replied above, but Velox only supports legacy_timestamp = false semantics for TIME and takes the timezone into account, since time is not a point.
Looking at our behavior for TIME, there are a couple of places where we are inconsistent (but still consistent with Presto), namely when we convert TIME to TIMESTAMP and do not take the session time and timezone into account (we should, even if Presto doesn't, since that was called out here: prestodb/presto#25957).

Also, @rui-mo, does Spark support TIME WITH TIMEZONE?

Collaborator Author

@kgpai Spark only supports TIME [WITHOUT TIMEZONE], which was introduced in one of the most recent major releases.

@rui-mo
Collaborator Author

rui-mo commented Feb 27, 2026

DO NOT MERGE until the critical NYI methods in TimeMicroPrecisionType are properly implemented. The incomplete implementation makes the TIME_MICRO type unusable for core operations like string conversion and parsing, which would cause runtime crashes.

There is an ongoing PR to add support for CAST(string AS time) in Spark. We need to rebase that PR on top of this one and implement valueToTime there. The valueToString method will be introduced later as part of the support for CAST(time AS string).

@rui-mo rui-mo force-pushed the wip_time branch 2 times, most recently from 2f8e09f to 238859a Compare March 5, 2026 10:52
@rui-mo
Collaborator Author

rui-mo commented Mar 5, 2026

Hi @mbasmanova, I’ve addressed all the comments above. Could you please take another look when you have a chance? Please let me know if any further changes are needed.

@mbasmanova
Contributor

/claude-review

@@ -120,7 +120,7 @@ int32_t randDate(FuzzerGenerator& rng) {
}

int32_t randTime(FuzzerGenerator& rng) {
Contributor

Don't we need randTimeMicro?

Contributor

This is used whenever Type::isTime() returns true, so used for both

Collaborator Author

@rui-mo rui-mo Mar 6, 2026

Following the suggestion, I added a type check to ensure it is only called by the TIME type (see #16662). The work to extend the fuzzer tests for TimeMicroUtc is also tracked in #16660.
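
For context on why a separate generator matters: the largest microsecond value, 86,399,999,999, does not fit in the `int32_t` that `randTime` returns. A precision-aware generator could look roughly like the sketch below. `randTimeValue` and `maxTimeValue` are hypothetical names, and `std::mt19937_64` stands in for the fuzzer's generator type:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

enum class TimePrecision { kMilliseconds, kMicroseconds };

constexpr int64_t kMillisInDay = 86400000;

template <TimePrecision kPrecision>
constexpr int64_t maxTimeValue() {
  return kPrecision == TimePrecision::kMilliseconds ? kMillisInDay - 1
                                                    : kMillisInDay * 1000 - 1;
}

// Draws a uniformly distributed time value in [0, max] for the precision.
// The 64-bit return type is required for microsecond values.
template <TimePrecision kPrecision>
int64_t randTimeValue(std::mt19937_64& rng) {
  std::uniform_int_distribution<int64_t> dist(0, maxTimeValue<kPrecision>());
  const int64_t value = dist(rng);
  assert(value >= 0 && value <= maxTimeValue<kPrecision>());
  return value;
}
```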

@github-actions

github-actions bot commented Mar 5, 2026

✅ Claude Code Review

Requested by @mbasmanova


⚠️ Claude completed but produced no output.



Comment on lines +135 to +136
TIME type represents time in milliseconds from midnight UTC. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
TIME_MICRO type represents time in microseconds from midnight. Thus min/max value can range from 00:00:00.000000 to 23:59:59.999999 modulo 24 hours.
Contributor

This is still confusing.

The time-zone treatment is not specific to the type itself. It is how that type is used. Hence, both types represent the same range (just with different precision). Let's fix.

Contributor

Does the default CAST operator somehow take the time zone into account? What is the behavior of CAST for the new type?

CC: @kagamiori

Contributor

@Yuhta Yuhta Mar 5, 2026

We do consider time zone during cast by taking the date from a query config:
https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L233
https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L353

@rui-mo This is another aspect that we need to model in the new type. Can you add another template parameter, bool kLocalTime, to the new TimeType and implement the following combinations in the new implementation:

  • TIME(ms, local) - Current Presto TIME
  • TIME(ms, utc)
  • TIME(us, local)
  • TIME(us, utc) - Spark TIME_MICRO

Collaborator Author

I added the four time types mentioned above and updated the documentation. Currently, Presto time casting depends on the timezone, while the Spark implementation needs to support casting and functions based on the TIME_MICRO_UTC type, ensuring that microsecond precision is used and that time values are not affected by the timezone.

});

auto result = evaluate("array_time_micro(c0)", data);
auto expected = makeArrayVector<int64_t>({{}, {}}, TIME_MICRO());
Contributor

What does this test verify? Why are the expected results empty arrays?

Collaborator Author

This test verifies that function registration with TimeMicroUtc works correctly and ensures that its specific implementation is used when the function is called with TimeMicroUtc. The result was previously empty because nothing was appended in the implementation. To avoid confusion, it has been updated to return a different, non-empty result.

ASSERT_TRUE(timeType->isComparable());

testTypeSerde(timeType);
const auto timeMilli = TIME();
Contributor

Let's split this test into 2.

Comment on lines +1254 to +1255
ASSERT_EQ(
TimePrecisionName::toName(TimePrecision::kMicroseconds), "MICROSECONDS");
Contributor

Are we testing DEFINE_ENUM_NAME macros here? Not needed.

Collaborator Author

Removed, thanks.

Comment on lines +95 to +105
struct CppToType<Time> : public CppToTypeBase<TypeKind::BIGINT> {
static auto create() {
return TIME();
}
};

template <>
struct CppToType<TimeMicro> : public CppToTypeBase<TypeKind::BIGINT> {
static auto create() {
return TIME_MICRO();
}
Contributor

What is this for?

Collaborator Author

CppToType is typically used for vector generation in tests. I assume it is necessary to ensure that the generated vectors can distinguish between different time types, rather than always being treated as BIGINT.

// The 'deserialize' of TimeMilliPrecisionType handles both millisecond and
// microsecond precision based on the 'precision' field in the serialized
// object. So we only need to register it once.
registry.Register("TimeType", TimeMilliPrecisionType::deserialize);
Contributor

TimeMilliPrecisionType - this name is specific, but it handles generic types? Something is off.

Collaborator Author

I added TimeTypeFactory::deserialize to replace this, thanks!

/// When casting from TIME to varchar, the resultant varchar size depends on
/// precision: 12 bytes for milliseconds (HH:MM:SS.mmm), 15 bytes for
/// microseconds (HH:MM:SS.mmmmmm).
int32_t timeToVarcharRowSize() const {
Contributor

This can be a constant, no? Each template has its own value.

Contributor

We can add constexpr on the return value. Same for getMin() and getMax().

Collaborator Author

Updated to mark the return value as constexpr. Thanks.
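
As a rough illustration of the per-instantiation constant discussed here (a sketch with a hypothetical `TimeFormatSketch` struct, not the code in the PR), the row size can be derived at compile time from the template parameter and checked against the format strings:

```cpp
#include <cstdint>

enum class TimePrecision { kMilliseconds, kMicroseconds };

template <TimePrecision kPrecision>
struct TimeFormatSketch {
  // "HH:MM:SS.mmm" has 12 characters; "HH:MM:SS.mmmmmm" has 15.
  static constexpr int32_t kTimeToVarcharRowSize =
      kPrecision == TimePrecision::kMilliseconds ? 12 : 15;
};

// sizeof on a string literal counts the terminating NUL, hence the "- 1".
static_assert(
    TimeFormatSketch<TimePrecision::kMilliseconds>::kTimeToVarcharRowSize ==
        sizeof("HH:MM:SS.mmm") - 1,
    "");
static_assert(
    TimeFormatSketch<TimePrecision::kMicroseconds>::kTimeToVarcharRowSize ==
        sizeof("HH:MM:SS.mmmmmm") - 1,
    "");
```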

/// - Presto: milliseconds since midnight (precision = kMilliseconds).
/// - Spark: microseconds since midnight (precision = kMicroseconds). Not
/// timezone aware.
template <TimePrecision PRECISION>
Contributor

nit: kPrecision

VELOX_CONSTEXPR_SINGLETON TimeType kInstance;
return {std::shared_ptr<const TimeType>{}, &kInstance};
static_assert(
PRECISION == TimePrecision::kMilliseconds ||
Contributor

With enum this should be guaranteed, we can remove this

// static
template <TimePrecision PRECISION>
TypePtr TimeType<PRECISION>::deserialize(const folly::dynamic& obj) {
auto precision = static_cast<TimePrecision>(obj["precision"].asInt());
Contributor

Make precision optional in serialization to be backward compatible

Contributor

+1.

Collaborator Author

Updated serialize and deserialize to make precision and isLocalTime optional. The backward compatibility of TIME remains unaffected. Thanks for this suggestion.
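
The backward-compatible behavior can be sketched as follows. This is an assumption-laden illustration: a `std::map` stands in for the `folly::dynamic` object used in the PR, the integer encoding of the enum is invented for the example, and `deserializeTimeParams` is a hypothetical helper. Missing fields fall back to the legacy TIME defaults (milliseconds, local):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// The integer values are an assumption made for this sketch.
enum class TimePrecision : int64_t { kMilliseconds = 0, kMicroseconds = 1 };

struct TimeTypeParams {
  TimePrecision precision;
  bool isLocalTime;
};

// Stand-in for a serialized object.
using SerializedObject = std::map<std::string, int64_t>;

// Payloads written before the new fields existed still deserialize, because
// each absent field keeps its legacy default.
inline TimeTypeParams deserializeTimeParams(const SerializedObject& obj) {
  TimeTypeParams params{TimePrecision::kMilliseconds, true};
  auto it = obj.find("precision");
  if (it != obj.end()) {
    assert(it->second == 0 || it->second == 1);
    params.precision = static_cast<TimePrecision>(it->second);
  }
  it = obj.find("isLocalTime");
  if (it != obj.end()) {
    params.isLocalTime = it->second != 0;
  }
  return params;
}
```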

FOLLY_ALWAYS_INLINE bool Type::isTime() const {
// Pointer comparison works since this type is a singleton.
- return (this == TIME().get());
+ return (this == TIME().get()) || (this == TIME_MICRO().get());
Contributor

As a warm-up for the refactoring change, let's find all the call sites of this function and add VELOX_DCHECK_EQ(*type, *TIME()) to make sure we will not miss them once the new type flows through the system.

Collaborator Author

This suggestion is very helpful. I opened PR #16662 to prepare for the time type extension and tracked the remaining tasks in #16660. Thanks.

@mbasmanova
Contributor

@rui-mo Rui, do you use Claude or some other model for coding? If not, consider giving it a try. It will make it easier and faster to iterate.

@kgpai
Contributor

kgpai commented Mar 5, 2026

/claude-review

@github-actions

github-actions bot commented Mar 5, 2026

✅ Claude Code Review

Requested by @kgpai


Summary

This pull request introduces a new TIME_MICRO type to support microsecond-precision time values alongside the existing millisecond-precision TIME type. The implementation uses templates to create separate concrete types (TimeMilliPrecisionType and TimeMicroPrecisionType) while maintaining backwards compatibility.

Issues Found

🔴 Critical: Template instantiation logic error in Type.cpp:1513-1521

File: velox/type/Type.cpp:1513-1521

The diff shows that serialization method calls on TimeMilliPrecisionType::serialize() but tries to move the toCompactIso8601 static method (which works with microseconds) to TimeMicroPrecisionType. However, in the original code at line 1607, TimeType::toCompactIso8601 is implemented but there's a mismatch:

  1. TimeMilliPrecisionType should handle millisecond values but toCompactIso8601 expects microseconds
  2. The method is moved to TimeMicroPrecisionType but this creates an inconsistency where millisecond time types can't format to ISO-8601

Suggested fix:

  • Keep toCompactIso8601 in TimeMilliPrecisionType but modify it to work with millisecond values, OR
  • Create separate formatting methods for each precision type with appropriate input scaling

🔴 Critical: Missing template explicit instantiation

File: velox/type/Type.cpp:1537-1538

The code shows explicit template instantiations:

template class TimeType<TimePrecision::kMilliseconds>;
template class TimeType<TimePrecision::kMicroseconds>;

But the template class definition uses PRECISION as a template parameter while the concrete classes inherit from the template. This creates potential linking issues since the base template methods need to be instantiated.

Suggested fix: Verify all template methods are properly instantiated and accessible from the concrete derived classes.

🟡 Suggestion: Inconsistent value validation in SimpleFunctionTest.cpp:1681-1688

File: velox/expression/tests/SimpleFunctionTest.cpp:1681-1688

The test logic for TimeMicroPrecisionType shows out = input / 1000 + 1; but this seems arbitrary and potentially incorrect:

  1. The division by 1000 suggests converting microseconds to milliseconds, but the logic adds 1 millisecond instead of 1 microsecond
  2. This makes the function behavior inconsistent between Time and TimeMicro versions
  3. The test uses input values {1000, 2000, 3000} with expected outputs {2, 3, 4} - this suggests the function is meant to add 1 millisecond to time values converted from microseconds

Suggested fix: Clarify the intended behavior - either make both functions add the same unit (1 microsecond for both) or document why they behave differently.

🟡 Suggestion: Potential integer overflow in max value calculation

File: velox/type/Type.h:1583-1586

The getMax() method calculates:

return PRECISION == TimePrecision::kMilliseconds ? kMillisInDay - 1
                                                 : kMillisInDay * 1000 - 1;

For microsecond precision: kMillisInDay * 1000 = 86400000 * 1000 = 86,400,000,000

While this fits in int64_t, the calculation could potentially cause issues if kMillisInDay were ever modified to a larger value.

Suggested fix: Use explicit casting or constants to make the intent clear: static_cast<int64_t>(kMillisInDay) * 1000 - 1

🟡 Suggestion: Missing boundary validation in valueToString methods

File: Type.cpp (changed) and existing TimeType methods

The diff removes hardcoded constants like kMin and kMax from the original TimeType and replaces them with method calls getMin() and getMax(). However, in the valueToString method for TimeMilliPrecisionType, there's still a hardcoded check:

VELOX_USER_CHECK(
    !(value < 0 || value >= 86400000),  // Hardcoded instead of using getMax()
    "TIME value {} is out of range [0, 86400000)",
    value);

Suggested fix: Use getMax() consistently: value >= getMax() + 1

🟡 Suggestion: Missing TimeMicroPrecisionType valueToString implementation

File: velox/type/Type.h and velox/type/Type.cpp

The TimeMicroPrecisionType class doesn't provide its own valueToString method, but microsecond values need different formatting (15 characters vs 12 for millisecond values, and different precision). The timeToVarcharRowSize() method correctly returns 15 for microseconds, but no corresponding string formatting is implemented.

Suggested fix: Implement valueToString for TimeMicroPrecisionType that properly formats microsecond values to HH:MM:SS.mmmmmm format.
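One possible shape for such a formatter, as a sketch only: `timeMicrosToString` is a hypothetical free function, it throws `std::out_of_range` where Velox would use its own check macros, and it assumes the value is microseconds since midnight in the range [0, 86400000000).

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

constexpr int64_t kMicrosInSecond = 1'000'000;
constexpr int64_t kMicrosInDay = 86'400 * kMicrosInSecond;

// Formats microseconds since midnight as HH:MM:SS.mmmmmm (15 characters).
// Hypothetical helper, not the Velox implementation.
std::string timeMicrosToString(int64_t micros) {
  if (micros < 0 || micros >= kMicrosInDay) {
    throw std::out_of_range("TIME value is out of range [0, 86400000000)");
  }
  const int64_t totalSeconds = micros / kMicrosInSecond;
  const int hours = static_cast<int>(totalSeconds / 3600);
  const int minutes = static_cast<int>((totalSeconds / 60) % 60);
  const int seconds = static_cast<int>(totalSeconds % 60);
  const int fraction = static_cast<int>(micros % kMicrosInSecond);

  char buffer[16]; // 15 characters plus the terminating NUL.
  std::snprintf(
      buffer,
      sizeof(buffer),
      "%02d:%02d:%02d.%06d",
      hours,
      minutes,
      seconds,
      fraction);
  return std::string(buffer);
}
```

The fixed width of the result matches the 15 bytes that timeToVarcharRowSize() reports for microsecond precision.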

🟢 Nitpick: Inconsistent naming in test functions

File: velox/expression/tests/SimpleFunctionTest.cpp:1682 and 1717

Function names time_plus_one and time_micro_plus_one don't follow consistent Velox naming patterns. Other functions in the codebase tend to use more descriptive names.

Positive Observations

  1. Good template design: The template-based approach with TimePrecision enum allows code reuse while maintaining type safety
  2. Backwards compatibility: Existing TIME() function continues to work unchanged
  3. Comprehensive test coverage: Tests include both simple functions and complex types (arrays)
  4. Proper serialization/deserialization: The unified deserialization approach that handles both precisions based on the precision field is elegant
  5. Documentation updates: The types.rst documentation is updated appropriately to explain both TIME and TIME_MICRO

The core design is sound and addresses the requirements well. The main issues are implementation details that should be addressed before merging to ensure correctness and consistency.


ℹ️ About this response

This review was generated by Claude Code. It analyzed the PR diff and codebase to provide feedback.

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Available commands:

  • /claude-review - Request a code review
  • /claude-query <question> - Ask a question about the PR or codebase


TIME type represents time in milliseconds from midnight UTC. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
TIME type is backed by BIGINT physical type.
TIME type represents time in milliseconds from midnight UTC. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
Contributor

I will edit this, but TIME also sticks to the range 0 to 24 hours - 1. Presto behavior is not fully consistent, because TIME behavior in Presto changes depending on whether legacy_timestamp=true. legacy_timestamp=true is not supported in Velox since it leads to problems during DST. TIME in Velox will always be 0 to 24 hours - 1. TIME with TIMEZONE in Velox also will not support IANA names. There's a discussion here: prestodb/presto#25957 (comment). Unfortunately, this documentation was not updated afterwards.

Contributor

Is it safe to drop the adjust_timestamp_to_session_timezone query config, or assert it is always false in Velox? Then we can drop the time zone aspect of this type.

// static
template <TimePrecision PRECISION>
TypePtr TimeType<PRECISION>::deserialize(const folly::dynamic& obj) {
  auto precision = static_cast<TimePrecision>(obj["precision"].asInt());
Contributor

+1.

template <TimePrecision PRECISION>
TypePtr TimeType<PRECISION>::deserialize(const folly::dynamic& obj) {
  auto precision = static_cast<TimePrecision>(obj["precision"].asInt());
  if (precision == TimePrecision::kMilliseconds) {
Contributor

nit: As pointed out above, return TIME by default unless precision is set and its value is microseconds.

Collaborator Author

Updated, thanks.
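The default suggested in this thread could look roughly like the sketch below. It avoids folly::dynamic: an `std::optional<int>` stands in for the possibly absent "precision" field, the enum values are assumptions, and `resolveTimeTypeName` is a hypothetical helper that returns a type name rather than a TypePtr.

```cpp
#include <optional>
#include <string>

// Enum values are assumptions made for this sketch.
enum class TimePrecision { kMilliseconds = 0, kMicroseconds = 1 };

// Picks the type to deserialize to. Anything other than an explicit
// microsecond precision, including a missing field, falls back to TIME.
std::string resolveTimeTypeName(std::optional<int> serializedPrecision) {
  if (serializedPrecision.has_value() &&
      static_cast<TimePrecision>(*serializedPrecision) ==
          TimePrecision::kMicroseconds) {
    return "TIME_MICRO_UTC";
  }
  return "TIME";
}
```

Defaulting to TIME keeps plans that were serialized before the precision field existed readable without changes.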

VELOX_DECLARE_ENUM_NAME(TimePrecision);

/// Base template class for TIME types with configurable precision.
/// - Presto: milliseconds since midnight (precision = kMilliseconds).
Contributor

FYI on Presto behavior.

| Cast | Respects Timezone? | Location |
|---|---|---|
| CAST(TIME AS VARCHAR) | Yes | CastExpr.cpp:217-280 |
| CAST(VARCHAR AS TIME) | Yes | CastExpr.cpp:339-405 |
| CAST(TIMESTAMP AS TIME) | No | CastExpr.cpp:379-400 |
| CAST(TIME AS TIMESTAMP) | No (This is a bug, issue: #16649) | CastExpr.cpp:307-331 |
| CAST(TIME AS BIGINT) | No | Direct int64 copy |
| CAST(TIME AS TIMESTAMP WITH TZ) | Always uses TZ | TimestampWithTimeZoneRegistration.cpp:288-331 |
| CAST(TIME AS TIME WITH TZ) | Always uses TZ | TimeWithTimezoneRegistration.cpp:196-230 |

Contributor

cc: @Yuhta

Contributor

The first 2 are only true when adjust_timestamp_to_session_timezone=true, right? And you said it's always set to false in Velox today?

Contributor

@kgpai Mar 5, 2026

Slight correction: adjust_timestamp_to_session_timezone behavior is respected in the first two cases but ignored in the others. I think we should ignore it in the first two cases as well and always use TZ.
Currently in our clusters adjust_timestamp_to_session_timezone is always set to true.

Contributor

OK, then that means the TZ aspect should be modeled in the new type, as Spark is not TZ aware.

Collaborator Author

Based on previous suggestions, I'm adding the four types: TIME, TIME_MILLI_UTC, TIME_MICRO and TIME_MICRO_UTC. Spark will use TIME_MICRO_UTC and Presto will keep using TIME. Does this make sense? Thanks.

Contributor

Thank you @rui-mo. I am not aware of any place that will require TIME_MILLI_UTC or TIME_MICRO (feel free to correct me if that is the case), so having just TIME and TIME_MICRO_UTC would suffice, right?
@Yuhta do you think we need all 4 types?

Collaborator Author

No, we don’t use the other two types. I assume the suggestion was meant to make things clearer. Hope @Yuhta could help clarify.

Contributor

Yes, we only need all 4 in the template parameters, with proper checks at use sites that the precision and TZ-awareness are correct. TIME_MILLI_UTC and TIME_MICRO are not used in any engine (yet), so we can ignore them during function dispatch.
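A rough sketch of what "all 4 in the template parameters, with checks at use sites" might look like. The parameter names `kPrecision` and `kLocalTime` come from the PR description; `TimeTypeSketch`, the two aliases, and `sparkHourOfDay` are hypothetical and only illustrate a compile-time check at a use site.

```cpp
#include <cstdint>

enum class TimePrecision { kMilliseconds, kMicroseconds };

// Hypothetical stand-in for the templated TimeType; it only carries traits.
template <TimePrecision kPrecision, bool kLocalTime>
struct TimeTypeSketch {
  static constexpr TimePrecision precision = kPrecision;
  static constexpr bool localTime = kLocalTime;
};

// Only the two combinations that an engine uses get a named alias.
using PrestoTime = TimeTypeSketch<TimePrecision::kMilliseconds, true>;
using SparkTimeMicroUtc = TimeTypeSketch<TimePrecision::kMicroseconds, false>;

// Hypothetical use site: a Spark-side helper that rejects any other
// precision or time zone combination at compile time.
template <typename T>
constexpr int64_t sparkHourOfDay(int64_t value) {
  static_assert(
      T::precision == TimePrecision::kMicroseconds && !T::localTime,
      "Spark TIME functions expect microsecond, time-zone-unaware values");
  return value / 3'600'000'000LL; // Microseconds per hour.
}
```

Calling `sparkHourOfDay<PrestoTime>(...)` would fail to compile, which is the kind of use-site check described in the comment above; the two unused combinations never need to be named or instantiated.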

Contributor

On the build side, we have a goal of bringing our build/test time under 45 minutes. Type.h is a very fundamental header with a lot of fan-out, and every little reduction in build time helps. In this case, can't we keep it simple and skip the template parameterization, since only two types are really required (TIME and TIME_MICRO_UTC) and they can be modelled independently?

@rui-mo rui-mo changed the title feat: Add TIME_MICRO type feat: Add TIME_MICRO_UTC type Mar 6, 2026
@rui-mo
Collaborator Author

rui-mo commented Mar 6, 2026

Responses to Claude’s code review:

  1. toCompactIso8601 was moved intentionally because it handles int64_t as microseconds.
  2. Missing template explicit instantiation: these are not missing.
  3. Inconsistent value validation: intentional, to ensure calls with TimeMicro invoke its specific implementation.
  4. Fixed the potential integer overflow in the max value calculation.
  5. Updated to remove the hardcoded check.
  6. Missing TimeMicroPrecisionType valueToString implementation: will be added when implementing the specific Spark cast, keeping this PR concise.

@rui-mo
Collaborator Author

rui-mo commented Mar 6, 2026

Thank you for the insightful review! I’ve addressed all the comments. Please have another look when you get a chance.


TIME type represents time in milliseconds from midnight UTC. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
TIME type is backed by BIGINT physical type.
TIME type represents time in milliseconds of local timezone from midnight. Thus min/max value can range from UTC-14:00 at 00:00:00 to UTC+14:00 at 23:59:59.999 modulo 24 hours.
Contributor

As you are making this change here, can we change this to say that the min/max value can range from 0 to 23:59:59.999? For TIME we do not support the older deprecated Presto TIME semantics of the value ranging from UTC-14:00 to UTC+14:00.

Contributor

@kgpai left a comment

@rui-mo Looks good to me. My one concern is that this additional templatization, especially since we don't use it, is something we will keep paying for in build time in the future. I would rather avoid it. I don't have any numbers on how expensive this is, but I am working on some changes to measure builds and optimize our targets, and I hope to use that to measure the impact of such changes.

cc: @Yuhta unless we really require it, IMO we shouldn't have the unused types and can probably get rid of the templatization.

/// When casting from TIME to varchar, the resultant varchar size depends on
/// precision: 12 bytes for milliseconds (HH:MM:SS.mmm), 15 bytes for
/// microseconds (HH:MM:SS.mmmmmm).
constexpr int32_t timeToVarcharRowSize() const {
Contributor

nit: size_t instead of int32_t

static TypePtr deserialize(const folly::dynamic& /*obj*/) {
return TimeType::get();
// TimeType with millisecond precision, timezone unaware. Not used yet.
class TimeMilliPrecisionUtcType final
Contributor

Do we need to create this type when it has no use? We will pay for it in build time at every place that Type.h is included (and it is included in many places).

};

// TimeType with microsecond precision in local timezone. Not used yet.
class TimeMicroPrecisionType final
Contributor

Same as above.

@Yuhta
Contributor

Yuhta commented Mar 6, 2026

@kgpai I would say we keep the template, as it makes things very clear. We can omit the unused subclasses. A template without instantiation should not affect build time in any significant way.

