Skip to content

Commit 0d6a03e

Browse files
authored
Merge pull request #231 from r-lib/fix/second-parsing
More correctly document `%NS` and invalid coarse parsing behavior
2 parents 220e707 + 01c9f3f commit 0d6a03e

16 files changed

+334
-137
lines changed

NEWS.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,33 @@
11
# clock (development version)
22

3+
* Parsing into a date-time type that is coarser than the original string is now
4+
considered ambiguous and undefined behavior. For example, parsing a string
5+
with fractional seconds using `date_time_parse(x)` or
6+
`naive_time_parse(x, precision = "second")` is no longer considered correct.
7+
Instead, if you only require second precision from such a string, parse the
8+
full string, with fractional seconds, into a clock type that can handle them,
9+
then round to seconds using whatever rounding convention is required for your
10+
use case, such as `time_point_floor()` (#230).
11+
12+
For example:
13+
14+
```
15+
x <- c("2019-01-01 00:00:59.123", "2019-01-01 00:00:59.556")
16+
17+
x <- naive_time_parse(x, precision = "millisecond")
18+
x
19+
#> <time_point<naive><millisecond>[2]>
20+
#> [1] "2019-01-01 00:00:59.123" "2019-01-01 00:00:59.556"
21+
22+
x <- time_point_round(x, "second")
23+
x
24+
#> <time_point<naive><second>[2]>
25+
#> [1] "2019-01-01 00:00:59" "2019-01-01 00:01:00"
26+
27+
as_date_time(x, "America/New_York")
28+
#> [1] "2019-01-01 00:00:59 EST" "2019-01-01 00:01:00 EST"
29+
```
30+
331
# clock 0.3.0
432

533
* New `date_seq()` for generating date and date-time sequences (#218).

R/date.R

Lines changed: 15 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -983,9 +983,11 @@ date_set_zone.Date <- function(x, zone) {
983983
#' _`date_parse()` ignores both the `%z` and `%Z` commands,_ as clock treats
984984
#' Date as a _naive_ type, with a yet-to-be-specified time zone.
985985
#'
986-
#' If parsing a string with sub-daily components, such as hours, minutes or
987-
#' seconds, note that the conversion to Date will round those components to
988-
#' the nearest day. See the examples for a way to control this.
986+
#' Parsing strings with sub-daily components, such as hours, minutes, or
987+
#' seconds, should be done with [date_time_parse()]. If you only need the date
988+
#' components, round the result to day precision, and then use [as_date()].
989+
#' Attempting to directly parse a sub-daily string into a Date is ambiguous and
990+
#' undefined, and is unlikely to work as you might expect.
989991
#'
990992
#' @inheritParams zoned-parsing
991993
#'
@@ -1012,27 +1014,21 @@ date_set_zone.Date <- function(x, zone) {
10121014
#' date_parse("2020-W01-2", format = "%G-W%V-%u")
10131015
#'
10141016
#' # ---------------------------------------------------------------------------
1015-
#' # Rounding of sub-daily components
1017+
#' # Sub-daily components
10161018
#'
1017-
#' # Note that rounding a string with time components will round them to the
1018-
#' # nearest day if you try and parse them
1019+
#' # If you have a string with sub-daily components, but only require the date,
1020+
#' # first parse them as date-times to fully parse the sub-daily components,
1021+
#' # then round using whatever convention is required for your use case before
1022+
#' # converting to date.
10191023
#' x <- c("2019-01-01 11", "2019-01-01 12")
10201024
#'
1021-
#' # Hour 12 rounds up to the next day
1022-
#' date_parse(x, format = "%Y-%m-%d %H")
1023-
#'
1024-
#' # If you don't like this, one option is to just not parse the time component
1025-
#' date_parse(x, format = "%Y-%m-%d")
1026-
#'
1027-
#' # A more general option is to parse the full string as a naive-time,
1028-
#' # then round manually
1029-
#' nt <- naive_time_parse(x, format = "%Y-%m-%d %H", precision = "hour")
1030-
#' nt
1025+
#' x <- date_time_parse(x, zone = "UTC", format = "%Y-%m-%d %H")
1026+
#' x
10311027
#'
1032-
#' nt <- time_point_floor(nt, "day")
1033-
#' nt
1028+
#' date_floor(x, "day")
1029+
#' date_round(x, "day")
10341030
#'
1035-
#' as.Date(nt)
1031+
#' as_date(date_round(x, "day"))
10361032
date_parse <- function(x, ..., format = NULL, locale = clock_locale()) {
10371033
x <- naive_time_parse(x, ..., format = format, precision = "day", locale = locale)
10381034
as.Date(x)

R/gregorian-year-month-day.R

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -171,6 +171,8 @@ vec_ptype_abbr.clock_year_month_day <- function(x, ...) {
171171
#' @details
172172
#' `year_month_day_parse()` completely ignores the `%z` and `%Z` commands.
173173
#'
174+
#' @inheritSection zoned-parsing Full Precision Parsing
175+
#'
174176
#' @inheritParams zoned-parsing
175177
#'
176178
#' @param x `[character]`

R/naive-time.R

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,8 @@ is_naive_time <- function(x) {
6161
#' If your date-time strings contain a UTC offset, but not a full time zone
6262
#' name, use [sys_time_parse()].
6363
#'
64+
#' @inheritSection zoned-parsing Full Precision Parsing
65+
#'
6466
#' @inheritParams sys_time_parse
6567
#'
6668
#' @return A naive-time.
@@ -80,6 +82,29 @@ is_naive_time <- function(x) {
8082
#' "2020-01-01 -4000 America/New_York",
8183
#' format = "%Y-%m-%d %z %Z"
8284
#' )
85+
#'
86+
#' # ---------------------------------------------------------------------------
87+
#' # Fractional seconds and POSIXct
88+
#'
89+
#' # If you have a string with fractional seconds and want to convert it to
90+
#' # a POSIXct, remember that clock treats POSIXct as a second precision type.
91+
#' # Ideally, you'd use a clock type that can support fractional seconds, but
92+
#' # if you really want to parse it into a POSIXct, the correct way to do so
93+
#' # is to parse the full fractional time point with the correct `precision`,
94+
#' # then round to seconds using whatever convention you require, and finally
95+
#' # convert that to POSIXct.
96+
#' x <- c("2020-01-01 00:00:00.123", "2020-01-01 00:00:00.555")
97+
#'
98+
#' # First, parse string with full precision
99+
#' x <- naive_time_parse(x, precision = "millisecond")
100+
#' x
101+
#'
102+
#' # Then round to second with a floor, ceiling, or round to nearest
103+
#' time_point_floor(x, "second")
104+
#' time_point_round(x, "second")
105+
#'
106+
#' # Finally, convert to POSIXct
107+
#' as_date_time(time_point_round(x, "second"), zone = "UTC")
83108
naive_time_parse <- function(x,
84109
...,
85110
format = NULL,

R/posixt.R

Lines changed: 21 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1164,9 +1164,16 @@ date_set_zone.POSIXt <- function(x, zone) {
11641164
#' `NA`s, or completely fails to parse, then no time zone will be able to be
11651165
#' determined. In that case, the result will use `"UTC"`.
11661166
#'
1167-
#' If manually parsing sub-second components, be aware that they will be
1168-
#' automatically rounded to the nearest second when converting them to POSIXct.
1169-
#' See the examples for a way to control this.
1167+
#' If you have strings with sub-second components, then these date-time parsers
1168+
#' are not appropriate for you. Remember that clock treats POSIXct as a second
1169+
#' precision type, so parsing a string with fractional seconds directly into a
1170+
#' POSIXct is ambiguous and undefined. Instead, fully parse the string,
1171+
#' including its fractional seconds, into a clock type that can handle it, such
1172+
#' as a naive-time with [naive_time_parse()], then round to seconds with
1173+
#' whatever rounding convention is appropriate for your use case, such as
1174+
#' [time_point_floor()], and finally convert that to POSIXct with
1175+
#' [as_date_time()]. This gives you complete control over how the fractional
1176+
#' seconds are handled when converting to POSIXct.
11701177
#'
11711178
#' @inheritParams zoned-parsing
11721179
#' @inheritParams as-zoned-time-naive-time
@@ -1223,28 +1230,21 @@ date_set_zone.POSIXt <- function(x, zone) {
12231230
#' date_time_parse_abbrev(abbrev_times, "America/New_York")
12241231
#'
12251232
#' # ---------------------------------------------------------------------------
1226-
#' # Rounding of sub-second components
1233+
#' # Sub-second components
12271234
#'
1228-
#' # Generally, if you have a string with sub-second components, they will
1229-
#' # be ignored when parsing into a date-time
1230-
#' x <- c("2019-01-01 00:00:01.1", "2019-01-01 00:00:01.7")
1235+
#' # If you have a string with sub-second components, but only require up to
1236+
#' # seconds, first parse them into a clock type that can handle sub-seconds to
1237+
#' # fully capture that information, then round using whatever convention is
1238+
#' # required for your use case before converting to a date-time.
1239+
#' x <- c("2019-01-01 00:00:01.1", "2019-01-01 00:00:01.78")
12311240
#'
1232-
#' date_time_parse(x, "America/New_York")
1233-
#'
1234-
#' # If you manually try and parse those sub-second components with `%4S` to
1235-
#' # read the 2 seconds, 1 decimal point, and 1 fractional component, the
1236-
#' # fractional component will be rounded to the nearest second
1237-
#' date_time_parse(x, "America/New_York", format = "%Y-%m-%d %H:%M:%4S")
1238-
#'
1239-
#' # If you don't like this, parse the full string as a naive-time,
1240-
#' # then round manually and convert to a POSIXct
1241-
#' nt <- naive_time_parse(x, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond")
1242-
#' nt
1241+
#' x <- naive_time_parse(x, precision = "millisecond")
1242+
#' x
12431243
#'
1244-
#' nt <- time_point_floor(nt, "second")
1245-
#' nt
1244+
#' time_point_floor(x, "second")
1245+
#' time_point_round(x, "second")
12461246
#'
1247-
#' as.POSIXct(nt, "America/New_York")
1247+
#' as_date_time(time_point_round(x, "second"), "America/New_York")
12481248
NULL
12491249

12501250
#' @rdname date-time-parse

R/sys-time.R

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,8 @@ is_sys_time <- function(x) {
6969
#' consider using [naive_time_parse()], since the resulting naive-time doesn't
7070
#' come with an assumption of a UTC time zone.
7171
#'
72+
#' @inheritSection zoned-parsing Full Precision Parsing
73+
#'
7274
#' @inheritParams zoned-parsing
7375
#'
7476
#' @param precision `[character(1)]`

R/zoned-time.R

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -300,6 +300,16 @@ zoned_time_format <- function(print_zone_name) {
300300
#' know the time zone that the date-times are supposed to be in, you can convert
301301
#' to a zoned-time with [as_zoned_time()].
302302
#'
303+
#' @section Full Precision Parsing:
304+
#'
305+
#' It is highly recommended to parse all of the information in the date-time
306+
#' string into a type at least as precise as the string. For example, if your
307+
#' string has fractional seconds, but you only require seconds, specify a
308+
#' sub-second `precision`, then round to seconds manually using whatever
309+
#' convention is appropriate for your use case. Parsing such a string directly
310+
#' into a second precision result is ambiguous and undefined, and is unlikely to
311+
#' work as you might expect.
312+
#'
303313
#' @inheritParams ellipsis::dots_empty
304314
#'
305315
#' @param x `[character]`
@@ -434,12 +444,17 @@ zoned_time_format <- function(print_zone_name) {
434444
#' characters to read. If not specified, the default is `2`. Leading zeroes
435445
#' are permitted but not required.
436446
#'
437-
#' - `%S`: The seconds as a decimal number. The modified command `%NS` where
438-
#' `N` is a positive decimal integer specifies the maximum number of
439-
#' characters to read. If not specified, the default is determined by the
440-
#' precision that you are parsing at. If encountered, the `locale`
441-
#' determines the decimal point character. Leading zeroes are permitted but
442-
#' not required.
447+
#' - `%S`: The seconds as a decimal number. Leading zeroes are permitted but
448+
#' not required. If encountered, the `locale` determines the decimal point
449+
#' character. Generally, the maximum number of characters to read is
450+
#' determined by the precision that you are parsing at. For example, a
451+
#' precision of `"second"` would read a maximum of 2 characters, while a
452+
#' precision of `"millisecond"` would read a maximum of 6 (2 for the values
453+
#' before the decimal point, 1 for the decimal point, and 3 for the values
454+
#' after it). The modified command `%NS`, where `N` is a positive decimal
455+
#' integer, can be used to exactly specify the maximum number of characters to
456+
#' read. This is only useful if you happen to have seconds with more than 1
457+
#' leading zero.
443458
#'
444459
#' - `%p`: The `locale`'s equivalent of the AM/PM designations associated with
445460
#' a 12-hour clock. The command `%I` must precede `%p` in the format string.

man/date-time-parse.Rd

Lines changed: 32 additions & 27 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)