ISO 8601
How things might possibly look if we used ISO 8601:2000 formatted date/times, durations, and recurrence syntax in cylc.
ISO 8601 is not just about writing dates and times in a universal format - it's also about writing durations and recurring intervals of time. These are commonly used in cylc.
We have quite a few issues that hang on expressing when exactly tasks run - so if we're looking at ISO 8601, we might as well explore if it can help us with these issues.
Scroll to the bottom to see some examples.
See #119 for the ISO 8601 cylc issue
See #119 comment for a good ISO 8601 summary by @m214089.
See ISO 8601:2004 for the ISO 8601 2004 version.
For example:
[[dependencies]]
[[[6,12]]]
The examples use an initial cycle time of `20130325` and a final cycle time of `20130405T12`.
Purpose | Current | Proposed | Expanded | Notes | In Context |
---|---|---|---|---|---|
Repeat every day starting at 06:00 after initial cycle time | `6` | `T06` | `R/T06/P1D` | implicit `P1D` | `R/20130325T06/P1D` |
As above, also at 12:00 | `6, 12` | `T06, T12` | `R/T06/P1D`, `R/T12/P1D` | as above | `R/20130325T06/P1D`, `R/20130325T12/P1D` |
Repeat every month on the first of the month after the initial cycle time | `Monthly(...)` | `01T` | `R/01T/P1M` | implicit `P1M` | `R/20130401/P1M` |
Repeat every month on the 14th of the month after the initial cycle time | `Monthly(...)` | `14T` | `R/14T/P1M` | as above | `R/20130414/P1M` |
Repeat every year on April 2nd after the initial cycle time | `Yearly(...)` | `0402T` | `R/0402T/P1Y` | implicit `P1Y` | `R/20130402/P1Y` |
Repeat every 2 days, starting at 06:00 after initial cycle time | `Daily(...)` | `T06/P2D` | `R/T06/P2D` | - | `R/20130325T06/P2D` |
Repeat hourly, starting at 18:30 after initial cycle time | N/A | `T1830/P1H` | `R/T1830/P1H` | - | `R/20130325T1830/P1H` |
Repeat hourly, starting at the initial cycle time | `0,1,2,3,4....` | `P1H` | `R//P1H` | implicit initial cycle time | `R/20130325/P1H` |
Repeat every 5 minutes, starting at the initial cycle time | N/A | `PT5M` | `R//PT5M` | as above | `R/20130325/PT5M` |
Repeat weekly, starting 1 day after initial cycle time | N/A | `+P1D/P1W` | `R/+P1D/P1W` | `+/-P??` is an offset | `R/20130326/P1W` |
Repeat every day, three times, starting at 12:00 after the initial cycle time | N/A | `R3/T12/P1D` | `R3/T12/P1D` | - | `R3/20130325T12/P1D` |
Repeat once, at the 1st of the month after the initial cycle time | N/A | `R1/01T` | `R1/01T/P??` | no duration necessary | `R1/20130401/P??` |
Repeat once, at this (initial cycle time) date | initial ? | `R1` | `R1/??/P??` | implied initial cycle time | `R1/20130325/P??` |
[cylc]
UTC mode = True
[scheduling]
initial cycle time = 2013032500
final cycle time = 2013040412
[[special tasks]]
cold-start = a_cold, d_cold
[[dependencies]]
[[[ 6 ]]]
graph = "a[T-24] | a_cold => a"
[[[ 0, 12 ]]]
graph = "b => c"
[[[ 0, 6, 12, 18 ]]]
graph = "d[T-6] | d_cold => d"
[cylc]
UTC mode = True
[scheduling]
initial cycle time = 20130325
final cycle time = 20130404T12
[[special tasks]]
cold-start = a_cold, d_cold
[[dependencies]]
[[[ T06 ]]] # 1
graph = "a[-1D] | a_cold => a"
[[[ T06, T12 ]]]
graph = "b => c"
[[[ T00, T06, T12, T18 ]]]
graph = "d[-6H] | d_cold => d"
Notes:
1. `T06` = `R/T06/P1D`, which implies `R/20130325T06/P1D`: repeat every day at 06:00.
[cylc]
UTC mode = True
[scheduling]
initial cycle time = 20130325
final cycle time = 20130404T12
[[dependencies]]
[[[ R1/T06 ]]] # 1
graph = """
a_cold => a
d_cold => d
"""
[[[ T06 ]]]
graph = "a[-1D] => a"
[[[ T06, T12 ]]]
graph = "b => c"
[[[ T06/P6H ]]] # 2
graph = "d[-6H] => d"
1. Repeat once, first occurrence of `T06`, implies `R1/20130325T06/P??` (undefined period of repetition).
2. Repeat every 6 hours, starting at 06:00 (`R/20130325T06/P6H`).
We expect all new syntax to be distinguishable from the current syntax (e.g. `T06` rather than `6`). This means a special `suite.rc` flag isn't required (e.g. `ISO mode = True`).
However, as cycle time conventions will change for output files and directories, we expect there may be a requirement in the short term to keep the old convention by e.g. having `cycle time format = YYYYMMDDhhmm` or `old cycle time mode = True`...
This proposes writing suite times, durations, and recurrences in an abbreviated form of the ISO 8601 format (also allowing the formal syntax). We use the context for an abbreviated element to generate the formal syntax.
Cycle times (as written in the `suite.rc`, such as `initial cycle time` and `final cycle time`) are date/times that specify the context date/times for some non-absolute times and recurring time intervals in the suite.
They should be specified using the ISO 8601 Date/Time format in basic, compact form.
Dates and times, in the basic, compact format that we want to use, are represented like this:
- Normal year/month/day dates, no times, (standard and expanded representation):
YYYYMMDD e.g. 20131225
YYYY e.g. 2013
YY e.g. 20 (century) implying 2000.
+YYYYYYMMDD, where extra Y stands for an agreed number of extra years e.g. +0020131225
We should agree to use two extra year digits. The plus is required.
-YYYYYYMMDD, e.g. -0000550101 (56 BC)
+YYYYYY e.g. +002013
-YYYYYY e.g. -000055
+YYYY e.g. +0020 (century) implying 2000.
-YYYY e.g. -0000 (century) implying -0000.
- Week number and day number may also be specified.
- It is possible to specify omission of components using hyphens - see the standard for details.
- Times:
hhmmss e.g. 061000 (ten past 6 in the morning).
- Decimal fractions are allowed, with a comma (recommended) or full stop as the decimal separator.
- Omitted trailing components assumed zero.
- Dates and times:
YYYYMMDDThhmmss
YYYYMMDDThhmmssZ
YYYYMMDDThhmmss+hhmm
YYYYMMDDThhmmss-hhmm
(same with +YYYYYY and -YYYYYY)
If `UTC mode = True`, dates and times are implicitly Zulu time (`Z`). Otherwise, if `Z` is not specified in the initial cycle time, the local timezone will be used for it. The same applies independently for the final cycle time.
`initial cycle time` and `final cycle time` may use any of the above formatting. We would suggest that most suites use the 4-digit year specification and that the +/- 6 digit year form should be used only if required.
Sub-minute information will not be considered (will be set to zero) and may be flagged as an error on suite validation.
Decimal fractions are allowed in the ISO standard, but given that we don't support sub-minute information, this is unnecessary and we won't support it in cylc.
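The rules above (basic format, 4-digit year, no sub-minute information, no decimal fractions) could be checked with something like the following. This is only a sketch: `parse_cycle_time` is a hypothetical helper, not part of cylc, and it assumes that non-zero seconds are rejected outright rather than silently zeroed. Timezone suffixes and the +/- 6 digit year form are left out for brevity.

```python
import re
from datetime import datetime

# Basic-format date with an optional time: YYYYMMDD[Thh[mm[ss]]]
_CYCLE_TIME_RE = re.compile(
    r"^(\d{4})(\d{2})(\d{2})(?:T(\d{2})(\d{2})?(\d{2})?)?$"
)


def parse_cycle_time(text):
    """Parse a basic-format cycle time, zero-filling omitted trailing parts."""
    match = _CYCLE_TIME_RE.match(text)
    if match is None:
        # Covers decimal fractions, extended format, and anything else.
        raise ValueError("unsupported cycle time: %r" % text)
    year, month, day, hour, minute, second = (
        int(group or 0) for group in match.groups()
    )
    if second:
        # Assumption: flag sub-minute information as a validation error.
        raise ValueError("sub-minute information is not supported: %r" % text)
    return datetime(year, month, day, hour, minute)


print(parse_cycle_time("20130325"))
print(parse_cycle_time("20130405T12"))
```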
These should use the same form as `initial cycle time` and `final cycle time`, but there is a debate about the precision of the format. There are (at least) 4 options:
- Complete precision, plus/minus 6 digit year down to minute level (`+YYYYYYMMDDThhmm` or `-YYYYYYMMDDThhmm`)
- Complete precision, 4 or plus/minus 6 digit year down to minute level (depending on usage).
- Dynamic precision, minimum necessary to uniquely represent all cycle times.
- User-configured minimum precision (warn if the user does not use enough precision) - e.g. climate suites may want to just use `YYYYMM`.
The decision was made to go for year-dynamic output and task label formats down to the minute level, [EDIT:used to be 'without'] with timezone information.
Section names specify a recurrence or series, not a single date/time.
- This is true for both the proposed and current cylc format. In the current syntax: `[[[ 6 ]]]` means "every day at 0600", not "0600". The `Yearly()` type of syntax is more explicit about this.
- We use either the full or an abbreviated form of the ISO 8601 format for recurring times
- Abbreviations are extrapolated given the context and some simple rules below.
- Power usage can make use of the full ISO 8601 recurrence syntax.
The ISO 8601 standard allows for four ways of specifying a recurring series of date/times. The way best suited for abbreviation in this context is the third:
Rn/START_TIME/PERIOD
which looks something like this:
Rn/YYYYMMDDThhmmss/PnYnMnDTnHnMnS
- `R` is a character that marks this as a recurring pattern. The first `n` is an optional number that limits the number of recurrences.
  - `R1` implies a single repetition
  - `R5` implies 5 repetitions
- `YYYYMMDDThhmmss` is a start date/time expressed in standard ISO 8601 format. ISO 8601 allows truncation of the time to, for example, `Thh` if the parties agree on the context (our context start time is the initial cycle time - more later).
  - `01T` implies `00:00:00` on the first day of a month (exactly which month depends on the context)
  - `T06` implies 6 a.m.
  - `20` implies the start of the century beginning with `20` i.e. `20000101T000000`
- `PnYnMnDTnHnMnS` is the ISO 8601 duration syntax. The `P` character marks this as a duration. This is followed by (optionally specified) sets of numbers (`n`) and units: `nY` for number of years, `nM` for number of months, and so on. If a set of number and unit is missing, it is assumed zero. For example,
  - `PT6H` is a 6 hour duration.
  - `P1Y6M` is a 1 year + 6 months duration
  - `PT6M` is a 6 minute duration
  - `P6M` is a 6 month duration
  - Using weeks is also allowed e.g. `P3W` for every 3 weeks.
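To make the duration syntax concrete, here is a minimal sketch of a parser for `PnYnMnDTnHnMnS` and `PnW`. The function name `parse_duration` and the returned dictionary layout are assumptions for illustration only. It handles whole numbers and does not accept the abbreviations (such as a missing `T`) discussed elsewhere in this proposal.

```python
import re

# Either the week form PnW, or PnYnMnDTnHnMnS with every part optional.
_DURATION_RE = re.compile(
    r"^P(?:(?P<weeks>\d+)W|"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)


def parse_duration(text):
    """Return a dict of duration components; missing components are zero."""
    match = _DURATION_RE.match(text)
    # "P", "PT" and a trailing "T" match the pattern but carry no value.
    if match is None or text in ("P", "PT") or text.endswith("T"):
        raise ValueError("not an ISO 8601 duration: %r" % text)
    return {name: int(value or 0) for name, value in match.groupdict().items()}


# The position of T decides whether M means months or minutes.
print(parse_duration("P6M")["months"], parse_duration("PT6M")["minutes"])
```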
We should also support the 1st and 4th ISO 8601 recurrence syntax specifications:
`Rn/START_TIME/END_TIME` (number is not optional) - e.g. `R6/20130401/20131001` implies repeat 6 times between the start and end times, which in this case is every month.
`Rn/PERIOD/END_TIME` - e.g. `R2/P1D/20130415` implies repeat twice every day counting backwards from the end time, which means dates of `20130414` and `20130415`.
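The backwards-counting form `Rn/PERIOD/END_TIME` can be sketched as below. `recur_back_from_end` is a hypothetical helper, and the sketch assumes a fixed-length period (days, hours, minutes) so that `datetime.timedelta` can represent it; months and years would need calendar arithmetic.

```python
from datetime import datetime, timedelta


def recur_back_from_end(count, period, end):
    """Return `count` date/times spaced by `period`, the last one being `end`."""
    points = [end - i * period for i in range(count)]
    return sorted(points)


# R2/P1D/20130415
points = recur_back_from_end(2, timedelta(days=1), datetime(2013, 4, 15))
print([p.strftime("%Y%m%d") for p in points])  # ['20130414', '20130415']
```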
Multiple, comma-separated recurrence specifications may be used within a single section (as with the current format).
A date/time followed by a forward slash (also called a solidus) and an ISO 8601 period (duration syntax) implies a leading `R/` in ISO 8601 form. This means that `T00/P2D` implies `R/T00/P2D`.
A date/time without a period (e.g. `T06`) showing a particular set of date/time components implies a period that is equal to 1 of the next largest date/time component.
- `T06` implies `R/T06/P1D` (1 day period).
- `01T` (1st day in a month) would imply `R/01T/P1M` (1 month period).
- `-85` (the year 85 in a century) would imply `R/-85/P100Y` (a century period).
- `20` (a century in ISO 8601, implying 2000) or `2000` is invalid without a period, as there is no larger component.
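As a rough illustration of the two abbreviation rules above (implied leading `R/` and implied period), the following sketch expands a few of the abbreviated forms. `expand_abbreviation` and its pattern table are hypothetical, and only the time-of-day, day-of-month, and month-and-day forms are covered; bare durations, offsets, and the year-in-century form are deliberately left out.

```python
import re

# (pattern for the truncated start time, implied period)
_IMPLIED_PERIODS = [
    (re.compile(r"^T\d{2}(\d{2})?$"), "P1D"),  # Thh or Thhmm: time of day
    (re.compile(r"^\d{2}T$"), "P1M"),          # DDT: day of month
    (re.compile(r"^\d{4}T$"), "P1Y"),          # MMDDT: month and day
]


def expand_abbreviation(text):
    """Expand e.g. 'T06' to 'R/T06/P1D' and 'T00/P2D' to 'R/T00/P2D'."""
    if text.startswith("R"):
        return text  # already in recurrence form
    if "/" in text:
        return "R/" + text  # start/period: only the leading R/ is implied
    for pattern, period in _IMPLIED_PERIODS:
        if pattern.match(text):
            return "R/%s/%s" % (text, period)
    raise ValueError("no implied period for %r" % text)


for section in ("T06", "01T", "0402T", "T00/P2D"):
    print(section, "->", expand_abbreviation(section))
```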
Within the full syntax and the abbreviations, truncated start times (e.g. `T06`) imply the next such time immediately following (and including) the initial cycle time.
- For example:
[[[ T06 ]]]
graph = "a"
implies:
[[[ R/T06/P1D ]]]
graph = "a"
which, with an initial cycle time of `20130325T00`, implies `R/20130325T06/P1D`. The same expression given an initial cycle time of `20130325T12` would imply `R/20130326T06/P1D`, the day afterwards. An initial cycle time of `20130325T06` would imply `R/20130325T06/P1D`, equal to the initial cycle time.
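The "next such time at or after the initial cycle time" rule for a truncated time of day might be implemented along these lines. `next_time_of_day` is a hypothetical name, and the sketch ignores timezones.

```python
from datetime import datetime, timedelta


def next_time_of_day(initial, hour, minute=0):
    """Return the first hh:mm at or after `initial`."""
    candidate = initial.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < initial:
        # That time has already passed today, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# T06 resolved against the three initial cycle times discussed above.
for initial in (
    datetime(2013, 3, 25, 0),
    datetime(2013, 3, 25, 12),
    datetime(2013, 3, 25, 6),
):
    print(initial.strftime("%Y%m%dT%H"), "->",
          next_time_of_day(initial, 6).strftime("%Y%m%dT%H"))
```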
When using the full fourth ISO 8601 recurring format definition (rare, not explained here), truncated end times similarly imply the final cycle time.
A null string for the start time implies the initial cycle time:
[[[ R//P1D ]]]
graph = "a"
If we have an initial cycle time of `20130325T12`, this implies `R/20130325T12/P1D` - repeat every day starting at the initial cycle time. When using the 4th ISO recurrence definition (`R/PERIOD/END_TIME`), a null string for the end time implies the final cycle time - e.g. `R4/P1D` implies repeat four times, one day apart, counting backwards from the final cycle time.
Allowing a null string for the start time implies that `R//P1D` can be abbreviated as `P1D` - an expression beginning with 'P' will be parsed as a duration and imply a recurring series of times starting at the initial cycle time. We will allow the `T` designator to be absent when the context is clear (it is necessary for distinguishing months and minutes). This means that `P6H` implies `R//PT6H` (a null start time), which for an initial cycle time of `20130325T00` implies `R/20130325T00/PT6H`. This means that dependencies that used to be written as `0, 6, 12, 18` can be written as `P6H`, if the initial cycle time begins at one of 00:00, 06:00, 12:00, or 18:00.
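The caveat about the initial cycle time can be seen by generating the cycle points directly. This is a sketch with a hypothetical `cycle_points` helper and an arbitrary two-day window; it only shows which hours of the day a 6-hour recurrence lands on.

```python
from datetime import datetime, timedelta


def cycle_points(initial, final, period):
    """Return every point from `initial` to `final` inclusive, `period` apart."""
    points = []
    point = initial
    while point <= final:
        points.append(point)
        point += period
    return points


six_hours = timedelta(hours=6)

# Initial cycle time at 00:00: equivalent to the old 0, 6, 12, 18 sections.
aligned = cycle_points(datetime(2013, 3, 25, 0), datetime(2013, 3, 26, 18), six_hours)
hours = sorted({p.hour for p in aligned})
print(hours)  # [0, 6, 12, 18]

# Initial cycle time at 03:00: the same P6H section no longer lines up.
shifted = cycle_points(datetime(2013, 3, 25, 3), datetime(2013, 3, 26, 21), six_hours)
shifted_hours = sorted({p.hour for p in shifted})
print(shifted_hours)  # [3, 9, 15, 21]
```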
The ISO 8601 format supports specifying the number of repetitions: `R1/T06/P1D` implies that this is only run once, at `T06` immediately following (or at) the initial cycle time. This has applications for cold start tasks. Writing just `R1` must imply `R1//P???` (undefined duration, any number) which, for an initial cycle time of `20130325T00`, implies `R1/20130325T00/P???`, which is a task that runs exactly at the initial cycle time and not again.
We may want to specify our own way of writing a date/time plus a duration (period) as a way of writing an offset date/time. This should be written as a date/time followed by a positive or negative duration (sign is compulsory). In absolute form with a 4-digit year, it would look like this:
YYYYMMDDThhmm+PnYnMnDnHnM
YYYYMMDDThhmm-PnYnMnDnHnM
The above rules for writing abbreviated dates and times would then apply, so we could have:
Thhmm+PnYnMnDnHnM
For example:
T06+P1D
in the context of:
R/T06+P1D
which implies:
R/T06+P1D/P1D
means "repeat every day at 0600 immediately following (or at) the initial cycle time plus 1 day". An initial cycle time of `20130401` would then imply repeating at:
20130402T06
20130403T06
20130404T06
...
and so on.
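The evaluation order for `R/T06+P1D/P1D` described above (apply the offset to the initial cycle time, then find the next 06:00, then repeat daily) can be sketched like this. The `recur` generator is a hypothetical helper, not cylc code.

```python
from datetime import datetime, timedelta
from itertools import islice


def recur(start, period):
    """Yield start, start + period, start + 2 * period, ... without end."""
    point = start
    while True:
        yield point
        point += period


initial = datetime(2013, 4, 1)
context = initial + timedelta(days=1)   # initial cycle time plus P1D
start = context.replace(hour=6)         # T06 on the context day
if start < context:
    start += timedelta(days=1)          # 06:00 already passed: use the next day

first_three = [
    p.strftime("%Y%m%dT%H") for p in islice(recur(start, timedelta(days=1)), 3)
]
print(first_three)  # ['20130402T06', '20130403T06', '20130404T06']
```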
The use of the plus/minus sign in the duration implies that we can distinguish a duration from a null time string with an offset, so:
R/+P2D/P1D
means repeat every day, starting two days after the initial cycle time (our usual `R/START_TIME/PERIOD` recurrence syntax). For an initial cycle time of `20130401`, this implies `R/20130401+P2D/P1D`, which implies `R/20130403/P1D`. Similarly,
R/P2D/+P1D
is of the form `R/PERIOD/END_TIME`, which uses the 4th ISO recurrence syntax definition and implies repeat every 2 days counting backwards from 1 day after the final cycle time. For a final cycle time of `20130515`, this implies `R/P2D/20130515+P1D`, which implies `R/P2D/20130516`.
Tasks can refer to tasks from other cycle times. They therefore need to specify which time this is, whether an absolute time or relative to their own cycle time.
(Copied from the Dependencies Sections text)
`PnYnMnDTnHnMnS` is the ISO 8601 duration syntax. The `P` character marks this as a duration. This is followed by (optionally specified) sets of numbers (`n`) and units: `nY` for number of years, `nM` for number of months, and so on. If a set of number and unit is missing, it is assumed zero. For example,
- `PT6H` is a 6 hour duration.
- `P1Y6M` is a 1 year + 6 months duration
- `PT6M` is a 6 minute duration
- `P6M` is a 6 month duration
We allow absolute times to be used (but don't expect them to be used).
Null string is treated as a truncated absolute time, as a special case.
The context for a truncated absolute time is the initial cycle time. Truncated absolute times are evaluated as in the dependencies section.
- `d[T06]` implies 6 a.m. immediately following the initial cycle time.
- `d[]` (null string) implies the initial cycle time.
Relative times are distinguished by the use of durations. The context for a relative time is the cycle time of the task itself, not the initial cycle time.
We can then specify the relative time as a positive or negative time interval (duration), to be applied to the cycle time of the task itself.
The ISO 8601 format for durations should be supported.
We should allow the `T` designator to be optional where the usage is unambiguous.
In this particular context, we should also support (and use as a standard) removing the leading `P` in the duration format.
This means that `d[-PT6H] => d` can be written as `d[-6H] => d`.
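One possible way to expand the abbreviated offsets back to full signed ISO 8601 durations is sketched below. `expand_offset` is hypothetical, and the disambiguation rule is an assumption: without a `T`, an `M` is read as months unless it follows an `H` or `S`, so minutes on their own still need the `T` (as in `-T6M`).

```python
import re


def expand_offset(text):
    """Expand e.g. '-6H' to '-PT6H' and '-1D' to '-P1D'."""
    match = re.match(r"^([+-])P?(.+)$", text)
    if match is None:
        raise ValueError("offset needs a leading sign: %r" % text)
    sign, body = match.groups()
    if "T" in body:
        return sign + "P" + body  # T designator already present
    tokens = re.findall(r"(\d+)([YMWDHS])", body)
    if "".join(number + unit for number, unit in tokens) != body:
        raise ValueError("cannot parse offset: %r" % text)
    date_part, time_part, in_time = "", "", False
    for number, unit in tokens:
        if unit in "HS":
            in_time = True  # everything from here on is a time component
        elif in_time and unit in "YWD":
            raise ValueError("date unit after time unit: %r" % text)
        if in_time:
            time_part += number + unit
        else:
            date_part += number + unit
    return sign + "P" + date_part + ("T" + time_part if time_part else "")


for offset in ("-6H", "-PT6H", "-1D", "-T6M", "-6M"):
    print(offset, "->", expand_offset(offset))
```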
There are some other options in cylc that are lengths of time and should be in the ISO 8601 duration format for completeness:
[cylc]
[[job submission]]
delay between batches = integer( min=0, default=0 ) # seconds
[[event handler submission]]
delay between batches = integer( min=0, default=0 ) # seconds
[[poll and kill command submission]]
delay between batches = integer( min=0, default=0 ) # seconds
[[event hooks]]
timeout = float( default=None )
[[accelerated clock]]
# Note: not rate, as this is seconds per hour and dimensionless.
offset = integer( default=24 ) # Hours
[[reference test]]
live mode suite timeout = float( default=None )
dummy mode suite timeout = float( default=None )
simulation mode suite timeout = float( default=None )
[scheduling]
runahead limit = integer( min=0, default=None )
[runtime]
[[__many__]]
retry delays = force_list( default=list() )
submission polling intervals = force_list( default=list() )
execution polling intervals = force_list( default=list() )
[[[simulation mode]]]
run time range = list( default=list(1,16))
[[[job submission]]]
retry delays = force_list( default=list() )
[[[event hooks]]]
submission timeout = float( default=None )
execution timeout = float( default=None )
[visualization]
initial cycle time = integer( default=None )
final cycle time = integer( default=None )
[[runtime graph]]
cutoff = integer( default=24 )
We may use some keywords to make some things more user-friendly, especially for the initial "R1" section.
The 2004 version of the ISO 8601 standard removes the official date/time truncated representation (using "-") which is very useful. This document was written with the 2000 version in mind - however, we could just repurpose the "-" as part of our special abbreviation syntax and call it 2004-based.
How does warm-starting a suite fit with initial and initial offset graph sections?
- The removal of dependence on previous cycles removes the need for cold-start "succeeded" tasks being inserted to get things going - a warm start will effectively keep the `initial cycle time` as is and start mid-way through.
How should altering the `initial cycle time` on the command line in `cylc run` affect the suite?
- We should distinguish "start time" from "initial cycle time" - see warm-start comment above.
How should altering the `final cycle time` affect the suite if there are final graph sections?
- It must necessarily alter the final graph sections - however, setting a "STOP" time should not affect the graph.
[cylc]
UTC mode = True
[scheduling]
initial cycle time = 20130325
final cycle time = 20130404T12
[[dependencies]]
# Simple dependency section examples:
[[[ R1 ]]] # = R1/20130325/P??
graph = "a_initial" # Repeat once at initial cycle time
[[[ R1/+P6H ]]] # = R1/20130325+P6H/P?? = R1/20130325T06/P??
graph = "a0 => b0" # Repeat once, 6 hours after the initial cycle time
[[[ T06, T12 ]]] # = R/20130325T06/P1D, R/20130325T12/P1D
graph = "a1 => b1" # Repeat every day, starting at 06:00.
# Repeat every day, starting at 12:00.
[[[ R1/T06 ]]] # = R1/20130325T06/P1D
graph = "a2 => b2" # Repeat once, at the next 06:00.
[[[ 01T ]]] # = R/20130401/P1M
graph = "a3 => b3" # Repeat every month, starting at the first day of the month.
[[[ P3H ]]] # = R/20130325/P3H
graph = "a4 => b4" # Repeat every 3 hours, starting at the initial cycle time.
[[[ T00/P7H ]]] # = R/20130325T00/P7H
graph = "a5 => b5" # Repeat every 7 hours, starting at the next 00:00 after
# (or including) the initial cycle time.
# Advanced-use dependency section examples:
[[[ R5//P6H ]]] # = R5/20130325/P6H
graph = "a6 => b6" # Repeat five times, every 6 hours, starting at the initial cycle time
[[[ 01T/P3Y4DT3M ]]] # = R/20130401/P3Y4DT3M
graph = "a7 => b7" # Repeat every 3 years, 4 days, and 3 minutes,
# starting at the next 1st day of the month
[[[ W115/P1W, W-3T06/P2W ]]] # = R/2014W115/P1W, R/20130327T06/P2W
graph = "a8 => b8" # Repeat every week, starting on Friday in the next 11th ordinal
# week of a year.
# Repeat every fortnight, starting next Wednesday at 6 a.m.
[[[ R1/20130401 ]]] # = R1/20130401/P?? (=20130401).
graph = "a9 => b9" # Repeat once at the date/time 20130401
[[[ R3//PT4M ]]] # = R3/20130325/PT4M
graph = "a10 => b10" # Repeat three times at an interval of 4 minutes, starting at the
# initial cycle time. This means run at:
# 20130325T0000
# 20130325T0004
# 20130325T0008
[[[ R5/P2H ]]] # = R5/P2H/20130404T12 (final cycle time)
graph = "a11 => b11" # Using the fourth ISO recurrence definition, repeat five times counting
# backwards from the final cycle time every 2 hours.
# This means run at:
# 20130404T04
# 20130404T06
# 20130404T08
# 20130404T10
# 20130404T12
[[[ R3/P1D/T06 ]]] # = R3/P1D/20130405T06
graph = "a12 => b12" # Using the fourth ISO recurrence definition, repeat three times counting
# backwards 1 day from 06:00 immediately following (or including) the
# final cycle time.
# This means (post final cycle times do not run) run at:
# 20130403T06
# 20130404T06
# This avoids hard-coding the cycle time.
[[[ R1/P1W ]]] # = R1/P???/20130404T12
graph = "a13 => b13" # Using the fourth ISO recurrence definition, repeat once at the
# final cycle time (period is meaningless for R1 and could be anything).
[[[ R1/+P1D ]]] # = R1/20130325+P1D/P??? = R1/20130326/P???
graph = "a14 => b14" # Repeat once, 1 day after the initial cycle time.
[[[ R1//-P1D ]]] # = R1/P???/20130404T12-P1D = R1/P???/20130403T12
graph = "a15 => b15" # Repeat once, 1 day before the final cycle time.
[[[ R/+P6H/P1D ]]] # = R/20130325+P6H/P1D = R/20130325T06/P1D
graph = "a16 => b16" # Repeat every day, starting at the initial cycle time plus 6 hours.
# Simple dependency graph examples:
[[[ T00 ]]]
graph = "a[-T6M] => b" # b depends on a from 6 minutes ago.
[[[ T01 ]]]
graph = "a[-6H] => b" # b depends on a from 6 hours ago.
[[[ T02 ]]]
graph = "a[-1Y] => b" # b depends on a from a year ago.
# Advanced-use dependency graph examples:
[[[ T03 ]]]
graph = "a[20130326T06] => b" # b depends on a at the absolute cycle time of `20130326T06`
[[[ T04 ]]]
graph = "a[] => b" # b depends on a at the initial cycle time (special case)