Commit e003e9a

Merge pull request #1286 from rstudio/timeseries-dataset-from-array
add `timeseries_dataset_from_array()`
2 parents a3477f6 + b11d768 commit e003e9a

10 files changed: 483 additions & 76 deletions

.github/workflows/R-CMD-check.yaml

Lines changed: 5 additions & 6 deletions
```diff
@@ -39,6 +39,8 @@ jobs:
  # - {os: 'windows-latest', tf: 'release', r: 'release'}
  # - {os: 'macOS-latest' , tf: 'release', r: 'release'}

+ - {os: 'ubuntu-20.04', tf: '2.7', r: 'release'}
+ - {os: 'ubuntu-20.04', tf: '2.6', r: 'release'}
  - {os: 'ubuntu-20.04', tf: '2.5', r: 'release'}
  - {os: 'ubuntu-20.04', tf: '2.4', r: 'release'}
  - {os: 'ubuntu-20.04', tf: '2.3', r: 'release'}
@@ -47,8 +49,8 @@ jobs:

  # these are allowed to fail
  # - {os: 'ubuntu-20.04', tf: 'default', r: 'devel'}
- - {os: 'ubuntu-20.04', tf: '2.7.0rc1', r: 'release'}
- # - {os: 'ubuntu-20.04', tf: 'nightly' , r: 'release'}
+ # - {os: 'ubuntu-20.04', tf: '2.7.0rc1', r: 'release'}
+ - {os: 'ubuntu-20.04', tf: 'nightly' , r: 'release'}

  runs-on: ${{ matrix.os }}
  continue-on-error: ${{ matrix.tf == 'nightly' || contains(matrix.tf, 'rc') || matrix.r == 'devel' }}
@@ -88,7 +90,7 @@ jobs:
  id: r-package-cache
  with:
  path: ${{ env.R_LIBS_USER }}
- key: ${{ matrix.os }}-${{ steps.setup-r.outputs.installed-r-version }}-${{ steps.get-date.outputs.year-week }}
+ key: ${{ matrix.os }}-${{ steps.setup-r.outputs.installed-r-version }}-${{ steps.get-date.outputs.year-week }}-1

  - name: Install remotes
  if: steps.r-package-cache.outputs.cache-hit != 'true'
@@ -105,9 +107,6 @@ jobs:
  sudo $cmd
  done < <(Rscript -e "writeLines(remotes::system_requirements('$ID-$VERSION_ID'))")

- - name: Use dev reticulate
- run: remotes::install_github("t-kalinowski/reticulate")
-
  - name: Install Package + deps
  run: remotes::install_local(dependencies = TRUE, force = TRUE)
```
NAMESPACE

Lines changed: 1 addition & 0 deletions
```diff
@@ -588,6 +588,7 @@ export(texts_to_matrix)
  export(texts_to_sequences)
  export(texts_to_sequences_generator)
  export(time_distributed)
+ export(timeseries_dataset_from_array)
  export(timeseries_generator)
  export(to_categorical)
  export(train_on_batch)
```

NEWS.md

Lines changed: 20 additions & 13 deletions
```diff
@@ -1,5 +1,7 @@
  # keras (development version)

+ - Default TensorFlow + Keras version is now 2.7.
+
  - New API for constructing RNN (Recurrent Neural Network) layers. This is a
    flexible interface that complements the existing RNN layers. It is primarily
    intended for advanced / research applications, e.g., prototyping novel
@@ -14,32 +16,35 @@
    To learn more, including how to make a custom cell layer, see the new vignette:
    "Working with RNNs".

- - New dataset loader `text_dataset_from_directory()`.
+ - New dataset functions:
+   - `text_dataset_from_directory()`
+   - `timeseries_dataset_from_array()`

  - New layers:
-   - `layer_additive_attention()`
-   - `layer_conv_lstm_1d()`
-   - `layer_conv_lstm_3d()`
+   - `layer_additive_attention()`
+   - `layer_conv_lstm_1d()`
+   - `layer_conv_lstm_3d()`

  - `layer_cudnn_gru()` and `layer_cudnn_lstm()` are deprecated.
    `layer_gru()` and `layer_lstm()` will automatically use cuDNN if it is available.

  - `layer_lstm()` and `layer_gru()`:
-   default value for `recurrent_activation` changed
-   from `"hard_sigmoid"` to `"sigmoid"`.
+   default value for `recurrent_activation` changed
+   from `"hard_sigmoid"` to `"sigmoid"`.

  - `layer_gru()`: default value `reset_after` changed from `FALSE` to `TRUE`

  - New vignette: "Transfer learning and fine-tuning".

  - New applications:
-   - MobileNet V3: `application_mobilenet_v3_large()`, `application_mobilenet_v3_small()`
-   - ResNet: `application_resnet101()`, `application_resnet152()`, `resnet_preprocess_input()`
-   - ResNet V2:`application_resnet50_v2()`, `application_resnet101_v2()`,
-     `application_resnet152_v2()` and `resnet_v2_preprocess_input()`
-   - EfficientNet: `application_efficientnet_b{0,1,2,3,4,5,6,7}()`
-
- - Many existing `application_*()`'s gain argument `classifier_activation`, with default `'softmax'`.
+   - MobileNet V3: `application_mobilenet_v3_large()`, `application_mobilenet_v3_small()`
+   - ResNet: `application_resnet101()`, `application_resnet152()`, `resnet_preprocess_input()`
+   - ResNet V2: `application_resnet50_v2()`, `application_resnet101_v2()`,
+     `application_resnet152_v2()` and `resnet_v2_preprocess_input()`
+   - EfficientNet: `application_efficientnet_b{0,1,2,3,4,5,6,7}()`
+
+ - Many existing `application_*()` functions gain argument `classifier_activation`,
+   with default `'softmax'`.
    Affected: `application_{xception, inception_resnet_v2, inception_v3, mobilenet, vgg16, vgg19}()`

  - New function `%<-active%`, an ergonomic wrapper around `makeActiveBinding()`
@@ -70,6 +75,8 @@

  - `k_random_uniform()` now automatically casts `minval` and `maxval` to the output dtype.

+ - `install_keras()` gains argument `pip_ignore_installed`, with default `TRUE`.
+
  # keras 2.6.1

  - New family of *preprocessing* layers. These are the spiritual successor to the `tfdatasets::step_*` family of data transformers (to be deprecated in a future release).
```
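The `%<-active%` entry describes a wrapper around base R's `makeActiveBinding()`. The wrapper's own signature is not part of this diff, so the sketch below calls `makeActiveBinding()` directly to show what an active binding is: reading the name runs a function instead of returning a stored value. The names `env`, `reads`, and `now_reads` are illustrative only and do not come from keras.

```r
# Illustrative only: a plain base-R active binding, not keras's `%<-active%`.
env <- new.env()
reads <- 0

# Every read of env$now_reads calls this function, which counts its own calls.
makeActiveBinding("now_reads", function() {
  reads <<- reads + 1
  reads
}, env)

first <- env$now_reads
second <- env$now_reads
c(first, second)  # 1 2
```

Because the value is recomputed on each access, an active binding suits values that should always reflect current state rather than a snapshot taken at assignment time.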

R/install.R

Lines changed: 18 additions & 7 deletions
```diff
@@ -4,7 +4,7 @@
  #' thin wrapper around [`tensorflow::install_tensorflow()`], with the only
  #' difference being that this includes by default additional extra packages that
  #' keras expects, and the default version of tensorflow installed by
- #' `install_keras()` may at times be different from the default installed
+ #' `install_keras()` may at times be different from the default installed by
  #' `install_tensorflow()`. The default version of tensorflow installed by
  #' `install_keras()` is "`r default_version`".
  #'
@@ -13,36 +13,41 @@
  #' versions potentially constrained for compatibility with the
  #' requested tensorflow version.
  #'
- #' @inherit tensorflow::install_tensorflow
+ #' @inheritParams tensorflow::install_tensorflow
  #'
  #' @param tensorflow Synonym for `version`. Maintained for backwards compatibility.
  #'
- #' @seealso [tensorflow::install_tensorflow()]
+ #' @seealso [`tensorflow::install_tensorflow()`]
  #' @export
  install_keras <- function(method = c("auto", "virtualenv", "conda"),
                            conda = "auto",
                            version = "default",
                            tensorflow = version,
                            extra_packages = NULL,
-                           ...) {
+                           ...,
+                           pip_ignore_installed = TRUE) {

    pkgs <- default_extra_packages(tensorflow)
    if(!is.null(extra_packages)) # user supplied package version constraints take precedence
      pkgs[gsub("[=<>~]{1,2}[0-9.]+$", "", extra_packages)] <- extra_packages

-   if(tensorflow == "default") # may be different from tensorflow
-     tensorflow <- default_version
+   if(tensorflow %in% c("cpu", "gpu"))
+     tensorflow <- paste0("default-", tensorflow)
+
+   if(grepl("^default", tensorflow))
+     tensorflow <- sub("^default", as.character(default_version), tensorflow)

    tensorflow::install_tensorflow(
      method = match.arg(method),
      conda = conda,
      version = tensorflow,
      extra_packages = pkgs,
+     pip_ignore_installed = pip_ignore_installed,
      ...
    )
  }

- default_version <- numeric_version("2.6")
+ default_version <- numeric_version("2.7")

  default_extra_packages <- function(tensorflow_version) {
    pkgs <- c("tensorflow-hub", "scipy", "requests", "pyyaml", "Pillow", "h5py", "pandas")
@@ -99,3 +104,9 @@ default_extra_packages <- function(tensorflow_version) {
    pkgs
  }

+
+ # @inheritSection tensorflow::install_tensorflow "Custom Installation" "Apple Silicon" "Additional Packages"
+ # @inherit tensorflow::install_tensorflow details
+ # @inherit tensorflow::install_tensorflow params return references description details sections
+ # ## everything except 'seealso' to avoid this warning
+ # ## Warning: Link to unknown topic in inherited text: keras::install_keras
```
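Two pieces of argument handling in the new `install_keras()` are pure string manipulation, so they can be tried without Python. The sketch below restates them as two hypothetical helpers, `resolve_tf_version()` and `merge_extra_packages()`, which are not exported by keras. It assumes the default extra packages are a character vector named by package, which the name-indexed assignment in `install_keras()` implies but this diff does not show.

```r
# Hypothetical helpers (not part of keras) restating two steps of install_keras().
default_version <- numeric_version("2.7")

# "cpu"/"gpu" become "default-cpu"/"default-gpu"; a leading "default" is then
# replaced by the pinned default version. Other strings pass through unchanged.
resolve_tf_version <- function(tensorflow) {
  if (tensorflow %in% c("cpu", "gpu"))
    tensorflow <- paste0("default-", tensorflow)
  if (grepl("^default", tensorflow))
    tensorflow <- sub("^default", as.character(default_version), tensorflow)
  tensorflow
}

# User-supplied constraints replace the default entry for the same package
# (the version suffix is stripped to recover the name); new packages are appended.
# Assumption: defaults are keyed by package name.
merge_extra_packages <- function(defaults, extra_packages = NULL) {
  pkgs <- setNames(defaults, defaults)
  if (!is.null(extra_packages))
    pkgs[gsub("[=<>~]{1,2}[0-9.]+$", "", extra_packages)] <- extra_packages
  unname(pkgs)
}

resolve_tf_version("default")  # "2.7"
resolve_tf_version("gpu")      # "2.7-gpu"
resolve_tf_version("2.5")      # "2.5"
merge_extra_packages(c("scipy", "h5py"), c("h5py==3.1.0", "Pillow>=8.0"))
# "scipy" "h5py==3.1.0" "Pillow>=8.0"
```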

R/preprocessing.R

Lines changed: 164 additions & 0 deletions
```diff
@@ -1147,3 +1147,167 @@ function(directory,
               seed = as_nullable_integer))
    do.call(keras$preprocessing$text_dataset_from_directory, args)
  }
+
+
+ #' Creates a dataset of sliding windows over a timeseries provided as array
+ #'
+ #' @details
+ #' This function takes in a sequence of data-points gathered at
+ #' equal intervals, along with time series parameters such as
+ #' length of the sequences/windows, spacing between two sequences/windows, etc.,
+ #' to produce batches of timeseries inputs and targets.
+ #'
+ #' @section Example 1:
+ #'
+ #' Consider indices `0:99`. With `sequence_length=10`, `sampling_rate=2`,
+ #' `sequence_stride=3`, `shuffle=FALSE`, the dataset will yield batches of
+ #' sequences composed of the following indices:
+ #'
+ #' ```
+ #' First sequence: 0 2 4 6 8 10 12 14 16 18
+ #' Second sequence: 3 5 7 9 11 13 15 17 19 21
+ #' Third sequence: 6 8 10 12 14 16 18 20 22 24
+ #' ...
+ #' Last sequence: 78 80 82 84 86 88 90 92 94 96
+ #' ```
+ #'
+ #' In this case the last 3 data points are discarded since no full sequence
+ #' can be generated to include them (the next sequence would have started
+ #' at index 81, and each window is allotted `sequence_length * sampling_rate`
+ #' = 20 index positions, so it would have run past index 99).
+ #'
+ #' @section Example 2: Temporal regression.
+ #'
+ #' Consider an array `data` of scalar values, of shape `(steps)`.
+ #' To generate a dataset that uses the past 10
+ #' timesteps to predict the next timestep, you would use:
+ #'
+ #' ``` R
+ #' steps <- 100
+ #' # data is integer seq with some noise
+ #' data <- array(1:steps + abs(rnorm(steps, sd = .25)))
+ #' inputs_data <- head(data, -10) # drop last 10
+ #' targets <- tail(data, -10) # drop first 10
+ #' dataset <- timeseries_dataset_from_array(
+ #'   inputs_data, targets, sequence_length=10)
+ #' library(tfdatasets)
+ #' dataset_iterator <- as_iterator(dataset)
+ #' repeat {
+ #'   batch <- iter_next(dataset_iterator)
+ #'   if(is.null(batch)) break
+ #'   c(input, target) %<-% batch
+ #'   stopifnot(exprs = {
+ #'     # First sequence: steps [1-10]
+ #'     # Corresponding target: step 11
+ #'     all.equal(as.array(input[1, ]), data[1:10])
+ #'     all.equal(as.array(target[1]), data[11])
+ #'
+ #'     all.equal(as.array(input[2, ]), data[2:11])
+ #'     all.equal(as.array(target[2]), data[12])
+ #'
+ #'     all.equal(as.array(input[3, ]), data[3:12])
+ #'     all.equal(as.array(target[3]), data[13])
+ #'   })
+ #' }
+ #' ```
+ #'
+ #' @section Example 3: Temporal regression for many-to-many architectures.
+ #'
+ #' Consider two arrays of scalar values `X` and `Y`,
+ #' both of shape `(100)`. The resulting dataset should consist of samples with
+ #' 20 timestamps each. The samples should not overlap.
+ #' To generate a dataset that uses the current timestamp
+ #' to predict the corresponding target timestep, you would use:
+ #'
+ #' ``` R
+ #' X <- seq(100)
+ #' Y <- X*2
+ #'
+ #' sample_length <- 20
+ #' input_dataset <- timeseries_dataset_from_array(
+ #'   X, NULL, sequence_length=sample_length, sequence_stride=sample_length)
+ #' target_dataset <- timeseries_dataset_from_array(
+ #'   Y, NULL, sequence_length=sample_length, sequence_stride=sample_length)
+ #'
+ #' library(tfdatasets)
+ #' dataset_iterator <-
+ #'   zip_datasets(input_dataset, target_dataset) %>%
+ #'   as_array_iterator()
+ #' while(!is.null(batch <- iter_next(dataset_iterator))) {
+ #'   c(inputs, targets) %<-% batch
+ #'   stopifnot(
+ #'     all.equal(inputs[1,], X[1:sample_length]),
+ #'     all.equal(targets[1,], Y[1:sample_length]),
+ #'     # second sample covers timesteps 21-40
+ #'     all.equal(inputs[2,], X[(1:sample_length) + sample_length]),
+ #'     all.equal(targets[2,], Y[(1:sample_length) + sample_length])
+ #'   )
+ #' }
+ #' ```
+ #'
+ #' @param data array or eager tensor
+ #'   containing consecutive data points (timesteps).
+ #'   The first axis is expected to be the time dimension.
+ #'
+ #' @param targets Targets corresponding to timesteps in `data`.
+ #'   `targets[i]` should be the target
+ #'   corresponding to the window that starts at index `i`
+ #'   (see Example 2).
+ #'   Pass NULL if you don't have target data (in this case the dataset will
+ #'   only yield the input data).
+ #'
+ #' @param sequence_length Length of the output sequences (in number of timesteps).
+ #'
+ #' @param sequence_stride Period between successive output sequences.
+ #'   For stride `s`, output samples would
+ #'   start at `data[i]`, `data[i + s]`, `data[i + (2 * s)]`, etc.
+ #'
+ #' @param sampling_rate Period between successive individual timesteps
+ #'   within sequences. For rate `r`, timesteps
+ #'   `data[i], data[i + r], ..., data[i + (sequence_length - 1) * r]`
+ #'   are used to create a sample sequence.
+ #'
+ #' @param batch_size Number of timeseries samples in each batch
+ #'   (except maybe the last one).
+ #'
+ #' @param shuffle Whether to shuffle output samples,
+ #'   or instead draw them in chronological order.
+ #'
+ #' @param seed Optional int; random seed for shuffling.
+ #'
+ #' @param start_index Optional int; data points earlier (exclusive)
+ #'   than `start_index` will not be used
+ #'   in the output sequences. This is useful to reserve part of the
+ #'   data for test or validation.
+ #'
+ #' @param end_index Optional int; data points later (exclusive) than `end_index`
+ #'   will not be used in the output sequences.
+ #'   This is useful to reserve part of the data for test or validation.
+ #'
+ #' @param ... For backwards and forwards compatibility, ignored presently.
+ #'
+ #' @seealso
+ #'   + <https://www.tensorflow.org/api_docs/python/tf/keras/utils/timeseries_dataset_from_array>
+ #'
+ #' @returns A `tf.data.Dataset` instance. If `targets` was passed, the
+ #'   dataset yields batches of two items: `(batch_of_sequences,
+ #'   batch_of_targets)`. If not, the dataset yields only
+ #'   `batch_of_sequences`.
+ #'
+ #' @export
+ timeseries_dataset_from_array <-
+ function(data, targets, sequence_length, sequence_stride = 1L,
+          sampling_rate = 1L, batch_size = 128L, shuffle = FALSE, ...,
+          seed = NULL, start_index = NULL, end_index = NULL)
+ {
+   require_tf_version("2.6", "timeseries_dataset_from_array")
+   args <- capture_args(match.call(), list(
+     sequence_length = as.integer,
+     sequence_stride = as.integer,
+     sampling_rate = as.integer,
+     batch_size = as.integer,
+     seed = as_nullable_integer,
+     start_index = as_nullable_integer,
+     end_index = as_nullable_integer
+   ))
+   do.call(keras$preprocessing$timeseries_dataset_from_array, args)
+ }
```
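The index table in Example 1, and the non-overlapping windows in Example 3, can be reproduced without TensorFlow. The sketch below is a hypothetical pure-R helper, `window_indices()`, that is not part of keras or tfdatasets. It assumes the start-position rule of the Keras 2.6/2.7 Python implementation, which reserves `sequence_length * sampling_rate` positions per window (this is what makes 78, not 81, the last start in Example 1). It ignores shuffling, batching, and the cap that `targets` places on the number of windows. Indices are 0-based to match Example 1.

```r
# Hypothetical helper (not part of keras): 0-based indices of each window.
window_indices <- function(n, sequence_length, sampling_rate = 1,
                           sequence_stride = 1, start_index = 0,
                           end_index = n) {
  # Number of admissible start offsets, relative to start_index
  num_seqs <- end_index - start_index - sequence_length * sampling_rate + 1
  if (num_seqs < 1) return(list())
  starts <- seq(0, num_seqs - 1, by = sequence_stride)
  # Each window takes sequence_length steps spaced sampling_rate apart
  lapply(starts, function(s)
    start_index + s + seq(0, by = sampling_rate, length.out = sequence_length))
}

# Example 1: indices 0:99, sequence_length = 10, sampling_rate = 2, stride = 3
w1 <- window_indices(100, sequence_length = 10, sampling_rate = 2,
                     sequence_stride = 3)
w1[[1]]           # 0 2 4 6 8 10 12 14 16 18
w1[[length(w1)]]  # 78 80 82 84 86 88 90 92 94 96

# Example 3: non-overlapping windows of 20 steps over 100 points
w3 <- window_indices(100, sequence_length = 20, sequence_stride = 20)
length(w3)        # 5
```

Setting `sequence_stride` equal to `sequence_length` (with `sampling_rate = 1`) is what makes the Example 3 windows tile the series without overlap.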

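The pairing in Example 2, where the window starting at index `i` is matched with `targets[i]`, can also be checked in plain R. This is an illustrative sketch, not the keras implementation: it replaces the random noise with a fixed offset so the result is deterministic, and it assumes, as in the Python implementation, that the number of windows is capped by `length(targets)`.

```r
# Pure-R illustration (no TensorFlow) of the Example 2 input/target pairing.
steps <- 100
data <- seq_len(steps) + 0.5    # deterministic stand-in for the noisy series
inputs_data <- head(data, -10)  # drop last 10
targets <- tail(data, -10)      # drop first 10
sequence_length <- 10

# Windows with sampling_rate = 1 and sequence_stride = 1, capped by the
# number of available targets (an assumption mirroring the Python code).
num_seqs <- min(length(inputs_data) - sequence_length + 1, length(targets))

# Row i holds inputs_data[i:(i + 9)]; its target is the next value, targets[i].
windows <- t(vapply(seq_len(num_seqs),
                    function(i) inputs_data[i:(i + sequence_length - 1)],
                    numeric(sequence_length)))
paired_targets <- targets[seq_len(num_seqs)]

dim(windows)  # 81 10
```

Row 1 of `windows` is `data[1:10]` and its target is `data[11]`, matching the assertions in the Example 2 code above.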