LSTMs and other types of sequence-to-sequence models are black-box models without an explicit time axis, as opposed to Neural ODEs. There are a few ways to pass irregular time information to the neural network; this is especially useful when using Adaptive downsampling.

  • Implement irregular time indices in transformertf [priority:: high] [completion:: 2024-08-06]

This needs to be implemented natively in transformertf for distributed training.

1. Positional encodings

Transformer-style architectures rely on positional encodings to inform the neural network of the order of the elements in a sequence, often using cos+sin transformations of the integer position index. By passing non-sequential integers (or the raw timestamps themselves) to the cos+sin transform, it is possible to convey the irregular time information to transformers.
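
As a rough illustration (not the transformertf implementation), the standard sinusoidal encoding can be evaluated at arbitrary, non-sequential positions such as the raw timestamps instead of the integer range $0, \dots, L-1$:

```python
import torch


def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Encode a 1D tensor of (possibly irregular) positions into (len, d_model)."""
    # Standard transformer frequencies: 1 / 10000^(2i / d_model).
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    inv_freq = 1.0 / (10_000.0 ** (i / d_model))

    # Each position is scaled by every frequency -> (len, d_model / 2).
    angles = positions.float().unsqueeze(-1) * inv_freq

    enc = torch.zeros(positions.shape[0], d_model)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc


# Irregular timestamps (e.g. after adaptive downsampling) instead of 0, 1, 2, ...
timestamps = torch.tensor([0.0, 1.0, 2.5, 7.0, 7.5, 12.0])
pe = sinusoidal_encoding(timestamps, d_model=64)  # shape (6, 64)
```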

2. Relative time indices

Using the full available time axis, we can compute $\Delta t_i = t_i - t_{i-1}$, i.e. the time elapsed between consecutive timestamps. This can be passed as an additional feature/covariate to the seq2seq model.
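
A minimal sketch of building such a covariate, assuming the raw time axis is available as a NumPy array:

```python
import numpy as np

# t: timestamps of one (irregularly sampled) sequence, shape (seq_len,)
t = np.array([0.0, 0.1, 0.25, 0.7, 0.75, 1.3])

# Time between consecutive samples; prepend the first timestamp so the
# feature keeps the same length as the sequence (first entry becomes 0).
dt = np.diff(t, prepend=t[0])  # ~ [0.0, 0.1, 0.15, 0.45, 0.05, 0.55]

# Stack dt next to the other covariates before feeding the seq2seq model.
values = np.sin(t)                         # placeholder signal values
features = np.column_stack([values, dt])   # (seq_len, 2)
```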

Important

The $\Delta t$ feature also has to be normalized. Care has to be taken if there are outliers in $\Delta t$, in which case a log1p transform might be appropriate prior to normalization, to avoid washing out fine-grained time features.
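
One way to handle this, sketched here with plain NumPy (the statistics would be fitted on the training set in practice):

```python
import numpy as np

# dt from the previous sketch; heavy-tailed if the sampling is very irregular.
dt = np.array([0.01, 0.1, 0.15, 0.45, 0.05, 120.0])  # one large gap (outlier)

# log1p compresses the outlier so small intervals are not washed out.
dt_log = np.log1p(dt)

# Plain standardization afterwards.
dt_norm = (dt_log - dt_log.mean()) / dt_log.std()
```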

3. Absolute time indices

Using the time axis $t$, the time axis of a sample starting at index $i$ will be $(t_i, \dots, t_{i+L})$. Subtract $t_i$ from the whole window so that each sample's time axis starts at $0$. Finally, the time axis needs to be divided by a normalizer so that all time values lie in $[0, 1]$.
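
A sketch of the per-window transformation, where `NORMALIZER` stands in for the dataset-wide constant discussed below:

```python
import numpy as np

# Irregular timestamps for the full dataset (synthetic example).
t = np.cumsum(np.random.exponential(scale=0.1, size=1_000))

# One training window starting at index i.
i, window_len = 200, 64
t_window = t[i : i + window_len]

# Shift so the sample's time axis starts at 0 ...
t_rel = t_window - t_window[0]

# ... then scale by a dataset-wide normalizer so values land in [0, 1].
NORMALIZER = 10.0  # placeholder; see below for how to compute it
time_feature = t_rel / NORMALIZER
```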

With random sequence lengths enabled for the encoder, we can see a spread of time evolutions. Dotted lines show the decoder input (i.e. the prediction axis).

The normalizer can be found by iterating over the entire dataset with a window generator.
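
For example, assuming a simple sliding-window generator (a hypothetical helper, not the transformertf API), the normalizer can be taken as the longest window duration seen anywhere in the dataset:

```python
import numpy as np

t = np.cumsum(np.random.exponential(scale=0.1, size=1_000))  # irregular time axis


def iter_windows(t: np.ndarray, window_len: int, stride: int = 1):
    """Yield sliding windows over the time axis (hypothetical helper)."""
    for i in range(0, len(t) - window_len + 1, stride):
        yield t[i : i + window_len]


# The largest window duration across the dataset guarantees that every
# per-window relative time axis ends up within [0, 1].
normalizer = max(w[-1] - w[0] for w in iter_windows(t, window_len=64))
```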