Adam, AdamW, Ranger, and Lion were used to train a Temporal Fusion Transformer (TFT) on 24 hours of simulated data with absolute time features, on 4 GPUs.

| Optimizer | LR   | Weight decay |
|-----------|------|--------------|
| Adam      | 5e-4 | -            |
| AdamW     | 5e-4 | 1e-2         |
| Ranger    | 5e-4 | -            |
| Lion      | 8e-5 | 1e-2         |
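The weight-decay column applies only to the decoupled-decay optimizers (AdamW, Lion). To make the distinction concrete, here is a minimal plain-Python sketch of one AdamW update on a scalar parameter; the hyperparameter defaults match the table, and the function name and scalar formulation are illustrative, not part of any library API:

```python
import math

def adamw_step(w, g, m, v, t, lr=5e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a scalar parameter w with gradient g.

    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g    # second moment (adaptivity)
    m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    # The decay term is applied to the weight directly, outside the
    # adaptive update -- this decoupling is what distinguishes AdamW
    # from Adam with L2 regularization folded into the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

With plain Adam, the `weight_decay * w` term would instead be added to the gradient `g` before the moment updates, which scales the decay by the adaptive learning rate and weakens it for parameters with large gradient history.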

Based on this experiment, neither Ranger nor Lion offers an advantage here; sticking with Adam or AdamW is the simpler choice.