Adam, AdamW, Ranger, and Lion were used to train a Temporal Fusion Transformer (TFT) on 24 hours of simulated data with absolute time features, on 4 GPUs.

| Optimizer | LR   | Weight decay |
|-----------|------|--------------|
| Adam      | 5e-4 | -            |
| AdamW     | 5e-4 | 1e-2         |
| Ranger    | 5e-4 | -            |
| Lion      | 8e-5 | 1e-2         |
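The weight-decay column applies only to the decoupled-decay optimizers (AdamW, Lion). To make the distinction concrete, here is a minimal plain-Python sketch of one AdamW update on a scalar parameter; the hyperparameter defaults match the table, and the function name and scalar formulation are illustrative, not part of any library API:

```python
import math

def adamw_step(w, g, m, v, t, lr=5e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a scalar parameter w with gradient g.

    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g    # second moment (adaptivity)
    m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    # The decay term is applied to the weight directly, outside the
    # adaptive update -- this decoupling is what distinguishes AdamW
    # from Adam with L2 regularization folded into the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

With plain Adam, the `weight_decay * w` term would instead be added to the gradient `g` before the moment updates, which scales the decay by the adaptive learning rate and weakens it for parameters with large gradient history.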

Based on this experiment, neither Ranger nor Lion offers an advantage here; sticking with Adam or AdamW is the simpler choice.