Trained on dipole data, funky combination and short phys+lhcfill+phys_w_precycle datasets.
Question
We want to find appropriate model architecture hyperparameters for hysteresis data.
Experiment
We test the model over 100 epochs with:
- n_dim_model: 64, 128, 256
- num_heads: 1, 2, 4, 8
- num_encoder_layers: 1, 2, 6, 12
- num_decoder_layers: 1, 2, 6, 12
All 192 combinations (3 × 4 × 4 × 4) are tested.
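As a rough illustration of how the grid above could be enumerated, the sketch below builds every combination with `itertools.product` and confirms the count. The dictionary layout and the helper name `iter_configs` are assumptions made for this example, not the code used in the study.

```python
from itertools import product

# Values copied from the experiment description; the dict layout is an assumption.
GRID = {
    "n_dim_model": [64, 128, 256],
    "num_heads": [1, 2, 4, 8],
    "num_encoder_layers": [1, 2, 6, 12],
    "num_decoder_layers": [1, 2, 6, 12],
}


def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values (hypothetical helper)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(GRID))
print(len(configs))  # 3 * 4 * 4 * 4 = 192
```

Each yielded dictionary can then be passed to whatever training entry point the project uses.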
Results
The results are inconclusive: the model does not fit the data well after 24 epochs, and the grid search crashed partway through because it ran out of storage space.
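One possible guard against the storage crash, sketched here under assumptions, is to check free disk space before starting each run and stop cleanly when it falls below a threshold. The threshold, the output directory, and the helper name `has_free_space` are hypothetical; only `shutil.disk_usage` is a standard-library call.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


# Hypothetical values, for illustration only.
MIN_FREE_BYTES = 5 * 1024**3  # 5 GiB
OUTPUT_DIR = "."

if not has_free_space(OUTPUT_DIR, MIN_FREE_BYTES):
    print("Not enough free space; stopping before the next run.")
```

Calling this check at the top of the loop over configurations would let the search end with a clear message instead of failing mid-run.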