Trained on dipole data, using the funky combination dataset and the short phys+lhcfill+phys_w_precycle datasets.

Question

We want to find suitable model architecture hyperparameters for the hysteresis data.

Experiment

We train the model for 100 epochs on every combination of the following hyperparameters:

  • n_dim_model: 64, 128, 256

  • num_heads: 1, 2, 4, 8

  • num_encoder_layers: 1, 2, 6, 12

  • num_decoder_layers: 1, 2, 6, 12

This gives 3 × 4 × 4 × 4 = 192 combinations in total.
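A minimal sketch of how this grid could be enumerated, assuming a Python training script. The names `grid`, `run_grid_search` and `train_fn` are hypothetical and not taken from the project's code. The divisibility check assumes a standard multi-head attention model (for example `torch.nn.Transformer`), where the model dimension is split evenly across heads; every value in this grid satisfies it.

```python
import itertools

# Search space taken from the experiment description above.
N_DIM_MODEL = [64, 128, 256]
NUM_HEADS = [1, 2, 4, 8]
NUM_ENCODER_LAYERS = [1, 2, 6, 12]
NUM_DECODER_LAYERS = [1, 2, 6, 12]
NUM_EPOCHS = 100


def grid():
    """Yield every hyperparameter combination as a config dict."""
    for d_model, heads, n_enc, n_dec in itertools.product(
        N_DIM_MODEL, NUM_HEADS, NUM_ENCODER_LAYERS, NUM_DECODER_LAYERS
    ):
        # Assumption: multi-head attention splits n_dim_model across heads,
        # so it must be divisible by num_heads.
        assert d_model % heads == 0, (d_model, heads)
        yield {
            "n_dim_model": d_model,
            "num_heads": heads,
            "num_encoder_layers": n_enc,
            "num_decoder_layers": n_dec,
        }


def run_grid_search(train_fn):
    """Call a hypothetical train_fn(config, num_epochs) once per config."""
    return [(config, train_fn(config, NUM_EPOCHS)) for config in grid()]


if __name__ == "__main__":
    print(len(list(grid())))  # 192
```

Using `itertools.product` keeps the search space declared in one place, so adding or removing a value changes the run count automatically.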

Results

The results are inconclusive: the model does not fit the data well after 24 epochs, and the grid search crashed partway through after running out of storage space.