Copied from Dipole dataset v3.

Training set

Additionally, we add a dataset from MD20230628 to the training set (MD20230628_SFT_MD___LHCPILOT_SFT_MD.parquet).

We additionally add the following datasets from MD20241009

and from Dedicated MD 2024-09-25

Validation set

The validation set is the MD 2024-09-04 MD1 split. The flattened MD1 data are used for training.

Configuration

The configuration is generated with

python ~/cernbox/hysteresis/dipole/datasets/v3/dataset_to_config.py ~/cernbox/hysteresis/dipole/datasets/v3/datasets datasets extras --val-dataset validation/MD20240904_val_preprocessed.parquet -c mbi_dataset_v4.yml

and all datasets are pre-processed with since heavy lowpass filtering introduces artifacts.
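The remark about heavy lowpass filtering can be illustrated with a small sketch. This is not the pre-processing used for these datasets: the filter (a plain centered moving average), the window sizes, and the synthetic ramp signal are all assumptions chosen for illustration. It only shows the general effect, that a wide smoothing window distorts the corners of a ramp far more than a narrow one.

```python
def moving_average(signal, window):
    """Centered moving average with edge padding.

    Hypothetical stand-in for a lowpass filter; `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(signal))]


def max_error(filtered, original):
    """Largest absolute deviation between the filtered and original signal."""
    return max(abs(f - o) for f, o in zip(filtered, original))


# Synthetic signal (assumption): flat bottom, linear ramp, flat top.
signal = [0.0] * 50 + [i / 20 for i in range(1, 21)] + [1.0] * 50

light = moving_average(signal, 3)    # narrow window: mild smoothing
heavy = moving_average(signal, 41)   # wide window: heavy smoothing

light_err = max_error(light, signal)
heavy_err = max_error(heavy, signal)

print(f"max deviation, window 3:  {light_err:.4f}")
print(f"max deviation, window 41: {heavy_err:.4f}")
```

With the wide window, the start and end of the ramp are smeared into the flat regions, so the filtered signal departs from the original well before the ramp actually begins.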

Training

The first TFT trained with the v4 dataset is found in TFTMBI-44.