Choice of data
When training the neural network, it is important to use a dataset that is representative of the situations in which the model will be used. As a starting point, the model needs to perform well on production supercycles such as PHYS and PHYS+LHCFILL, containing SFTPRO, LHCPILOT, AWAKE, HIRADMT, MD1 and ZERO cycles, as well as SFTION.
Additionally, it is desirable that the model generalizes well when the supercycle contains unseen cycles, such as various MD cycles at both high and low energy.
Data is abundant: the SPS B-Train logs the measured I and B to NXCALS. However, the data must be chosen carefully so that it showcases the static and dynamic effects present in the machine without including too much data that is too similar, which would risk overfitting on specific hysteresis loops.
Retrieval of data
The data is retrieved with the Hysteresis scripts package, using the hysteresis_scripts.data.BTrainDataset convenience class.
Chosen data
We choose data from different time periods in the SPS. Each period comprises a dataframe with I_meas_A and B_meas_T columns, indexed by the recording timestamp of each sample (UTC, in nanoseconds). The dataframe also stores the corresponding LSA cycle and user of each measurement, and can be read into memory with hysteresis_scripts.data.BTrainDataset to visualize the data by cycle.
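As an illustration of the nanosecond UTC index, the following sketch selects one time window from a list of samples. It uses plain tuples and synthetic values rather than the actual BTrainDataset class, whose API is not shown here. The helper names to_utc_ns and select_window are hypothetical, and the window strings are assumed to be in UTC.

```python
from datetime import datetime, timezone

NS_PER_S = 1_000_000_000


def to_utc_ns(stamp: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to integer ns since the epoch."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * NS_PER_S


def select_window(samples, start: str, end: str):
    """Keep samples whose timestamp lies in the half-open window [start, end)."""
    lo, hi = to_utc_ns(start), to_utc_ns(end)
    return [s for s in samples if lo <= s[0] < hi]


# Synthetic samples (timestamp_ns, I_meas_A, B_meas_T) spaced 1 ms apart.
# The current and field values are made up for illustration only.
t0 = to_utc_ns("2023-11-02 10:51:12")
samples = [(t0 + i * 1_000_000, 100.0 + i, 0.1 + 1e-4 * i) for i in range(3000)]

# Select the first second of the FUNKY window.
window = select_window(samples, "2023-11-02 10:51:13", "2023-11-02 10:51:14")
print(len(window))
```

With a real dataframe the same comparison would be applied to the timestamp index instead of the first tuple element.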
Training data
The following data is chosen as training data.
FUNKY
One hour of data with AWAKE, HIRADMT, SFTPRO, LHCPILOT, MD1, SFTION and ZERO, with all cycles randomly going in and out of dynamic economy in different combinations.
Length: 54m30s, 3 273 600 samples
start="2023-11-02 10:51:13"
end ="2023-11-02 11:45:44"
PHYS+LHCFILL ZERO w/ precycle
- 10x LHCPILOT
- SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
- SFTPRO + ZERO + LHCPILOT played 5 times
- SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
This dataset is chosen as it is representative of a possible FUTURE supercycle for physics.
Length: 9m20s, 564 000 samples
start="2023-11-02 10:17:12"
end ="2023-11-02 10:26:17"
PHYS+LHCFILL MD1 w/ precycle
- 10x LHCPILOT
- SFTPRO + MD1 + SFTPRO + MD1 played 5 times
- SFTPRO +
Length: 9m20s, 564 000 samples
start="2023-11-02 10:34:21"
end ="2023-11-02 10:43:43"
Beam commissioning SFTPRO+MD+LHCPILOT
- SFTPRO + LHCPILOT + MD
LHCPILOT starting right after SFTPRO, and an unknown MD cycle after LHCPILOT / before SFTPRO
Length: 4m38s, 278 000 samples
start="2024-03-12 14:28:47"
end ="2024-03-12 14:33:30"
Beam commissioning SFTPRO+LHCPILOT+SCRUBBING
- SFTPRO + LHCPILOT + MD cycle
- SFTPRO + LHCPILOT + SCRUBBING + MD
Shows massive hysteresis loops after change of supercycle.
Length: 8m6s, 486 000 samples
start="2024-03-12 15:16:39"
end ="2024-03-12 15:24:45"
Beam commissioning SFTPRO+LHCPILOT+MD+MD1
- SFTPRO + LHCPILOT + MD + MD1
Long flat bottom MD cycle following LHCPILOT.
Length: 5m52s, 352 000 samples
start="2024-03-14 13:00:08"
end ="2024-03-14 13:06:00"
Beam commissioning LHCPILOT+MD+MD1
- LHCPILOT + MD + MD1
Long flat bottom MD cycle following LHCPILOT, but without SFTPRO
Length: 5m51s, 351 000 samples
start="2024-03-18 08:00:09"
end ="2024-03-18 08:06:00"
Beam commissioning AWAKE
- SFTPRO + LHCPILOT + AWAKE + MD1
Length: 6m, 600 000 samples
start="2024-03-25 16:36:46"
end ="2024-03-25 16:42:46"
Training set contains 103m27s, 6 207 000 samples
Validation data
The following data is chosen as validation data.
Post funky PHYS+LHCFILL+PHYS_ZERO:
- SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
- SFTPRO + LHCPILOT + ZERO played 5 times
- SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
This dataset is chosen because it is played after the hour-long FUNKY supercycle, so it should lie on a different hysteresis loop than data recorded after a precycle.
This dataset is considered in source.
Length: 6m54s, 414 000 samples
start="2023-11-02 11:45:46"
end ="2023-11-02 11:52:40" # last cycle has time stamp at 11:52:39
PHYS+DYNECO
- SFTPRO + MD1 + LHC25NS + MD1 + SFTPRO + MD1 repeated
- SFTPRO goes in and out of DYNECO in the beginning of the dataset
This dataset is representative of a FILL supercycle, also with cycles going in and out of dynamic economy.
This dataset is considered partially in source.
Length: 8m25s, 505 000 samples
start="2024-03-28 16:00:07"
end ="2024-03-28 16:08:32"
Beam commissioning HIRADMAT
A somewhat unconventional supercycle with the HIRADMAT user. This dataset is selected as a “crazy” dataset that the model should strive to learn.
- SFTPRO + LHC4 + MD1 + HIRADMT + MD1
This dataset is considered mainly out of source.
Length: 8m43s, 523 000 samples
start="2024-03-20 14:00:00"
end ="2024-03-20 14:08:43"
Validation set contains 24m2s, 1 442 000 samples
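The listed validation windows can be cross-checked against the stated lengths and sample counts. The sketch below assumes a 1 kHz sampling rate. This rate is not stated in the text, but it is what 1 442 000 samples over 24m2s implies. The helper names are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
SAMPLE_RATE_HZ = 1000  # assumption: inferred from 1 442 000 samples over 24m2s

# Start and end stamps copied from the validation dataset descriptions.
validation_windows = {
    "Post funky PHYS+LHCFILL+PHYS_ZERO": ("2023-11-02 11:45:46", "2023-11-02 11:52:40"),
    "PHYS+DYNECO": ("2024-03-28 16:00:07", "2024-03-28 16:08:32"),
    "Beam commissioning HIRADMAT": ("2024-03-20 14:00:00", "2024-03-20 14:08:43"),
}


def window_seconds(start: str, end: str) -> int:
    """Length of a window in whole seconds."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def fmt_duration(seconds: int) -> str:
    """Format seconds in the 'XmYs' style used in the dataset descriptions."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs}s"


durations = {name: window_seconds(*w) for name, w in validation_windows.items()}
total_seconds = sum(durations.values())

for name, secs in durations.items():
    print(f"{name}: {fmt_duration(secs)}, {secs * SAMPLE_RATE_HZ} samples")
print(f"total: {fmt_duration(total_seconds)}, {total_seconds * SAMPLE_RATE_HZ} samples")
```

Under this assumption the three windows come to 6m54s, 8m25s and 8m43s, for a total of 24m2s and 1 442 000 samples, in agreement with the figures given for the validation set.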
Version 2
data:
  init_args:
    train_df_paths:
      - "~/cernbox/hysteresis/dipole/datasets/MD20240904/MD20240904_171430_180230_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241009/SPS-BTRAIN-20241003-091402---20241003-091901_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241022/SPS-BTRAIN-20241021-041601---20241021-042956_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241119-160407---20241119-161538_preprocessed.parquet"
      # - "~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241105-064149---20241105-064525_preprocessed.parquet"
    val_df_paths:
      - "~/cernbox/hysteresis/dipole/datasets/val/SPS-BTRAIN-20231102-094557---20231102-095228_post_funky_preprocessed.parquet"
This configuration is defined in ~/cernbox/hysteresis/dipole/datasets/train_V2.yml.
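One sanity check on such a configuration is that no training file overlaps in time with a validation file. The sketch below assumes that the two timestamps in an SPS-BTRAIN file name are the start and end of the recorded window, which the naming suggests but the text does not state. The helper names are hypothetical, and files that do not follow this naming pattern are skipped.

```python
import re
from datetime import datetime

# Assumed naming pattern: SPS-BTRAIN-<start>---<end>, each as YYYYMMDD-HHMMSS.
PATTERN = re.compile(r"SPS-BTRAIN-(\d{8}-\d{6})---(\d{8}-\d{6})")


def parse_window(path: str):
    """Return (start, end) parsed from the file name, or None if it does not match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    start, end = (datetime.strptime(g, "%Y%m%d-%H%M%S") for g in match.groups())
    return start, end


def overlaps(a, b) -> bool:
    """True if two (start, end) windows share any time."""
    return a[0] < b[1] and b[0] < a[1]


base = "~/cernbox/hysteresis/dipole/datasets"
train_paths = [
    f"{base}/MD20240904/MD20240904_171430_180230_preprocessed.parquet",
    f"{base}/MD20241009/SPS-BTRAIN-20241003-091402---20241003-091901_preprocessed.parquet",
    f"{base}/MD20241022/SPS-BTRAIN-20241021-041601---20241021-042956_preprocessed.parquet",
    f"{base}/MD20241119/SPS-BTRAIN-20241119-160407---20241119-161538_preprocessed.parquet",
]
val_paths = [
    f"{base}/val/SPS-BTRAIN-20231102-094557---20231102-095228_post_funky_preprocessed.parquet",
]

# Collect every (train, val) pair whose parsed windows overlap.
clashes = []
for train in train_paths:
    for val in val_paths:
        tw, vw = parse_window(train), parse_window(val)
        if tw is not None and vw is not None and overlaps(tw, vw):
            clashes.append((train, val))

print("overlapping pairs:", clashes)
```

The first training file uses a different naming scheme and is therefore not covered by this check.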
Changes:
- 2024-11-20: Validation set changed from
~/cernbox/hysteresis/dipole/datasets/train/SPS-BTRAIN-20231102-093421---20231102-094342_phys+lhcfill_md1_precycle_preprocessed.parquet → ~/cernbox/hysteresis/dipole/datasets/val/SPS-BTRAIN-20231102-094557---20231102-095228_post_funky_preprocessed.parquet to remove the precycle from the validation set that is not seen in the training set. Training set replaced SFTION sequence with SFTION -> MD5 -> SFTION -> MD1 -> MD5, with MD1 switching from normal dipole sequence to idle current halfway, in ~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241119-160407---20241119-161538_preprocessed.parquet
- Disabled
~/cernbox/hysteresis/dipole/datasets/MD20241009/SPS-BTRAIN-20240925-162433---20240925-163055_preprocessed.parquet and ~/cernbox/hysteresis/dipole/datasets/MD20241022/SPS-BTRAIN-20241021-041601---20241021-042956_preprocessed.parquet for duplicate data (SFTPRO1 x3 + LHCPILOT)
Consists of:
- Dataset YETS post funky (validation)