Choice of data

For training the neural network, it is important to use a dataset that is representative of the situations in which the model will be used. As a starting point, the model must perform well on production supercycles such as PHYS and PHYS+LHCFILL, containing the SFTPRO, LHCPILOT, AWAKE, HIRADMT, MD1, ZERO, and SFTION cycles.

Additionally, the model should ideally generalize well when the supercycle contains unseen cycles, such as various MD cycles at both high and low energy.

Data is abundant, since the SPS B-Train logs the measured I and B to NXCALS. However, the data must be chosen carefully so that it showcases the static and dynamic effects present in the machine without including too much near-identical data, which would encourage overfitting to specific hysteresis loops.

Retrieval of data

The data is retrieved using the Hysteresis scripts package, with the hysteresis_scripts.data.BTrainDataset convenience class.

Chosen data

We choose data from different time periods in the SPS. Each period is stored as a dataframe with I_meas_A and B_meas_T columns, indexed by the recording timestamp of each sample, in UTC and nanoseconds. The dataframe also stores the corresponding LSA cycle and user for each measurement, and can be read into memory with hysteresis_scripts.data.BTrainDataset for visualizing the data by cycle.
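As a rough illustration of this layout, the sketch below builds a small synthetic dataframe with the same two measurement columns and a UTC timestamp index, then cuts out a time window the way the start/end pairs in the following sections are meant to be used. It uses plain pandas rather than BTrainDataset, whose interface is not described here. The 1 kHz sample rate, the `cycle` and `user` column names, the example values, and the `select_window` helper are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one B-Train dataframe: 5 s of samples at an
# assumed 1 kHz rate, indexed by timezone-aware UTC timestamps.
index = pd.date_range("2023-11-02 10:51:13", periods=5000, freq="ms", tz="UTC")
df = pd.DataFrame(
    {
        "I_meas_A": np.linspace(300.0, 5000.0, len(index)),  # made-up current ramp
        "B_meas_T": np.linspace(0.1, 1.8, len(index)),       # made-up field ramp
        "cycle": "SFTPRO1",           # hypothetical column name for the LSA cycle
        "user": "SPS.USER.SFTPRO1",   # hypothetical column name for the user
    },
    index=index,
)


def select_window(frame, start, end):
    """Return the rows with start <= timestamp < end, both interpreted as UTC."""
    start_ts = pd.Timestamp(start, tz="UTC")
    end_ts = pd.Timestamp(end, tz="UTC")
    return frame[(frame.index >= start_ts) & (frame.index < end_ts)]


# Cut a 2 s window out of the synthetic data.
window = select_window(df, "2023-11-02 10:51:14", "2023-11-02 10:51:16")

# Split by cycle, e.g. as a first step before plotting B against I per cycle.
per_cycle = {name: group for name, group in window.groupby("cycle")}
```

Treating the window as half-open (start included, end excluded) avoids counting a boundary sample twice when consecutive windows share a timestamp.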

Training data

The following data is chosen as training data.

FUNKY

Roughly one hour of data with AWAKE, HIRADMT, SFTPRO, LHCPILOT, MD1, SFTION, and ZERO cycles, with all cycles randomly going in and out of dynamic economy in different combinations.

Length: 54m30s, 3 273 600 samples

start="2023-11-02 10:51:13"
end  ="2023-11-02 11:45:44"

PHYS+LHCFILL ZERO w/ precycle

  • 10x LHCPILOT
  • SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
  • SFTPRO + ZERO + LHCPILOT played 5 times
  • SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times

This dataset is chosen because it is representative of a possible future supercycle for physics.

Length: 9m20s, 564 000 samples

start="2023-11-02 10:17:12"
end  ="2023-11-02 10:26:17"

PHYS+LHCFILL MD1 w/ precycle

  • 10x LHCPILOT
  • SFTPRO + MD1 + SFTPRO + MD1 played 5 times
  • SFTPRO +

Length: 9m20s, 564 000 samples

start="2023-11-02 10:34:21"
end  ="2023-11-02 10:43:43"

Beam commissioning SFTPRO+MD+LHCPILOT

  • SFTPRO + LHCPILOT + MD

LHCPILOT starts right after SFTPRO, with an unknown MD cycle after LHCPILOT and before SFTPRO.

Length: 4m38s, 278 000 samples

start="2024-03-12 14:28:47"
end  ="2024-03-12 14:33:30"

Beam commissioning SFTPRO+LHCPILOT+SCRUBBING

  • SFTPRO + LHCPILOT + MD cycle
  • SFTPRO + LHCPILOT + SCRUBBING + MD

Shows massive hysteresis loops after change of supercycle.

Length: 8m6s, 486 000 samples

start="2024-03-12 15:16:39"
end  ="2024-03-12 15:24:45"

Beam commissioning SFTPRO+LHCPILOT+MD+MD1

  • SFTPRO + LHCPILOT + MD + MD1

Long flat bottom MD cycle following LHCPILOT.

Length: 5m52s, 352 000 samples

start="2024-03-14 13:00:08"
end  ="2024-03-14 13:06:00"

Beam commissioning LHCPILOT+MD+MD1

  • LHCPILOT + MD + MD1

Long flat bottom MD cycle following LHCPILOT, but without SFTPRO.

Length: 5m51s, 351 000 samples

start="2024-03-18 08:00:09"
end  ="2024-03-18 08:06:00"

Beam commissioning AWAKE

  • SFTPRO + LHCPILOT + AWAKE + MD1

Length: 6m, 600 000 samples

start="2024-03-25 16:36:46"
end  ="2024-03-25 16:42:46"

Training set contains 103m27s, 6 207 000 samples

Validation data

The following data is chosen as validation data.

Post funky PHYS+LHCFILL+PHYS_ZERO:

  • SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times
  • SFTPRO + LHCPILOT + ZERO played 5 times
  • SFTPRO + 3x ZERO + SFTPRO + 3x ZERO played 5 times

This dataset is chosen because it is played after the hour-long FUNKY supercycle, so it should be on a different hysteresis loop than a dataset played after a precycle.

This dataset is considered in source.

Length: 6m54s, 414 000 samples

start="2023-11-02 11:45:46"
end  ="2023-11-02 11:52:40"  # last cycle has time stamp at 11:52:39

PHYS+DYNECO

  • SFTPRO + MD1 + LHC25NS + MD1 + SFTPRO + MD1 repeated
  • SFTPRO goes in and out of DYNECO in the beginning of the dataset

This dataset is representative of a FILL supercycle, also with cycles going in and out of dynamic economy.

This dataset is considered partially in source.

Length: 8m25s, 505 000 samples

start="2024-03-28 16:00:07"
end  ="2024-03-28 16:08:32"

Beam commissioning HIRADMAT

A somewhat unconventional supercycle with the HIRADMT user. This dataset is selected as a “crazy” dataset that the model should strive to learn.

  • SFTPRO + LHC4 + MD1 + HIRADMT + MD1

This dataset is considered mainly out of source.

Length: 8m43s, 523 000 samples

start="2024-03-20 14:00:00"
end  ="2024-03-20 14:08:43"

Validation set contains 24m2s, 1 442 000 samples
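The validation totals can be cross-checked from the start and end timestamps listed above. The sketch below does this with the standard library only. The 1 kHz sample rate is an assumption inferred from the listed figures (414 000 samples over 6m54s), and the helper names are illustrative.

```python
from datetime import datetime

SAMPLE_RATE_HZ = 1000  # assumption: inferred from 414 000 samples over 6m54s

# (name, start, end) exactly as listed for the three validation datasets.
VALIDATION_WINDOWS = [
    ("Post funky PHYS+LHCFILL+PHYS_ZERO", "2023-11-02 11:45:46", "2023-11-02 11:52:40"),
    ("PHYS+DYNECO", "2024-03-28 16:00:07", "2024-03-28 16:08:32"),
    ("Beam commissioning HIRADMAT", "2024-03-20 14:00:00", "2024-03-20 14:08:43"),
]


def duration_seconds(start, end):
    """Whole seconds between two 'YYYY-MM-DD HH:MM:SS' timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def format_duration(seconds):
    """Format a number of seconds in the 'XmYs' style used in this document."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs}s"


durations = {name: duration_seconds(s, e) for name, s, e in VALIDATION_WINDOWS}
total_seconds = sum(durations.values())
total_samples = total_seconds * SAMPLE_RATE_HZ

for name, secs in durations.items():
    print(f"{name}: {format_duration(secs)}, {secs * SAMPLE_RATE_HZ} samples")
print(f"Total: {format_duration(total_seconds)}, {total_samples} samples")
```

Under the 1 kHz assumption this reproduces the per-dataset lengths and the 24m2s, 1 442 000 sample total stated for the validation set.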

Version 2

data:
  init_args:
    train_df_paths:
      - "~/cernbox/hysteresis/dipole/datasets/MD20240904/MD20240904_171430_180230_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241009/SPS-BTRAIN-20241003-091402---20241003-091901_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241022/SPS-BTRAIN-20241021-041601---20241021-042956_preprocessed.parquet"
      - "~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241119-160407---20241119-161538_preprocessed.parquet"
      # - "~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241105-064149---20241105-064525_preprocessed.parquet"
    val_df_paths:
      - "~/cernbox/hysteresis/dipole/datasets/val/SPS-BTRAIN-20231102-094557---20231102-095228_post_funky_preprocessed.parquet"

This configuration is defined in ~/cernbox/hysteresis/dipole/datasets/train_V2.yml.
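Most of the file names in this configuration appear to encode the start and end of the recording window. The sketch below recovers that window from a path, which can be handy for checking a dataset's length without opening the file. The naming pattern (SPS-BTRAIN-<start>---<end>, with timestamps written as YYYYMMDD-HHMMSS) is an assumption read off the paths above, not a documented convention, and `window_from_path` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed naming pattern: SPS-BTRAIN-<start>---<end>_<suffix>.parquet,
# with both timestamps written as YYYYMMDD-HHMMSS.
PATTERN = re.compile(r"SPS-BTRAIN-(\d{8}-\d{6})---(\d{8}-\d{6})")


def window_from_path(path):
    """Return (start, end) datetimes parsed from the file name, or None
    if the name does not follow the assumed pattern."""
    match = PATTERN.search(path)
    if match is None:
        return None
    start, end = (datetime.strptime(g, "%Y%m%d-%H%M%S") for g in match.groups())
    return start, end


path = (
    "~/cernbox/hysteresis/dipole/datasets/MD20241009/"
    "SPS-BTRAIN-20241003-091402---20241003-091901_preprocessed.parquet"
)
start, end = window_from_path(path)
length_seconds = int((end - start).total_seconds())
```

Paths that do not follow the pattern, such as the MD20240904 file, return None and have to be inspected directly.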

Changes:

  • 2024-11-20: Validation set changed from ~/cernbox/hysteresis/dipole/datasets/train/SPS-BTRAIN-20231102-093421---20231102-094342_phys+lhcfill_md1_precycle_preprocessed.parquet to ~/cernbox/hysteresis/dipole/datasets/val/SPS-BTRAIN-20231102-094557---20231102-095228_post_funky_preprocessed.parquet, to remove the precycle from the validation set, which is not seen in the training set. In the training set, the SFTION sequence was replaced with SFTION -> MD5 -> SFTION -> MD1 -> MD5, with MD1 switching from the normal dipole sequence to idle current halfway, in ~/cernbox/hysteresis/dipole/datasets/MD20241119/SPS-BTRAIN-20241119-160407---20241119-161538_preprocessed.parquet.
  • Disabled ~/cernbox/hysteresis/dipole/datasets/MD20241009/SPS-BTRAIN-20240925-162433---20240925-163055_preprocessed.parquet and ~/cernbox/hysteresis/dipole/datasets/MD20241022/SPS-BTRAIN-20241021-041601---20241021-042956_preprocessed.parquet because they contain duplicate data (SFTPRO1 x3 + LHCPILOT).

Consists of: