Notes

Big trouble finetuning PRETFTMBI-124: Torch errors when compiling with dynamic shapes. Even with the patch applied, torch compilation still seems to have issues with randomized sequence lengths.

I switch off randomize_seq_len to train TRATFTMBI-53 on ml4, and keep it on for TRATFTMBI-50 on ml3, since we desperately need a model for the Dedicated MD 2025-07-23.

We further finetune TRATFTMBI-53 into TFTMBI-174 with ctxt_seq_len fixed to 540 (the SFTPRO1 length).
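A minimal sketch of what fixing the context length amounts to, assuming right-padding/truncation to 540 tokens; `PAD_ID` and the helper name are assumptions, not identifiers from the training code.

```python
PAD_ID = 0          # assumed padding token id
CTXT_SEQ_LEN = 540  # fixed context length (SFTPRO1 length)

def fix_context_length(tokens, target_len=CTXT_SEQ_LEN, pad_id=PAD_ID):
    """Return a list of exactly target_len token ids."""
    if len(tokens) >= target_len:
        return tokens[:target_len]                         # truncate long sequences
    return tokens + [pad_id] * (target_len - len(tokens))  # right-pad short ones

short = fix_context_length(list(range(10)))
long_seq = fix_context_length(list(range(1000)))
assert len(short) == 540 and len(long_seq) == 540
```

With every batch at a constant length, the compiler sees one static shape and the dynamic-shape path is never exercised.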

Maurus’ presentation

- Missing slide numbers
- Feedback control does not go through LSA