Notes
Big trouble finetuning PRETFTMBI-124: Torch errors when compiling with dynamic shapes, and even after patching, torch compilation still seems to misbehave with randomized sequence lengths.
I switched off randomize_seq_len to train TRATFTMBI-53 on ml4, and kept it on for TRATFTMBI-50 on ml3, since we urgently need a model for the Dedicated MD on 2025-07-23.
We further finetune TRATFTMBI-53 into TFTMBI-174 with ctxt_seq_len fixed to 540 (the SFTPRO1 length).
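With ctxt_seq_len fixed, every batch must be padded or truncated to exactly 540 tokens so the compiler only ever sees one static shape. A hypothetical helper (name, pad_id, and token representation are all assumptions, not from the actual training code):

```python
CTXT_SEQ_LEN = 540  # SFTPRO1 length, fixed for TFTMBI-174

def pad_or_truncate(tokens: list[int], seq_len: int = CTXT_SEQ_LEN,
                    pad_id: int = 0) -> list[int]:
    """Force a token sequence to a fixed length: truncate long
    sequences, right-pad short ones with pad_id."""
    if len(tokens) >= seq_len:
        return tokens[:seq_len]
    return tokens + [pad_id] * (seq_len - len(tokens))

# Every sequence comes out at exactly 540 tokens.
assert len(pad_or_truncate(list(range(1000)))) == 540
assert len(pad_or_truncate([1, 2, 3])) == 540
```

This trades some wasted compute on padding for a single static shape, sidestepping the dynamic-shape compilation issues entirely.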
Feedback on Maurus’ presentation:
- Missing slide numbers
- Feedback control does not go through LSA