Try TransformerEncoder with flattened inputs [B_old, I_new] → [B_old, B_new]
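One possible reading of this note, sketched with PyTorch's `torch.nn.TransformerEncoder`. The sketch makes several assumptions. It treats the B_old rows as a single sequence of tokens and I_new as the flattened feature width of each row. It reads B_new as the number of outputs (for example, scores against new items) produced for each row. The class name `FlatEncoderScorer` and the values of `d_model`, `nhead` and `num_layers` are hypothetical and only illustrate the shape flow [B_old, I_new] → [B_old, B_new].

```python
import torch
import torch.nn as nn


class FlatEncoderScorer(nn.Module):
    """Hypothetical sketch: maps a flattened [B_old, I_new] input to [B_old, B_new]."""

    def __init__(self, i_new: int, b_new: int, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        # Project each row's I_new flattened features into the model dimension.
        self.in_proj = nn.Linear(i_new, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Map each encoded row to one output per new item (assumed meaning of B_new).
        self.out_proj = nn.Linear(d_model, b_new)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B_old, I_new]. Add a leading batch dim so the B_old rows form one sequence.
        h = self.in_proj(x).unsqueeze(0)      # [1, B_old, d_model]
        h = self.encoder(h)                   # [1, B_old, d_model], rows attend to each other
        return self.out_proj(h.squeeze(0))    # [B_old, B_new]


model = FlatEncoderScorer(i_new=32, b_new=10)
out = model(torch.randn(8, 32))
print(out.shape)  # torch.Size([8, 10])
```

Treating the B_old rows as one sequence lets self-attention mix information across old items before the linear head produces the B_new outputs. If B_old is meant to be a true batch dimension with no interaction between rows, each row would instead need its own token sequence, for example by reshaping I_new into several tokens.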