TFLSTM-40

┏━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name                     ┃ Type                 ┃ Params ┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ criterion                │ WeightedMSELoss      │      0 │
│ 1 │ model                    │ TransformerLSTMModel │  4.3 M │
│ 2 │ model.encoder_grn        │ GatedResidualNetwork │  199 K │
│ 3 │ model.decoder_grn        │ GatedResidualNetwork │  199 K │
│ 4 │ model.encoder            │ LSTM                 │  1.1 M │
│ 5 │ model.decoder            │ LSTM                 │  1.1 M │
│ 6 │ model.transformer_blocks │ ModuleList           │  1.8 M │
│ 7 │ model.output_head        │ Linear               │    257 │
└───┴──────────────────────────┴──────────────────────┴────────┘
Trainable params: 4.3 M
Non-trainable params: 0
Total params: 4.3 M
Total estimated model params size (MB): 17

Incorrectly trained the TransformerLSTM with encoder_alignment="left", while transformertf.utils.sequences.align_encoder_sequences aligns samples to the right by default. As a result, torch packed the wrong samples and the mask was applied to the wrong positions when computing the MSE loss. In the worst case, with long masked sequences, the model was essentially learning zeroes.
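
A minimal sketch of the mismatch, in plain torch. The align_left/align_right helpers below are hypothetical stand-ins for the alignment done by transformertf.utils.sequences.align_encoder_sequences (whose actual signature is not shown here); the point is only to show what happens when a mask built for left-aligned data is applied to right-aligned data.

    import torch

    def align_left(seq: torch.Tensor, total_len: int) -> torch.Tensor:
        """Left-align by right-padding with zeros (real data at the start)."""
        pad = total_len - seq.size(0)
        return torch.cat([seq, seq.new_zeros(pad, *seq.shape[1:])])

    def align_right(seq: torch.Tensor, total_len: int) -> torch.Tensor:
        """Right-align by left-padding with zeros (real data at the end)."""
        pad = total_len - seq.size(0)
        return torch.cat([seq.new_zeros(pad, *seq.shape[1:]), seq])

    sample = torch.arange(1.0, 5.0)      # 4 real timesteps
    left = align_left(sample, 8)         # [1, 2, 3, 4, 0, 0, 0, 0]
    right = align_right(sample, 8)       # [0, 0, 0, 0, 1, 2, 3, 4]

    # Mask built assuming left alignment (True = real timestep).
    mask = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.bool)
    target = left.clone()

    # Correct pairing: mask selects the real timesteps, loss is meaningful.
    good = torch.mean((left[mask] - target[mask]) ** 2)    # 0.0

    # Buggy pairing: the mask now selects padding, so the loss compares
    # zeros against the real targets and pushes the model toward zeroes.
    bad = torch.mean((right[mask] - target[mask]) ** 2)    # 7.5
    print(good.item(), bad.item())

The longer the masked tail of a sequence, the larger the fraction of positions where the loss sees padding zeros instead of real values, which is why long masked sequences drove the model toward predicting zeroes.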