Parameter constraints
[0, 1]
You’re right in noticing that this can become unstable. The root problem arises when raw_parameter drifts too far: when it becomes strongly negative, the forward mapping (tanh(raw_parameter) + 1)/2 shrinks exponentially towards 0. During backprop, gradients through tanh then become vanishingly small (the derivative of tanh is close to 0 for large negative inputs), and optimization slows down massively or gets stuck.
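To make the saturation explicit, the mapping can be rewritten as a logistic sigmoid. The symbols x (for raw_parameter) and p (for the constrained output) are introduced here only for illustration:

```latex
p(x) = \frac{\tanh(x) + 1}{2} = \frac{1}{1 + e^{-2x}},
\qquad
\frac{dp}{dx} = \frac{1 - \tanh^2(x)}{2} = 2\,p(x)\,\bigl(1 - p(x)\bigr).
```

For strongly negative x this gives p ≈ e^{2x} and dp/dx ≈ 2e^{2x}, so both the output and its gradient decay exponentially. At x = -5, for example, p is roughly 4.5e-5 and the gradient roughly 9e-5.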
Main issues:
- tanh saturates very strongly for large positive/negative raw_parameter.
- Even worse, gradient updates on raw_parameter are unconstrained, so if a few unlucky updates happen, it can “run off” far into negative values.
- When raw_parameter is very negative, the corresponding output is ~0, but it is incredibly sensitive to perturbations (unstable gradients).
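The saturation described above can be checked numerically. The following is a minimal sketch in plain Python rather than the actual model code; the function names forward and forward_grad are hypothetical, and only the mapping (tanh(raw_parameter) + 1)/2 is taken from the discussion.

```python
import math


def forward(raw_parameter: float) -> float:
    """Map an unconstrained value into [0, 1] via (tanh(x) + 1) / 2."""
    return (math.tanh(raw_parameter) + 1.0) / 2.0


def forward_grad(raw_parameter: float) -> float:
    """Analytic derivative of forward(): (1 - tanh(x)^2) / 2."""
    t = math.tanh(raw_parameter)
    return (1.0 - t * t) / 2.0


# Print the output and its gradient as raw_parameter drifts negative.
for raw in (0.0, -2.0, -5.0, -10.0, -20.0):
    print(f"raw={raw:6.1f}  output={forward(raw):.3e}  grad={forward_grad(raw):.3e}")
```

Each further step into negative territory shrinks the gradient by orders of magnitude, so a parameter that has run off receives almost no signal to pull it back.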
Background

The fit with the original differentiable hysteresis model is OK, but it does not manage to deactivate hysterons along the SFT flat top when we start to decrease the current. (Many hysterons would deactivate quickly.)
We see that we manage to fit the data back to range, as extracted by the Calibration function.