2 years ago | ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation | ykilcher