Ultrafast nonlinear dynamics in fibre optics can be simulated using the nonlinear Schrödinger equation (NLSE). However, this process becomes slow when many simulations are required, for example during the optimization of an optical device. Recurrent neural networks (RNNs), a class of deep learning models, have shown promising results in predicting the nonlinear dynamics of a laser pulse [1], [2]. We propose a transformer-based [3] deep learning model to achieve superior accuracy and speed, leveraging the long-range temporal dependencies and parallel computation enabled by the attention mechanism.
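For context, the NLSE is conventionally integrated with the split-step Fourier method, which alternates dispersive steps in the frequency domain with nonlinear steps in the time domain; the per-step FFTs and the sequential stepping are what make large simulation campaigns costly. A minimal sketch in normalized units (the function name, grid, and parameter values are illustrative, not taken from this work):

```python
import numpy as np

def ssfm_nlse(A0, t, z, beta2, gamma, nz=2000):
    """Symmetric split-step Fourier integration of the NLSE
        i dA/dz = (beta2/2) d^2A/dt^2 - gamma |A|^2 A
    A0: initial complex envelope sampled on the time grid t."""
    dt = t[1] - t[0]
    w = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)        # angular frequencies
    dz = z / nz
    half_disp = np.exp(0.5j * beta2 * w**2 * (dz / 2))  # linear half-step
    A = A0.astype(complex)
    for _ in range(nz):
        A = np.fft.ifft(half_disp * np.fft.fft(A))      # dispersion, dz/2
        A = A * np.exp(1j * gamma * np.abs(A)**2 * dz)  # nonlinearity, dz
        A = np.fft.ifft(half_disp * np.fft.fft(A))      # dispersion, dz/2
    return A

# Fundamental-soliton check: with beta2 = -1 and gamma = 1, a sech pulse
# should propagate with its intensity profile unchanged.
t = np.linspace(-20, 20, 1024, endpoint=False)
A0 = 1 / np.cosh(t)
A = ssfm_nlse(A0, t, z=5.0, beta2=-1.0, gamma=1.0)
err = np.max(np.abs(np.abs(A) - np.abs(A0)))
```

A surrogate network, whether an RNN or a transformer, would be trained on envelope snapshots produced by a solver like this and then queried in place of the stepping loop.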