In this work we present a clustered local time stepping (LTS) scheme for the arbitrary high-order derivatives discontinuous Galerkin (ADER-DG) finite element method. By clustering elements with similar time steps, our scheme meets the regularity requirements of modern hardware through the design of the numerical discretization itself. We describe the clustered LTS scheme in detail for the seismic simulation package SeisSol. The scheme captures both homogeneous and heterogeneous time step variations in the computational domain and maintains a large fraction of the theoretical speedup offered by LTS. From an engineering standpoint, it addresses all important performance characteristics of state-of-the-art supercomputers. The combined algorithmic and computational performance results for SeisSol show that we are able to leverage the large potential of local time stepping: time-to-solution is reduced by a factor of 2.3 to 4.1 while sustaining more than 53% of SuperMUC-II's HPL performance, which corresponds to more than 1.5 PFLOPS on 86,016 cores.