Accurate modeling of elastic wavefields in 3-D anisotropic media is important for many seismic processing and inversion applications. However, efficient wavefield simulation for tilted transversely isotropic (TTI) media and, especially, for orthorhombic and lower symmetries remains challenging. Finite-difference (FD) implementations using centered Taylor-series coefficients on singly staggered grids suffer from reduced numerical accuracy due to problems in computing the partial wavefield derivatives in TTI or tilted orthorhombic (TOR) media, as well as in enforcing the free-surface (zero-traction) boundary conditions. To address these issues, we develop a 3-D mimetic FD (MFD) algorithm for arbitrarily anisotropic media that uses a fully-staggered-grid strategy. This CUDA-based algorithm is implemented on graphics processing units (GPUs) to leverage the massive parallelism of this computer architecture. For multi-GPU parallelization, we employ the CUDA-aware message passing interface (MPI) library to exploit the remote direct memory access (RDMA) feature for buffer transfers. Weak- and strong-scaling tests on up to eight DGX NVIDIA A100 nodes (64 GPUs in total) demonstrate that the proposed multi-GPU implementation achieves a quasi-linear computational speedup with over 98% efficiency for large industrial-scale models of size in excess of 1.7 × 10 10 grid points. [ABSTRACT FROM AUTHOR]