This work studies the problem of efficiently mapping GPU threads onto simplex domains. A non-linear map $\lambda (\omega)$ is formulated based on a block-space enumeration principle that reduces the number of thread blocks by a factor of approximately $2\times$ for 2-simplex domains and $6\times$ for 3-simplex domains, compared to the standard approach. Performance results show that $\lambda (\omega)$ is competitive, and in some cases the fastest map, when run on recent GPU architectures such as the Tesla V100, where it reaches a speedup of up to $1.5\times$ in 2-simplex tests. In 3-simplex tests, it reaches a speedup of up to $2.3\times$ for small workloads and up to $1.25\times$ for larger ones. These results make $\lambda (\omega)$ a useful GPU optimization technique for parallel problems that define all-pairs, all-triplets, or nearest-neighbor interactions in a 2-simplex or 3-simplex domain.
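To make the block-space enumeration idea concrete, the following is a minimal Python sketch, not the formulation used in this work. It assumes that the standard approach launches a full bounding box of $n^2$ (or $n^3$) blocks, and it uses a well-known square-root-based enumeration of the lower triangle as a stand-in for a 2-simplex map. The names `tri_map` and `block_counts` are hypothetical and chosen only for illustration.

```python
from math import isqrt


def tri_map(w):
    """Map a linear block index w >= 0 to (row, col) with 0 <= col <= row.

    Illustrative stand-in for a 2-simplex block-space map: blocks are
    enumerated row by row over the lower triangle, so row r starts at
    linear index r * (r + 1) / 2.
    """
    # Largest row such that row * (row + 1) / 2 <= w, using exact integer sqrt.
    row = (isqrt(8 * w + 1) - 1) // 2
    col = w - row * (row + 1) // 2
    return row, col


def block_counts(n):
    """Block counts for n blocks per side.

    Assumption: the standard approach covers the whole bounding box,
    while a simplex map launches only the blocks inside the simplex.
    """
    return {
        "box2": n * n,
        "simplex2": n * (n + 1) // 2,
        "box3": n ** 3,
        "simplex3": n * (n + 1) * (n + 2) // 6,
    }


if __name__ == "__main__":
    n = 64
    counts = block_counts(n)
    print("2-simplex block ratio:", counts["box2"] / counts["simplex2"])
    print("3-simplex block ratio:", counts["box3"] / counts["simplex3"])
    print("first blocks:", [tri_map(w) for w in range(6)])
```

Under these assumptions, the block-count ratios approach $2$ and $6$ as $n$ grows, which is consistent with the approximate reduction factors stated above. On a GPU, each thread block would evaluate such a map once from its linear block index, so the cost of the square root is paid per block and not per thread.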