Lookup-table (LUT) mapping is an indispensable step in FPGA design flows and also serves as a building block in many technology-independent optimization algorithms. Accelerating LUT mapping is therefore crucial to meeting the demand for synthesizing high-quality, large-scale VLSI designs. Previous work on GPU-based LUT mapping achieves only low speedup because of its limited degree of parallelism. In this paper, we propose FineMap, an ultra-fast GPU-parallel LUT mapping engine composed of a novel fine-grained mapping phase with a high degree of parallelism, a parallel cut expansion phase, and a parallel timing analysis pass. The mapping phase is enhanced by cut evaluation and memory management algorithms tailored specifically for GPUs, which enable fast mapping of large circuits with limited GPU memory. Experiments show that, compared with the high-performance mapper implemented in ABC, FineMap achieves a 128.7× speedup with better quality in terms of area on large benchmarks.