Understanding fine-grained urban population distribution from GPS location data is important for urban applications such as traffic management and retailers' new store openings. However, a GPS-based population distribution relies heavily on the number of users who agree to provide GPS logs. With only a limited number of users, the fine-grained population distribution becomes sparse and must be aggregated into a coarse-grained one. In this paper, we present the challenge of estimating fine-grained population distributions from coarse-grained ones and propose a model that can incorporate extensive auxiliary information using a CNN-based image super-resolution approach. Our experiments with real data reveal two key findings: (i) traditional regression models tend to estimate similar populations for adjacent grids, a tendency that existing metrics often overlook, and (ii) CNN-based image super-resolution models reproduce the characteristic of population distributions in which adjacent grids have different population volumes, although they sometimes produce simplistic estimates depending on the auxiliary information. Based on these findings, we present our vision for developing a promising model and for improving evaluation metrics tailored to this challenge.