Finding relationships between two related groups of images is a challenging problem in many fields. For example, in neuroscience studies it is informative to link images of experimental animals' neural activity to images of their behavior. Few previous works have addressed this task, since most generative models focus on reconstructing output images that resemble the input images. In this paper, we propose a novel framework that maps images from one group to images from another group. We first apply singular value decomposition (SVD) to remove background noise from the original images. Next, we combine two deep learning approaches, a variational autoencoder (VAE) and a convolutional neural network (CNN), to directly connect the two groups of images. We test our framework on images from a neuroscience experiment. The results show that, given neuron images of mice, the proposed framework generates paw movement images that are very close to the ground-truth images. In capturing the paw gestures in paw movement images, the experimental results demonstrate that our framework outperforms the state-of-the-art paw location detection method.
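The SVD-based background removal step can be illustrated with a minimal sketch. The abstract does not specify how the decomposition is applied, so the version below rests on an assumption: the frames are flattened into the rows of a matrix, the leading `rank` singular components are treated as the background, and that low-rank part is subtracted. The function name `remove_background_svd` and the `rank` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def remove_background_svd(frames: np.ndarray, rank: int = 1) -> np.ndarray:
    """Subtract a low-rank background estimate from a stack of frames.

    frames: array of shape (n_frames, height, width).
    rank:   number of leading singular components treated as background
            (an illustrative choice, not a value from the paper).
    Returns the residual frames with the same shape as the input.
    """
    n_frames, height, width = frames.shape

    # Flatten each frame into one row of the data matrix.
    X = frames.reshape(n_frames, height * width).astype(np.float64)

    # Thin SVD: X = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Rank-`rank` approximation, assumed here to model the background.
    background = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # The residual keeps what the low-rank background does not explain.
    return (X - background).reshape(n_frames, height, width)
```

Under this assumption, a stack of identical frames is exactly rank one, so the residual is zero up to floating-point error. A larger `rank` removes more slowly varying structure, at the risk of also removing part of the signal.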