This work explores super resolution (SR) with a deep network based on a pixel probabilistic model, where particular small inputs and large magnification factors make the problem highly underspecified since fairly large amounts of high-frequency details are missing in low resolution (LR) source. In this paper, we develop a deep architecture comprising of a PixelCNN and a residual network (ResNet), in which PixelCNN predicts the serial dependencies of the pixel sequence and ResNet for capturing the global structure of LR input. A human visual saliency mechanism (HVSM) by employing accurate SR in salient regions and fast interpolation in nonsalient regions is integrated within the pixel probabilistic model to efficiently reduce the computational complexity while maintaining the desired visual quality. Additionally, we present a Bayesian optimization technique to automatically determine the optimal weight of loss function. Furthermore, a modified image quality assessment taking into account HVSM is introduced, trying to align with the human visual perception. Experiments demonstrate that the proposed algorithm could generate more plausible facial features than previous deep learning methods, offering finer details and significant improvement in visual quality.