Wireless capsule endoscopy (WCE) is a modality for the non-invasive examination of the gastrointestinal (GI) tract. Physicians diagnose pathologies in capsule endoscopy (CE) images by following specific gaze patterns that target pathology-related visual cues. Recently, deep learning has advanced human eye-fixation estimation in natural images. However, the potential of predicting eye-related patterns, such as eye fixations, in medical images has not been thoroughly investigated. In this work, we propose a CNN auto-encoder model that predicts saliency maps estimating physicians' gaze patterns, in terms of eye fixations, in CE images. The proposed model outperforms other approaches for visual saliency estimation based on physicians' eye fixations, achieving an AUC-J of 0.726 on CE images depicting various pathological and normal cases.
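
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC-J (AUC-Judd) score is commonly computed for one image. It is not taken from this work: the function name `auc_judd`, its arguments, and the min-max normalization step are illustrative assumptions. The saliency values at fixated pixels serve as thresholds; at each threshold, the true-positive rate is the fraction of fixated pixels at or above it and the false-positive rate is the fraction of non-fixated pixels at or above it.

```python
import numpy as np


def auc_judd(saliency_map, fixation_map):
    """Illustrative AUC-Judd sketch (hypothetical helper, not the authors' code).

    saliency_map: 2-D array of predicted saliency values.
    fixation_map: 2-D array of the same shape, non-zero where a fixation landed.
    """
    s = np.asarray(saliency_map, dtype=float).ravel()
    f = np.asarray(fixation_map).ravel() > 0
    if s.shape != f.shape:
        raise ValueError("saliency_map and fixation_map must have the same shape")
    if not f.any() or f.all():
        raise ValueError("need at least one fixated and one non-fixated pixel")

    # Assumption: min-max normalize to [0, 1]; skipped for a constant map.
    value_range = s.max() - s.min()
    if value_range > 0:
        s = (s - s.min()) / value_range

    s_fix = s[f]       # saliency at fixated pixels
    s_non = s[~f]      # saliency at all other pixels

    # Thresholds are the saliency values at fixations, highest first.
    thresholds = np.sort(s_fix)[::-1]
    tpr = [0.0]
    fpr = [0.0]
    for t in thresholds:
        tpr.append(np.mean(s_fix >= t))
        fpr.append(np.mean(s_non >= t))
    tpr.append(1.0)
    fpr.append(1.0)

    tpr = np.array(tpr)
    fpr = np.array(fpr)
    # Trapezoidal area under the ROC curve.
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Under this definition, a map that is salient only at the fixated pixels scores 1.0 and a constant map scores 0.5. A dataset-level figure is typically the mean of the per-image scores, although the averaging protocol used here is not stated in this abstract.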