In this paper, we address the problem of estimating the location of a camera from a single query RGB image for indoor navigation. This is a difficult problem: once the training data for an indoor positioning system has been gathered, later changes to the scene, such as occlusions or illumination changes, as well as repetitive patterns, can easily fool the system. In this work, a tandem set of convolutional neural networks (CNNs) is used as the scene classifier. The RGB image of the scene is then converted to its corresponding point cloud by a generative adversarial network (GAN). Finally, the camera pose is regressed from the point cloud input using a CNN. Compared with related work, the proposed architecture performs better in three respects: 1) it simplifies data generation, 2) it is more robust against small variations in the scene, and 3) it estimates both the camera position and its orientation quaternion more accurately.
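The data flow described above (scene classification, RGB-to-point-cloud conversion, then pose regression) can be illustrated with a minimal sketch. All names here (`LocalizationPipeline`, `classify_scene`, `rgb_to_point_cloud`, `regress_pose`, `normalize_quaternion`) are hypothetical and chosen for illustration only. The three stages are passed in as plain callables standing in for the trained networks, and the quaternion normalization step is an assumption on our part, not something the abstract states. How the scene label conditions the later stages is not specified, so the sketch simply returns it alongside the pose.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical lightweight types standing in for real tensors.
Image = List[List[Tuple[int, int, int]]]        # rows of RGB pixels
PointCloud = List[Tuple[float, float, float]]   # list of XYZ points
Quaternion = Tuple[float, float, float, float]


@dataclass
class Pose:
    position: Tuple[float, float, float]
    quaternion: Quaternion


def normalize_quaternion(q: Quaternion) -> Quaternion:
    """Scale q to unit length so that it represents a valid rotation."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(c / norm for c in q)


@dataclass
class LocalizationPipeline:
    # Each stage is a placeholder for a trained model (assumption):
    classify_scene: Callable[[Image], int]             # tandem CNN classifier
    rgb_to_point_cloud: Callable[[Image], PointCloud]  # GAN generator
    regress_pose: Callable[[PointCloud], Pose]         # CNN pose regressor

    def localize(self, image: Image) -> Tuple[int, Pose]:
        """Run the three stages in order and return (scene id, pose)."""
        scene_id = self.classify_scene(image)
        cloud = self.rgb_to_point_cloud(image)
        raw = self.regress_pose(cloud)
        # Assumed post-processing: re-normalize the regressed quaternion.
        return scene_id, Pose(raw.position, normalize_quaternion(raw.quaternion))
```

In such a sketch, each callable would be replaced by the corresponding trained network; keeping the stages separate makes it possible to swap or retrain one stage without touching the others.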