In computer vision, human pose estimation is a key and challenging technique for a variety of applications, including virtual reality, video games, sports, and movies. Research on pose estimation has grown increasingly popular over the last several decades as these applications have become essential to modern life. Traditional approaches generally feed a set of handcrafted features to well-known classifiers such as support vector machines (SVM), AdaBoost, or multilayer perceptrons (MLP). Discriminative features such as histograms of oriented gradients (HOG) are extracted from images representing object classes, and a classifier assigns a class to each feature vector. Recent advances in deep learning, especially convolutional neural networks (CNN), have greatly influenced pose estimation systems and significantly improved estimation accuracy. CNN-based approaches automatically learn feature extraction filters and classification parameters from training samples and estimate joint locations in test samples with higher accuracy. A CNN for pose estimation tends to estimate joint positions more accurately as the number of layers increases. As more layers are added, however, estimation becomes much slower, and such a system may no longer be suitable for real-time applications.
In this thesis, we aim to design and implement a pose estimation system that estimates poses in real time while keeping accuracy comparable to state-of-the-art results. The proposed system employs a multi-stage CNN with a large number of layers but a relatively small number of network parameters. It uses inception modules and 1x1 convolutions to factorize large convolutions into smaller ones, which reduces computational complexity, while context information passed through the multiple stages of the CNN may contribute to high accuracy.
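The effect of factorizing a large convolution can be illustrated with a simple weight count. The sketch below is not the network used in this thesis: the channel sizes (`c_in`, `c_out`, `c_mid`) and the choice of a 5x5 kernel replaced by a 1x1 reduction followed by two 3x3 convolutions are assumptions made only for illustration.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, c_mid = 256, 256, 64

# A single direct 5x5 convolution.
direct = conv_params(5, c_in, c_out)

# Factorized version: a 1x1 convolution reduces the channels to c_mid,
# then two stacked 3x3 convolutions cover the same 5x5 receptive field.
factorized = (
    conv_params(1, c_in, c_mid)
    + conv_params(3, c_mid, c_mid)
    + conv_params(3, c_mid, c_out)
)

print(f"direct 5x5:  {direct} weights")
print(f"factorized:  {factorized} weights")
print(f"reduction:   {direct / factorized:.1f}x fewer weights")
```

Under these assumed sizes the factorized stack needs roughly one eighth of the weights of the direct convolution, which is the kind of saving that lets a network grow deeper without a proportional growth in parameters.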
The LSP and MPII Human Pose data sets are used in the experiments. The proposed system improves inference speed by 137%, with an inference time of 102 ms versus 242 ms, which is about 2.4 times faster. Its estimation accuracy is 89.92%, close to the 90.5% achieved by the best existing system.
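The 137% figure follows from the two inference times when the improvement is measured as a gain in speed rather than a reduction in time. The short check below assumes that 242 ms is the inference time of the comparison system and 102 ms that of the proposed system; the variable names are illustrative.

```python
baseline_ms = 242.0  # assumed: inference time of the comparison system
proposed_ms = 102.0  # assumed: inference time of the proposed system

# Speed is the inverse of time, so the speed ratio is baseline / proposed.
speedup = baseline_ms / proposed_ms

# Relative gain in speed, as a percentage.
improvement_pct = (speedup - 1.0) * 100.0

# The same comparison expressed as a reduction in inference time.
time_saved_pct = (1.0 - proposed_ms / baseline_ms) * 100.0

print(f"{speedup:.2f}x faster")
print(f"{improvement_pct:.0f}% gain in speed")
print(f"{time_saved_pct:.0f}% less inference time")
```

Read this way, a 137% gain in speed and a 58% reduction in inference time describe the same measurement.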