We receive sensory signals that are classified into three categories according to receptor type: exteroception, interoception, and proprioception. In particular, interoception is closely related to emotion and decision-making. However, robots lack interoceptive receptors, and robots with human-like emotions have not yet been realized. In contrast, various sensors corresponding to exteroceptive receptors are available, and integrated cognitive models that self-organize information from the external world have been discussed. In this study, we propose an integrated cognitive architecture that incorporates interoception, building on the architecture previously proposed by our research group. Using a robot equipped with the proposed model, we conducted an experiment simulating interaction between a parent and a child, and evaluated the latent space learned by the model. The results confirm that the robot can learn appropriate behaviors depending on the environment and its physical information. Furthermore, we confirmed the formation of concepts reflecting both linguistic information and the robot's physical information.