Emotion recognition in conversation has attracted interest because of its wide range of applications; its goal is to recognize the emotion of each utterance in a given conversation. To understand utterances well and classify emotions accurately, the task requires both hierarchical information and commonsense knowledge. However, previous studies have rarely introduced commonsense knowledge into the model, and they fall short in integrating commonsense knowledge with hierarchical information. To address this problem, we propose a conversation emotion recognition model based on hierarchical networks and commonsense knowledge, which combines hierarchical information with commonsense knowledge to explore emotion information more fully. First, we extract features at both the context level and the utterance level to capture semantic features, and on this basis we propose a new attention mechanism that captures deeper information at the utterance level. Second, we enrich the semantics with external commonsense knowledge. Finally, we use a hierarchical fusion module to fuse the different levels of information with the commonsense knowledge. Extensive experiments on two datasets confirm the effectiveness of our model.