The scarcity of diverse emotion-labeled conversations leads to poor performance of emotion recognition in conversation (ERC) in target domains whose distribution differs substantially from that of the training data. Unsupervised domain adaptation (UDA), which trains a model on labeled source-domain data and transfers the knowledge to an unlabeled target domain, is well suited to this practical setting. However, previous UDA methods match the two distributions directly and ignore the offset between their bases, which causes negative transfer for some samples. Moreover, the convolutional neural networks (CNNs) widely used in previous work are not well suited to ERC. In this study, we propose a two-step unsupervised domain adaptation framework (MTDA) for inferring emotional states in conversations from a target domain that has no labels. Specifically, we first propose a basis alignment method, called greedy orthogonal transform (GOT), that approximately aligns the domain distributions by correcting the bases. We then propose a novel attention-based deep multitask network that learns domain-agnostic feature representations. Alignment techniques such as maximum mean discrepancy (MMD) can be easily integrated into our framework. We evaluate our approach in the cross-corpus setting between the IEMOCAP and MELD datasets. Experiments demonstrate that our framework achieves state-of-the-art performance.
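To make the MMD criterion mentioned above concrete, the following is a minimal sketch of a biased squared-MMD estimate between source and target feature samples. It is an illustration only, not the implementation used in this work: the Gaussian (RBF) kernel, the bandwidth parameter `gamma`, the synthetic Gaussian features, and all function names are assumptions introduced here.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(source, source, gamma)
        + mean_kernel(target, target, gamma)
        - 2.0 * mean_kernel(source, target, gamma)
    )


def sample_gaussian(n, dim, mean, rng):
    """Hypothetical stand-in for extracted features: n vectors from N(mean, I)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
src = sample_gaussian(50, 4, 0.0, rng)       # "source domain" features
tgt_near = sample_gaussian(50, 4, 0.0, rng)  # target with the same distribution
tgt_far = sample_gaussian(50, 4, 2.0, rng)   # target with a shifted mean

mmd_near = mmd_squared(src, tgt_near, gamma=0.25)
mmd_far = mmd_squared(src, tgt_far, gamma=0.25)
```

In this toy setting the estimate for the shifted target (`mmd_far`) is larger than for the matching target (`mmd_near`). In a domain adaptation pipeline, a term of this kind would typically be added to the training loss so that the learned source and target representations are pulled closer together.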