This paper explores the benefits of combining a U-shaped disassembly line with a single-row linear disassembly line for specific scenarios. To address the balancing problem that arises in such a hybrid disassembly line, the authors establish a mathematical model aimed at maximizing recovery profit. The Soft Actor-Critic (SAC) algorithm is proposed to solve the model, taking the characteristics of the problem into account. Its performance is compared with that of the Advantage Actor-Critic (A2C) and Deep Deterministic Policy Gradient (DDPG) algorithms. The results demonstrate that SAC achieves an approximately optimal result on small-scale cases and outperforms both DDPG and A2C on large-scale disassembly cases.