In this paper, a resource allocation (RA) scheme based on a multi-agent deep Q-network (MADQN) is proposed to address the slow convergence of traditional algorithms and the limited performance of step-by-step optimization in massive MIMO-NOMA systems. Based on MADQN, a tightly coupled iterative optimization structure is established for user grouping, power allocation, and beamforming. To meet the differing performance requirements of users, multiple reinforcement learning networks learn user grouping and power allocation, with the goal of maximizing the weighted system sum rate. Throughout the iterative process, the RA results are fed back to each DQN to compute its reward function. The agents constrain one another and together achieve near-ideal joint optimization. Simulation results verify that the MADQN scheme effectively improves spectral efficiency.
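The reward fed back to the agents is built on the weighted system sum rate. As a rough illustration only, the sketch below computes that quantity for a given user grouping and power split. It assumes downlink NOMA with successive interference cancellation (SIC) inside each group, scalar effective channel gains after beamforming, no inter-group interference, unit bandwidth, and Shannon rates log2(1 + SINR). The functions `noma_group_rates` and `weighted_sum_rate` are hypothetical names, and this is not the authors' exact reward definition.

```python
import math
from typing import List, Sequence


def noma_group_rates(gains: Sequence[float], powers: Sequence[float], noise: float) -> List[float]:
    """Rates (bit/s/Hz) of the users sharing one NOMA group.

    Assumption: downlink SIC, where each user cancels the signals of weaker
    users (smaller effective gain) and treats stronger users' signals as noise.
    """
    rates = []
    for k, (g_k, p_k) in enumerate(zip(gains, powers)):
        # Users ranked stronger than k (larger gain, ties broken by index)
        # are not cancelled by k, so their power remains as interference.
        interference = sum(
            p_j for j, (g_j, p_j) in enumerate(zip(gains, powers)) if (g_j, j) > (g_k, k)
        )
        sinr = p_k * g_k / (g_k * interference + noise)
        rates.append(math.log2(1.0 + sinr))
    return rates


def weighted_sum_rate(
    groups: Sequence[Sequence[int]],
    gains: Sequence[float],
    powers: Sequence[float],
    weights: Sequence[float],
    noise: float,
) -> float:
    """Hypothetical shared reward: sum over all users of weight * rate.

    Assumption: beamforming fully suppresses interference between groups.
    """
    total = 0.0
    for group in groups:
        rates = noma_group_rates([gains[u] for u in group], [powers[u] for u in group], noise)
        total += sum(weights[u] * r for u, r in zip(group, rates))
    return total


if __name__ == "__main__":
    gains = [4.0, 1.0, 7.0]      # effective gains after beamforming (illustrative)
    powers = [0.75, 1.75, 1.0]   # weaker user in group 0 receives more power
    weights = [1.0, 2.0, 1.0]    # per-user priority weights
    groups = [[0, 1], [2]]       # grouping decision, e.g. from a grouping agent
    print(weighted_sum_rate(groups, gains, powers, weights, noise=1.0))  # 7.0
```

In a multi-agent loop of the kind the abstract describes, the grouping agent would choose `groups`, the power agent would choose `powers`, and a value like this one would be returned to both as a common reward, which is one way the agents could constrain each other.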