Cell-free (CF) massive MIMO is considered one of the key technologies for 6G to achieve high spectral efficiency (SE) and ultra-low latency. However, as the number of users increases, pilot contamination becomes more severe, and the optimal SE cannot be achieved when the number of users exceeds the number of access points (APs). We therefore study a CF massive MIMO system combined with non-orthogonal multiple access (NOMA). Specifically, we design a user clustering algorithm based on the average signal-to-interference-plus-noise ratio (SINR). Different clusters are assigned mutually orthogonal pilots, while users within the same cluster share a single pilot, which reduces pilot contamination. We then formulate a flexible power allocation problem that maximizes the system SE while taking user fairness into account. We model this problem as a Markov decision process (MDP) and solve it with the asynchronous advantage actor-critic (A3C) algorithm from deep reinforcement learning. Simulation results show that the proposed A3C-based power allocation scheme for CF massive MIMO-NOMA outperforms the baseline schemes in terms of both fairness and SE.
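The pilot-sharing idea described above can be illustrated with a minimal sketch. It is not the clustering algorithm of this work: the round-robin grouping rule, the function names `cluster_by_avg_sinr` and `assign_pilots`, the `cluster_size` parameter, and the sample SINR values are all assumptions made for illustration. The only property taken from the text is that users in one cluster share a pilot while different clusters use orthogonal (here, distinct) pilot indices.

```python
from typing import Dict, List


def cluster_by_avg_sinr(avg_sinr_db: List[float], cluster_size: int = 2) -> List[List[int]]:
    """Group user indices into clusters using their average SINR.

    Assumed heuristic (not taken from the source): rank users from strongest
    to weakest and deal them round-robin into clusters, so that each cluster
    mixes stronger and weaker users.
    """
    # User indices sorted by descending average SINR.
    order = sorted(range(len(avg_sinr_db)), key=lambda k: avg_sinr_db[k], reverse=True)
    num_clusters = -(-len(order) // cluster_size)  # ceiling division
    clusters: List[List[int]] = [[] for _ in range(num_clusters)]
    for rank, user in enumerate(order):
        clusters[rank % num_clusters].append(user)
    return clusters


def assign_pilots(clusters: List[List[int]]) -> Dict[int, int]:
    """Map each user to a pilot index: one pilot per cluster.

    Users in the same cluster share a pilot; different clusters get
    different indices, standing in for mutually orthogonal pilot sequences.
    """
    return {user: pilot for pilot, members in enumerate(clusters) for user in members}


if __name__ == "__main__":
    # Hypothetical average SINR values in dB for six users.
    avg_sinr_db = [12.0, 3.5, 8.1, -1.0, 6.2, 0.4]
    clusters = cluster_by_avg_sinr(avg_sinr_db, cluster_size=2)
    pilots = assign_pilots(clusters)
    print("clusters:", clusters)
    print("pilot per user:", pilots)
    print("pilots needed:", len(clusters), "instead of", len(avg_sinr_db))
```

With six users and clusters of two, the sketch needs three pilots rather than six, which is the pilot-overhead saving that motivates clustering. Separating users within a cluster is left to NOMA power-domain processing, which this sketch does not model.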