The k-nearest neighbor (KNN) classification algorithm can handle the classification problem considered in this paper quickly, but when computing similarity it assigns the same weight to all distances and so ignores the effect that small distances have on classification accuracy. Its efficiency is also affected by the number of samples and the number of dimensions. Therefore, an improved weighted KNN classification algorithm based on the Spark framework is proposed, which improves running efficiency by pruning the sample data and reducing its dimensionality. Experimental results show that the algorithm achieves higher accuracy and a better speedup ratio than the parallel algorithm based on the Hadoop platform, and that it can process large-scale text data quickly and accurately.
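To make the weighting idea concrete, the following is a minimal single-machine sketch of distance-weighted KNN voting. It is not the proposed Spark implementation: the inverse-distance weight, the Euclidean metric, the function name `weighted_knn_predict`, and the `eps` smoothing term are all assumptions chosen for illustration, since the abstract does not specify the weighting scheme.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    train: list of (feature_vector, label) pairs.
    The weight 1 / (distance + eps) is an assumed scheme, not taken from the paper.
    """
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), label) for x, label in train]
    # Keep only the k closest samples.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Closer neighbors contribute larger votes than distant ones.
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# Hypothetical toy data: one very close "a" sample and two farther "b" samples.
train = [((0.1, 0.0), "a"), ((1.0, 0.0), "b"), ((0.0, 1.0), "b")]
prediction = weighted_knn_predict(train, (0.0, 0.0), k=3)
```

With equal weights, the two "b" neighbors would outvote the single "a" neighbor. With inverse-distance weights, the much closer "a" sample dominates, which is the behavior that motivates weighting small distances more heavily.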