Classifying randomly generated data clusters is a popular problem in data processing. However, much of the research on this problem relies on two classical approaches: least squares and gradient descent. Both methods have disadvantages that related work rarely addresses: existing studies do not compare the accuracy of different solutions, nor do they explore alternative ideas that could make the solutions better and more flexible. In this project, we therefore apply three methods (least squares, gradient descent, and simulated annealing) to the random cluster classification problem and compare their accuracy. We also explore ways to make least squares and gradient descent more flexible. All three methods solve the problem well and meet the accuracy requirement; ranked from highest to lowest accuracy, they are least squares, gradient descent, and simulated annealing. Because it avoids the disadvantages of the other two methods, simulated annealing remains a valuable approach to this problem. For least squares, the sigmoid activation function appears to be the best alternative for achieving the same accuracy, and the tanh activation function is also suitable.
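The comparison described above can be illustrated with a minimal sketch. It is not the project's implementation: the cluster centres, sample sizes, learning rate, cooling schedule, and the choice of misclassification rate as the annealing energy are all assumptions made for illustration, as are the function names. It fits a linear decision boundary to two random Gaussian clusters with each of the three methods and reports training accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def make_clusters(n=200):
    """Two Gaussian clusters (assumed centres) with labels -1 and +1."""
    a = rng.normal(loc=(-2.0, -2.0), scale=1.0, size=(n, 2))
    b = rng.normal(loc=(2.0, 2.0), scale=1.0, size=(n, 2))
    X = np.hstack([np.vstack([a, b]), np.ones((2 * n, 1))])  # append bias column
    t = np.concatenate([-np.ones(n), np.ones(n)])
    return X, t


def accuracy(X, t, w):
    """Fraction of points whose predicted sign matches the label."""
    pred = np.where(X @ w >= 0, 1.0, -1.0)
    return float(np.mean(pred == t))


def least_squares(X, t):
    """Closed-form minimiser of ||Xw - t||^2."""
    return np.linalg.lstsq(X, t, rcond=None)[0]


def gradient_descent(X, t, lr=0.02, steps=500):
    """Plain gradient descent on the mean squared error (assumed settings)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - t) / len(t)
        w -= lr * grad
    return w


def simulated_annealing(X, t, steps=2000, temp0=1.0, cooling=0.995, step_size=0.5):
    """Random search with a Metropolis acceptance rule.

    Energy is the misclassification rate (an assumption for this sketch).
    """
    w = np.zeros(X.shape[1])
    energy = 1.0 - accuracy(X, t, w)
    best_w, best_energy = w.copy(), energy
    temp = temp0
    for _ in range(steps):
        candidate = w + rng.normal(scale=step_size, size=w.shape)
        cand_energy = 1.0 - accuracy(X, t, candidate)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if cand_energy <= energy or rng.random() < np.exp((energy - cand_energy) / temp):
            w, energy = candidate, cand_energy
            if energy < best_energy:
                best_w, best_energy = w.copy(), energy
        temp *= cooling
    return best_w


X, t = make_clusters()
acc_ls = accuracy(X, t, least_squares(X, t))
acc_gd = accuracy(X, t, gradient_descent(X, t))
acc_sa = accuracy(X, t, simulated_annealing(X, t))
print(f"least squares: {acc_ls:.3f}  gradient descent: {acc_gd:.3f}  simulated annealing: {acc_sa:.3f}")
```

Unlike the other two methods, the annealing routine never uses a gradient or a closed-form solution, so the energy function can be swapped for any other criterion without changing the search loop. The accuracy ranking reported in this abstract comes from the project's own experiments and should not be inferred from this toy setup.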