Markov decision processes (MDPs) are probabilistic models widely used in areas such as control theory, game theory, machine learning, and robotics. Recent years have seen a surge of research interest in the control of MDPs under temporal logic specifications. Existing methods, such as abstraction-based and receding-horizon approaches, are hard to extend to MDPs with continuous states, require precise knowledge of the model, and are computationally demanding due to the curse of dimensionality. In this letter, we propose a randomized controller design algorithm for continuous-state MDPs with unknown transition probabilities under signal temporal logic (STL) specifications. Our basic idea is to convert the controller design into an optimization problem with the STL robustness index as the cost function, where the optimal control policy corresponds to an optimal solution that follows a probability distribution. A sampling approach is employed to asymptotically approximate this optimal distribution. The convergence of the algorithm is formally proved, together with an estimate of the convergence rate. A numerical example illustrates the effectiveness of the proposed method.
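The idea of optimizing a policy by sampling, with STL robustness as the objective, can be illustrated with a minimal sketch. The system (a 1-D random walk), the specification ("eventually x > 5"), the constant-gain policy, and the cross-entropy-style update below are all illustrative assumptions for exposition; they are not the algorithm or model from this letter.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20  # horizon length (illustrative)

def rollout(theta):
    # Toy continuous-state stochastic system x_{t+1} = x_t + u_t + w_t
    # under a constant policy u_t = clip(theta, -1, 1). The true
    # transition noise is treated as unknown: we only sample from it.
    x = np.zeros(T + 1)
    for t in range(T):
        u = np.clip(theta, -1.0, 1.0)
        x[t + 1] = x[t] + u + 0.05 * rng.standard_normal()
    return x

def robustness(x):
    # STL robustness of "eventually x > 5 within the horizon":
    # rho = max_t (x_t - 5); positive iff the trajectory satisfies it.
    return np.max(x - 5.0)

# Sampling-based policy search: maintain a Gaussian distribution over
# the policy parameter and move it toward high-robustness samples
# (a cross-entropy-style stand-in for the paper's sampling scheme).
mu, sigma = 0.0, 1.0
for it in range(30):
    thetas = rng.normal(mu, sigma, size=100)
    scores = np.array([robustness(rollout(th)) for th in thetas])
    elites = thetas[np.argsort(scores)[-10:]]  # keep top 10%
    mu, sigma = elites.mean(), elites.std() + 1e-6

print(f"mean policy parameter: {mu:.2f}, best robustness: {scores.max():.2f}")
```

The distribution over policy parameters concentrates on values that drive the state past the threshold, so the best sampled robustness becomes positive, i.e., the specification is satisfied.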