Numerical models play an important role in power system operation. They are widely used in planning studies to identify and mitigate issues, determine transfer capability, and develop transmission reinforcement plans. These models must be accurate and updated regularly to serve these purposes faithfully over time. In this paper, we formulate the parameter calibration of machine models in a power system as a reinforcement learning problem and demonstrate the feasibility of applying Deep Deterministic Policy Gradient (DDPG) to a two-parameter generator model calibration on a 4-bus system. To improve the efficiency and accuracy of DDPG, we introduce a memory forgetting mechanism and dynamic range adjustment (DRA) into the original DDPG, yielding DRA-DDPG. To reduce parameter estimation errors caused by partially observable disturbance states in the power system, we introduce the concept of a maximal K-Nearest-Neighbor (KNN) reward, which enables our reinforcement learning algorithm to accommodate a finite set (K) of unknown disturbance states in the system. Our experimental results show that the proposed DRA-DDPG outperforms the baseline DDPG in both accuracy and efficiency, and that the proposed maximal KNN reward is well-suited to resolving the uncertainty arising from partially observable system states.