Recent work in deep reinforcement learning emphasizes advanced exploration techniques that avoid purely greedy action selection. Reinforcement learning typically works by finding an optimal policy for a Markov Decision Process. In off-policy algorithms, such as deep Q-learning, the agent learns a value function for this optimal policy independently of the actions it chooses while exploring. Algorithms built on the maximum entropy framework, such as soft Q-learning, counteract the agent's greedy behavior and effectively combine exploration and exploitation by adding an entropy term to the Bellman equation. We apply this method to the Lunar Lander environment and compare it with classic deep Q-learning, using the same set of random seeds and averaging over multiple runs. The implicit exploration strategy proves to compensate for disturbances caused by intrinsic sources of non-determinism, such as random seeds. This paper highlights the sensitivity of deep reinforcement learning to intrinsic and extrinsic influences, with respect to exploration and repeatability.
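For reference, a minimal sketch of the entropy-augmented Bellman backup underlying soft Q-learning, in the standard maximum entropy formulation; the temperature $\alpha$ and the discrete action set $\mathcal{A}$ (matching the Lunar Lander setting) are assumed notation, not quantities defined above:
\begin{align}
Q_{\mathrm{soft}}(s_t, a_t) &= r_t + \gamma\, \mathbb{E}_{s_{t+1}}\!\left[ V_{\mathrm{soft}}(s_{t+1}) \right],\\
V_{\mathrm{soft}}(s_t) &= \alpha \log \sum_{a \in \mathcal{A}} \exp\!\left( Q_{\mathrm{soft}}(s_t, a) / \alpha \right).
\end{align}
As $\alpha \to 0$, the log-sum-exp value collapses to the maximum over actions and the standard (greedy) Bellman equation of Q-learning is recovered.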