Neural Theorem Provers (NTPs) are neuro-symbolic models that combine deep learning with symbolic logic. They can learn representations from data, induce rules, are naturally interpretable, come with built-in explanations for their conclusions, and demonstrate the capacity for systematic generalization. However, because they consider all possible rules when proving a goal, they suffer from high computational complexity and are thus unsuitable for large or complex datasets. Conditional Theorem Provers (CTPs) were proposed as an extension to address this issue. Nonetheless, CTPs are subject to similar computational constraints, as they still consider multiple proof paths while reasoning. We propose RL-CTPs, in which CTPs are augmented with reinforcement learning to select the proof paths most likely to succeed. This allows the model designer to specify the number of proof paths to consider, conforming to the computational constraints of their use case while retaining all of the benefits of CTPs. Using the CLUTRR dataset for evaluation, we provide evidence for the computational issues in existing CTP models, show that RL-CTPs alleviate these issues, and demonstrate that, in certain scenarios, RL-CTPs achieve higher accuracy than CTPs of equivalent computational complexity.
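To illustrate the core idea of learning to select a bounded number of proof paths, the sketch below implements a minimal REINFORCE-style policy over candidate rules. This is a simplified, hypothetical illustration, not the paper's actual architecture: the class and method names (`PathSelectionPolicy`, `top_k`) are invented, the policy is a flat categorical distribution rather than a neural network conditioned on the goal, and the reward is assumed to be 1 for a successful proof and 0 otherwise.

```python
import math
import random

class PathSelectionPolicy:
    """Toy categorical policy over candidate proof rules, trained with
    REINFORCE. Illustrative only; a real RL-CTP would condition on the
    goal embedding and use a neural scoring function."""

    def __init__(self, rules, lr=0.5, seed=0):
        self.logits = {r: 0.0 for r in rules}  # one learnable score per rule
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self):
        # Softmax over rule logits (shifted by the max for numerical stability).
        m = max(self.logits.values())
        exps = {r: math.exp(v - m) for r, v in self.logits.items()}
        z = sum(exps.values())
        return {r: e / z for r, e in exps.items()}

    def sample(self):
        # Draw one rule according to the current policy distribution.
        p = self.probs()
        x, acc = self.rng.random(), 0.0
        for r, pr in p.items():
            acc += pr
            if x <= acc:
                return r
        return r  # guard against floating-point rounding

    def top_k(self, k):
        # The designer-specified budget: keep only the k most promising paths.
        p = self.probs()
        return sorted(p, key=p.get, reverse=True)[:k]

    def update(self, chosen, reward):
        # REINFORCE update: grad of log pi(chosen) is (1[r == chosen] - pi(r)).
        p = self.probs()
        for r in self.logits:
            grad = (1.0 if r == chosen else 0.0) - p[r]
            self.logits[r] += self.lr * reward * grad
```

In use, the prover would repeatedly sample a rule, attempt the corresponding proof, and call `update` with the resulting reward; at inference time, `top_k(k)` bounds the branching factor to the designer's computational budget.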