The study of the thermodynamics, kinetics, and microscopic mechanisms of chemical reactions in solution requires advanced free-energy methods if predictions are to be quantitative. This task is, however, a formidable one for atomistic simulation methods, because the cost of quantum-based ab initio approaches becomes exceedingly heavy when statistically meaningful sampling of the relevant chemical spaces and networks is required. In this work, we critically assess the optimal structure and minimal size of an ab initio training set able to yield accurate free-energy profiles sampled with neural network potentials. The results allow one to propose an ab initio protocol in which the ad hoc inclusion of a machine-learning (ML)-based task can significantly increase the computational efficiency while retaining ab initio accuracy and, at the same time, avoiding some of the notorious extrapolation risks of typical atomistic ML approaches. We focus on two representative, and computationally challenging, reaction steps of the classic Strecker-cyanohydrin mechanism for glycine synthesis in aqueous solution, where the main precursors are formaldehyde and hydrogen cyanide. We demonstrate that, thanks to the ML subprotocol, results indistinguishable from ab initio quality are obtained at about 1 order of magnitude lower computational cost.