Spoken commands promise unique advantages for the control of industrial machinery: operators can keep their eyes on safety-critical aspects of the process at all times and keep their hands free for other tasks, instead of operating a remote control. Current keyword spotting systems are prone to misunderstanding spoken utterances, especially in noisy environments, and are commonly deployed as non-real-time cloud services. Consequently, these systems cannot be trusted with safety-critical industrial control. We adapt a DS-CNN and a CNN for keyword spotting and use augmented training data, including real industrial noise, to increase their robustness. Furthermore, we apply post-training quantization and analyze the performance of both networks on multiple embedded systems, including a Google Edge TPU. We carry out a systematic analysis of accuracy, memory footprint, and inference time for different combinations of data augmentation, hardware platform, and quantization. We show that augmented training data increases inference accuracy in noisy environments by up to 20 %. Among other configurations, this is demonstrated with an integer-quantized network with a memory footprint of 0.57 MB, reaching inference times below 5 ms on an embedded CPU and below 1 ms on the Edge TPU. The results show that keyword spotting for industrial control is feasible on embedded systems and that training data augmentation has a significant impact on robustness in challenging environments.
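The post-training integer quantization mentioned above can be illustrated with a minimal NumPy sketch of affine int8 weight quantization. This is a simplified illustration only, not the paper's toolchain: production frameworks such as TensorFlow Lite additionally calibrate activation ranges and typically use per-channel scales, but the 4x memory reduction behind footprints like 0.57 MB follows the same principle.

```python
import numpy as np

def quantize_int8(weights):
    # Affine (asymmetric) per-tensor quantization of float32 weights to int8.
    # Map the observed range [w_min, w_max] onto the 256 int8 levels.
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float weights.
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64)).astype(np.float32)   # stand-in weight tensor
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
print(w.nbytes, q.nbytes)                 # int8 storage is 4x smaller
print(float(np.abs(w - w_hat).max()))     # worst-case rounding error ~ scale
```

The per-element reconstruction error is bounded by roughly one quantization step (the scale), which is why small keyword spotting networks usually lose little accuracy under this scheme.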