Hand movements are among the most natural means humans use to convey information, and their automated recognition is a very active research area in the development of Human-Machine Interfaces. The surface electromyographic (sEMG) signal is a versatile and accurate data source for the intuitive control of machines, robots, or prostheses through gesture classification, which requires a non-trivial mapping from signal to gesture. Blind Source Separation (BSS) algorithms can retrieve from the sEMG the Motor Unit signals, which are the original form of the physiological information and can be forwarded to a Machine Learning classifier. However, the implementation of BSS algorithms on resource-constrained hardware is still in its infancy. In this work, we propose a novel, parallelized version of the FastICA BSS method ported to the PULP-based Mr. Wolf microcontroller, achieving latency < 50 ms and energy consumption < 1 mJ. In an end-to-end approach, we feed the reconstructed neural signals to an SVM and an MLP classifier, obtaining accuracy > 92% on 5 classes (rest and 4 gestures). These results show that our setup is suitable for real-time execution on the limited resources of embedded hardware, while matching the accuracy of black-box state-of-the-art solutions that lack any physiological insight.
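For readers unfamiliar with FastICA, the separation step named above can be illustrated with a minimal NumPy sketch of the generic, textbook symmetric variant. This is not the parallelized embedded implementation proposed in this work: the tanh nonlinearity, the iteration limit, the tolerance, and the function names `whiten`, `sym_decorrelate`, and `fastica` are all assumptions chosen for illustration.

```python
import numpy as np


def whiten(X):
    """Center X (channels x samples) and whiten it to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric whitening matrix cov^(-1/2); assumes cov is full rank.
    W_white = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return W_white @ Xc


def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W


def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Textbook symmetric FastICA with a tanh nonlinearity (illustrative only).

    X has shape (channels, samples). Returns the estimated sources and the
    unmixing matrix that acts on the whitened data.
    """
    Z = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)        # g(w_i^T z) for every row w_i of W
        G_prime = 1.0 - G ** 2    # derivative of tanh
        # Fixed-point update: w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        W_new = G @ Z.T / m - np.diag(G_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is unchanged up to sign.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z, W
```

Calling `fastica(X)` on a channels-by-samples array returns source estimates that are recovered only up to permutation, sign, and scale, as is inherent to ICA; in a pipeline like the one described, such estimates would then be passed to the classifier.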