This paper presents an application of Banditron, an online reinforcement learning (RL) algorithm, in a discrete-state intra-cortical Brain-Machine Interface (iBMI) setting. We analyzed two datasets from non-human primates (NHPs), NHP A and NHP B, each performing a 4-option discrete control task over a total of 8 days. Results show average improvements of ≈ 15% and 6% in NHP A, and 15% and 21% in NHP B, over the state-of-the-art algorithms Hebbian Reinforcement Learning (HRL) and Attention-Gated Reinforcement Learning (AGREL), respectively. Besides yielding superior decoding performance, Banditron is also the most computationally efficient, requiring two orders of magnitude fewer multiply-and-accumulate operations than HRL and AGREL. Furthermore, Banditron provides average improvements of at least 40% and 15% in NHPs A and B, respectively, over the commonly used supervised methods LDA and SVM across test days. These results pave the way towards an alternative paradigm of temporally robust, hardware-friendly, reinforcement-learning-based iBMIs.
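For readers unfamiliar with Banditron, the following is a minimal sketch of the algorithm in its standard published form (a multiclass perceptron trained from bandit feedback), not necessarily the exact configuration used in this work. The class name, the method names, the exploration rate `gamma=0.05`, and the representation of neural features as a plain list of floats are all illustrative assumptions.

```python
import random


class Banditron:
    """Illustrative Banditron sketch for k-class decoding with binary feedback."""

    def __init__(self, n_features, n_classes, gamma=0.05, seed=0):
        self.k = n_classes
        self.gamma = gamma  # exploration rate (assumed value, not from the paper)
        self.rng = random.Random(seed)
        # One weight row per class, initialised to zero.
        self.W = [[0.0] * n_features for _ in range(n_classes)]

    def greedy(self, x):
        """Return the class with the highest linear score W[r] . x."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return max(range(self.k), key=scores.__getitem__)

    def act(self, x):
        """Sample the executed action from the exploration distribution."""
        y_hat = self.greedy(x)
        # Spread gamma uniformly over all classes, rest on the greedy class.
        probs = [self.gamma / self.k] * self.k
        probs[y_hat] += 1.0 - self.gamma
        y_tilde = self.rng.choices(range(self.k), weights=probs)[0]
        return y_hat, y_tilde, probs

    def update(self, x, y_hat, y_tilde, probs, correct):
        """Update weights from one bit of feedback: was y_tilde correct?"""
        if correct:
            # Importance-weighted reward for the executed action.
            scale = 1.0 / probs[y_tilde]
            for j, xj in enumerate(x):
                self.W[y_tilde][j] += scale * xj
        # The greedy prediction is always penalised; in expectation this
        # matches the full-information perceptron update.
        for j, xj in enumerate(x):
            self.W[y_hat][j] -= xj
```

In a 4-option task, one trial would call `act` on the feature vector, execute `y_tilde`, and pass the resulting success or failure signal to `update`. In this sketch, prediction costs one multiply-and-accumulate per weight, and the update touches at most two rows of `W`.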