In the expanding field of action recognition, accurately discerning complex user actions in unfamiliar contexts remains challenging: action boundaries are ambiguous and multiple actions may occur simultaneously, both of which lower recognition precision. We address these issues by proposing a three-stage framework that shares information across related tasks such as hand-object pose estimation and grasp classification. In the first stage, our framework employs a ResNet-34 backbone for hand-object pose estimation and object classification, laying the groundwork for action recognition without temporal dependencies. The second stage employs a simple Grasp Classifier trained on labeled data from the first stage, ensuring information continuity. The final stage deploys an Action Transformer for action recognition, drawing on the pose and grasp data produced by the earlier stages. These latter stages operate temporally, integrating temporal information and the relationships between hand-object patterns, thereby enhancing action recognition. Evaluated on the HO-3D dataset and compared with the H+O approach, our method demonstrates a notable performance improvement, affirming the beneficial role of multi-task information sharing in action recognition.
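To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch of how the stages could be wired together. All module names (PoseObjectNet, GraspClassifier, ActionTransformer) and all dimensions and class counts (POSE_DIM, NUM_OBJ_CLASSES, NUM_GRASP_TYPES, NUM_ACTIONS) are illustrative assumptions, not the paper's actual configuration; only the ResNet-34 backbone, the simple grasp classifier fed by stage-1 outputs, and the transformer-based action head reflect the description above.

```python
# Minimal sketch of the three-stage pipeline (assumed dimensions throughout).
import torch
import torch.nn as nn
from torchvision.models import resnet34

NUM_OBJ_CLASSES = 10        # assumed number of object classes
NUM_GRASP_TYPES = 33        # assumed grasp taxonomy size
NUM_ACTIONS = 45            # assumed number of action classes
POSE_DIM = 21 * 3 + 8 * 3   # assumed: 21 hand joints + 8 object corners, 3D each


class PoseObjectNet(nn.Module):
    """Stage 1: per-frame hand-object pose estimation and object classification."""

    def __init__(self):
        super().__init__()
        backbone = resnet34(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d pooled features
        self.backbone = backbone
        self.pose_head = nn.Linear(512, POSE_DIM)
        self.obj_head = nn.Linear(512, NUM_OBJ_CLASSES)

    def forward(self, frames):               # frames: (B, 3, H, W)
        feats = self.backbone(frames)        # (B, 512)
        return self.pose_head(feats), self.obj_head(feats)


class GraspClassifier(nn.Module):
    """Stage 2: a simple MLP mapping stage-1 pose estimates to grasp types."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(POSE_DIM, 128), nn.ReLU(),
            nn.Linear(128, NUM_GRASP_TYPES),
        )

    def forward(self, pose):                 # pose: (B, POSE_DIM)
        return self.mlp(pose)


class ActionTransformer(nn.Module):
    """Stage 3: temporal action recognition over pose + grasp + object sequences."""

    def __init__(self, d_model=128, num_layers=4, num_heads=8):
        super().__init__()
        in_dim = POSE_DIM + NUM_GRASP_TYPES + NUM_OBJ_CLASSES
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, NUM_ACTIONS)

    def forward(self, seq):                  # seq: (B, T, in_dim)
        tokens = self.encoder(self.proj(seq))
        return self.head(tokens.mean(dim=1))  # average-pool over time


# Wiring the stages together for a clip of T frames (illustrative only).
if __name__ == "__main__":
    B, T = 2, 16
    frames = torch.randn(B * T, 3, 224, 224)
    stage1, stage2, stage3 = PoseObjectNet(), GraspClassifier(), ActionTransformer()
    pose, obj_logits = stage1(frames)        # frame-level predictions, no temporal context
    grasp_logits = stage2(pose)              # grasp inferred from stage-1 pose
    seq = torch.cat([pose, grasp_logits, obj_logits], dim=-1).view(B, T, -1)
    action_logits = stage3(seq)              # (B, NUM_ACTIONS)
    print(action_logits.shape)
```

Note the division of labor the abstract describes: stage 1 is strictly per-frame, while temporal reasoning is deferred to the later stages that consume its outputs as a sequence.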