In recent years, a contentious debate has emerged over the degree to which Large Language Models (LLMs) can truly achieve grounding in the physical world. Grounding, in this context, refers to a model's ability to establish a meaningful connection between its language-based understanding and a concrete comprehension of real-world phenomena. Our research explores the latent capability of LLMs to develop physical intuition: a prerequisite for embodied agents to perform tasks effectively in real-world environments. In this paper, we release a novel dataset of physical scenarios that serves as a benchmark for an LLM's physical intuition. Our benchmark, AuPPLE (Augmented Physical Priors through Language Enhancement) for Language Models, covers free-fall and projectile-motion scenarios posed in several question-answer formulations (MultiQA, binary classification, and continuous number prediction), requiring models to comprehend linguistic nuances and apply their understanding within a physical context. By fine-tuning LLMs on this specialized dataset, we assess their ability to produce responses that draw on underlying physical knowledge. With our fine-tuned LLMs achieving over 87% accuracy on the free-fall evaluation set, more than three times the accuracy of their base models, our results shed light on the intrinsic grounding capabilities of LLMs and offer insights into their potential to bridge the gap between language and the physical world. This paper contributes to the ongoing discourse on the true nature of LLMs' comprehension and its relationship to real-world context, underscoring the strides made in enhancing their intuitive understanding through targeted fine-tuning.