Visual grounding aims to localize objects in images referred to by natural language expressions. The task becomes challenging when the training and testing distributions differ significantly: existing methods tend to overfit the training set, especially in small-sample scenarios. To address this issue, in this letter we present MetaVG, a novel meta-learning-based training framework for visual grounding. MetaVG leverages bi-level optimization to adapt quickly to the target task, thereby alleviating overfitting. To train MetaVG effectively, we propose a training mechanism called Random Uncorrelated Meta-training (RUM), which randomly samples uncorrelated batches as the support and query sets during data separation, and then applies bi-level optimization to train the model directly on visual grounding datasets. Comprehensive experiments on four widely used datasets, as well as in small-sample scenarios, validate the efficacy of MetaVG.
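The sketch below illustrates one possible RUM-style bi-level update, assuming a PyTorch model: two batches are drawn independently at random (hence uncorrelated) to serve as support and query sets, an inner gradient step adapts the model on the support batch, and the outer step back-propagates the query loss through that adaptation. The toy MSE loss, the `rum_meta_step` helper, and its arguments are illustrative assumptions, not the authors' implementation.

```python
import random

import torch
from torch.func import functional_call


def _loss(model, params, batch):
    # Stateless forward pass with an explicit parameter dict, so the same
    # module can be evaluated with either original or adapted weights.
    # MSE stands in for the actual grounding loss (an assumption).
    xs = torch.stack([x for x, _ in batch])
    ys = torch.stack([y for _, y in batch])
    preds = functional_call(model, params, (xs,))
    return torch.nn.functional.mse_loss(preds, ys)


def rum_meta_step(model, data, meta_opt, inner_lr=1e-2, batch_size=8):
    # RUM data separation: two independently sampled, uncorrelated batches
    # act as the support and query sets respectively.
    support = random.sample(data, batch_size)
    query = random.sample(data, batch_size)

    params = dict(model.named_parameters())

    # Inner loop: one adaptation step on the support set. create_graph=True
    # keeps the graph so the outer update can differentiate through it.
    sup_loss = _loss(model, params, support)
    grads = torch.autograd.grad(sup_loss, list(params.values()),
                                create_graph=True)
    adapted = {k: p - inner_lr * g
               for (k, p), g in zip(params.items(), grads)}

    # Outer loop: evaluate the adapted parameters on the query set and
    # back-propagate through both levels to update the original weights.
    qry_loss = _loss(model, adapted, query)
    meta_opt.zero_grad()
    qry_loss.backward()
    meta_opt.step()
    return qry_loss.item()


# Usage on a toy regression dataset standing in for a grounding dataset.
model = torch.nn.Linear(16, 4)
data = [(torch.randn(16), torch.randn(4)) for _ in range(256)]
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    rum_meta_step(model, data, meta_opt)
```

Because the support and query batches are drawn from the same dataset rather than from constructed episodic tasks, this style of meta-training can be applied directly to standard visual grounding data.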