Localization is a fundamental technology for enabling location-aware services in smart cities. Image feature matching plays an essential role in vision-based localization, allowing mobile IoT devices to navigate diverse scenes. Conventional matching pipelines struggle to find an accurate transformation model when the two images to be matched exhibit a large scale difference between the views of the objects of interest. This paper introduces DynaScale, a general framework that integrates existing feature detection and matching algorithms with constructed image pyramids for extended multi-scale matching, and selects the best matching result among the image pairs at different scales. We design an evaluation scheme for candidate transformation models based on several tests, including the plausibility of projections, the comparable sizes of region proposals, and the similarity of bounded descriptors. Experimental results show that, on the selected datasets, DynaScale improves mean matching accuracy over existing methods by a factor of about 1.9, from 24.32% to 45.91%, and produces roughly twice as many useful, correctly matched frames in a moving robot’s video stream.
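
To make the pipeline concrete, the following is a minimal sketch of the multi-scale matching idea described above, assuming an OpenCV-based implementation with SIFT features and RANSAC homography estimation. The function names, pyramid parameters, and the inlier-count scoring are illustrative assumptions, not the paper's actual method; DynaScale's evaluation scheme additionally applies the projection-plausibility, region-proposal size, and descriptor-similarity tests mentioned above.

```python
# Illustrative sketch only: builds image pyramids for both images, runs
# feature matching on every pair of scale levels, and keeps the candidate
# transformation model with the highest RANSAC inlier count (a hypothetical
# stand-in for DynaScale's full model-evaluation scheme).
import cv2
import numpy as np

def build_pyramid(img, levels=3, scale=0.5):
    """Return the image at successively downscaled resolutions."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = cv2.resize(img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)
        pyramid.append(img)
    return pyramid

def match_pair(img_a, img_b, detector, matcher):
    """Detect, match, and estimate a homography; return (H, inlier_count)."""
    kp_a, des_a = detector.detectAndCompute(img_a, None)
    kp_b, des_b = detector.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return None, 0
    matches = matcher.knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps only distinctive correspondences.
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:  # a homography needs at least 4 correspondences
        return None, 0
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, 0 if mask is None else int(mask.sum())

def dynascale_match(img_a, img_b, levels=3):
    """Try every pair of pyramid levels and keep the best-scoring model."""
    detector = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best = (None, 0, (0, 0))
    for i, a in enumerate(build_pyramid(img_a, levels)):
        for j, b in enumerate(build_pyramid(img_b, levels)):
            H, inliers = match_pair(a, b, detector, matcher)
            if inliers > best[1]:
                best = (H, inliers, (i, j))
    return best  # (homography, inlier count, winning scale pair)
```

Enumerating scale pairs in this way lets a standard detector operate near its preferred scale on both images, which is why extended matching over the pyramid can recover transformations that a single-scale pass misses.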