We present a fast and accurate center-based, single-pass clustering method, with a main focus on improving the trade-off between accuracy and speed in computer vision problems such as creating visual vocabularies. We use a stochastic mean-shift procedure to seek local density peaks within a single pass over the data. We also present a dynamic kernel-generation scheme, together with a density-test procedure that finds the most promising kernel initializations. Our algorithm uses two data structures: a dictionary of permanent kernels, and a short memory that determines which emerging kernels are maintained and which outliers are discarded. We further develop a hierarchical realization of the algorithm that executes faster than most stream clustering algorithms, and the resulting tree serves as an efficient data structure for speeding up the encoding procedure. We apply our method to learning low-level image features on the fly from unmanned aerial vehicle cameras, aiming at vision-driven maneuvering with reduced computational cost. In our experiments, we extensively compare our method with popular clustering algorithms in terms of accuracy and efficiency. Our algorithm showed improved accuracy and speed on datasets with clear cluster structure. On noisy visual features, where natural clusters present inherent challenges (intra-cluster variability and inter-cluster similarity), we achieved high accuracy compared to algorithms of higher complexity while maintaining high efficiency: only one competing method achieved lower run times, and it did so with lower accuracy.
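To make the interplay between the dictionary of permanent kernels and the short memory concrete, here is a minimal Python sketch of a flat (non-hierarchical) single-pass variant. It is not the authors' implementation. The flat kernel, the 1/n step size (under which each stochastic shift reduces to a running-mean update of the points that fell inside the bandwidth), the hit-count density test (`min_support`), the least-recently-supported eviction rule, and all names (`SinglePassMeanShift`, `partial_fit`, `bandwidth`, `memory_size`) are illustrative assumptions.

```python
import math
from collections import OrderedDict


class SinglePassMeanShift:
    """Hypothetical single-pass clusterer: permanent kernels plus a short memory.

    Assumptions (not from the source): flat kernel of radius `bandwidth`,
    step size 1/(n+1), promotion after `min_support` hits, and eviction of
    the least recently supported candidate when the memory overflows.
    """

    def __init__(self, bandwidth, memory_size=50, min_support=5):
        self.h = bandwidth              # flat-kernel radius (assumed)
        self.memory_size = memory_size  # capacity of the short memory (assumed)
        self.min_support = min_support  # density test: hits needed to promote (assumed)
        self.kernels = []               # permanent kernels, each [center, count]
        self.memory = OrderedDict()     # candidate id -> [center, count], oldest first
        self._next_id = 0

    @staticmethod
    def _shift(entry, x):
        # Stochastic mean-shift step with a flat kernel: move the center toward x
        # by 1/(n+1), so the center remains the mean of the points assigned to it.
        center, n = entry
        eta = 1.0 / (n + 1)
        entry[0] = [c + eta * (xi - c) for c, xi in zip(center, x)]
        entry[1] = n + 1

    def _nearest(self, items, x):
        """Return the key of the closest entry within the bandwidth, or None."""
        best_key, best_d = None, self.h
        for key, (center, _) in items:
            d = math.dist(center, x)
            if d <= best_d:
                best_key, best_d = key, d
        return best_key

    def partial_fit(self, x):
        x = [float(v) for v in x]
        # 1) A permanent kernel within the bandwidth absorbs the point.
        idx = self._nearest(enumerate(self.kernels), x)
        if idx is not None:
            self._shift(self.kernels[idx], x)
            return
        # 2) Otherwise, an emerging kernel in the short memory may absorb it.
        key = self._nearest(self.memory.items(), x)
        if key is not None:
            cand = self.memory[key]
            self._shift(cand, x)
            self.memory.move_to_end(key)          # mark as recently supported
            if cand[1] >= self.min_support:       # density test passed: promote
                self.kernels.append(self.memory.pop(key))
            return
        # 3) No nearby kernel: the point seeds a new candidate.
        self.memory[self._next_id] = [x, 1]
        self._next_id += 1
        if len(self.memory) > self.memory_size:
            self.memory.popitem(last=False)       # discard stalest candidate as an outlier
```

In this sketch, a point far from every kernel only ever occupies a slot in the short memory; unless enough later points land near it, it is eventually evicted, which is one simple way to keep outliers out of the permanent dictionary without a second pass over the data.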