This work proposes a deep learning framework for building detection. A two-stream convolutional neural network is trained on RGB aerial imagery and airborne laser scanning (ALS) LiDAR data under three setups: the first uses only the RGB imagery, the second only the ALS LiDAR data, and the third both datasets. Performance was evaluated with metrics derived from the confusion matrix, namely overall accuracy, the kappa coefficient, user's accuracy, and producer's accuracy. Fusing both datasets gave the best results, with an overall accuracy of 94.36% and a kappa coefficient of 0.819.
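The four evaluation metrics named above all come from one confusion matrix. The sketch below shows how they can be computed. It assumes that `cm[i][j]` counts samples whose reference class is `i` and whose predicted class is `j`, so rows hold reference totals and columns hold predicted totals. Some remote sensing texts use the transposed layout, in which case user's and producer's accuracy swap. The function name `accuracy_metrics` and the example counts are hypothetical and do not come from this work's experiments.

```python
from typing import Dict, List


def accuracy_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Overall accuracy, Cohen's kappa, user's and producer's accuracy.

    Assumption: cm[i][j] = number of samples with reference class i
    predicted as class j (rows = reference, columns = prediction).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_totals = [sum(cm[i]) for i in range(k)]                      # reference totals
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    diag = [cm[i][i] for i in range(k)]                              # correctly classified

    # Observed agreement: fraction of all samples on the diagonal
    overall = sum(diag) / n
    # Chance agreement estimated from the marginal totals
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - p_e) / (1 - p_e)

    # Producer's accuracy: correct / reference total (1 - omission error)
    producer = [d / r if r else float("nan") for d, r in zip(diag, row_totals)]
    # User's accuracy: correct / predicted total (1 - commission error)
    user = [d / c if c else float("nan") for d, c in zip(diag, col_totals)]

    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producer_accuracy": producer,
        "user_accuracy": user,
    }


if __name__ == "__main__":
    # Hypothetical two-class matrix (class 0 = building, class 1 = non-building)
    example = [[50, 10],
               [5, 35]]
    print(accuracy_metrics(example))
```

Kappa is reported in addition to overall accuracy because it discounts the agreement expected by chance from the class proportions. This matters when non-building pixels dominate a scene.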