Many deep learning (DL)-based models use contact sensors (e.g., a throat microphone, TM) to reconstruct speech from the vibration signals of the larynx. A TM captures more robust speech information than an air-conducted microphone (ACM) in noisy environments. However, it requires tight contact with the user's skin, which causes discomfort. We therefore assume that a non-contact sensor allows a better user experience. Following this idea, we propose DL-based models that reconstruct speech from laryngeal vibration signals captured by a non-contact sensor, a laser Doppler vibrometer (LDV). Notably, the proposed system comprises a recognition module and a speech synthesis module. Experimental results showed that, on average, the word error rate (WER) of the recognition module in the proposed system is comparable to that achieved with the TM under both quiet and noisy testing conditions. Furthermore, a listening test showed that the speech reconstructed by the synthesis module received a higher preference rate and greater naturalness than the original speech recorded by the LDV sensor. These results suggest that the proposed system is a promising DL-based approach to reconstructing speech from laryngeal vibration signals captured by a non-contact LDV sensor.