Camera-based Bird's-Eye-View (BEV) 3D object detection is a challenging yet essential task in autonomous driving perception, as it effectively alleviates the ambiguity caused by object overlap and occlusion. 3D multi-object tracking is likewise one of the most important perception tasks in autonomous driving, but it suffers from the narrow field of view provided by a single camera. In this paper, we present a novel framework called BEVMOT, which unifies multi-object detection and tracking in a single framework at a considerable inference speed. The encoder of our framework generates a 360-degree panoramic bird's-eye-view representation around the ego vehicle from the multiple cameras mounted on the car, broadening the perception field. An efficient decoder, composed of multi-head attention and deformable attention and followed by a multi-object detection and tracking head, learns object center points and tracking embeddings to obtain object boxes and tracking IDs directly, without NMS post-processing. Meanwhile, the tracking branch uses the tracking embeddings to initialize new trajectories, update existing ones, and perform data association frame by frame. Extensive experiments show that our approach achieves strong performance on the nuScenes dataset, surpassing many classic methods in terms of the AMOTA and AMOTP metrics.
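The frame-by-frame association driven by tracking embeddings could, in minimal form, be sketched as a greedy similarity matcher. This is an illustrative assumption, not the paper's actual algorithm: the function name, the cosine-similarity measure, and the threshold are all hypothetical.

```python
import numpy as np

def associate(track_embs, det_embs, sim_thresh=0.5):
    """Greedy embedding-based data association (hypothetical sketch).

    track_embs: (T, D) embeddings of existing trajectories
    det_embs:   (N, D) embeddings of current-frame detections
    Returns (matches, new_dets): matched (track_idx, det_idx) pairs,
    plus indices of unmatched detections that start new trajectories.
    """
    if len(track_embs) == 0:
        # no existing trajectories: every detection starts a new one
        return [], list(range(len(det_embs)))
    # cosine similarity matrix between trajectory and detection embeddings
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    sim = t @ d.T
    matches, used_t, used_d = [], set(), set()
    # greedily take the highest-similarity pairs above the threshold
    for ti, di in sorted(np.ndindex(sim.shape), key=lambda p: -sim[p]):
        if ti in used_t or di in used_d or sim[ti, di] < sim_thresh:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    new_dets = [i for i in range(len(det_embs)) if i not in used_d]
    return matches, new_dets
```

Matched detections would update their trajectories, while unmatched ones initialize new trajectories, mirroring the tracking branch described above.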