The combination of convolution and Transformer has achieved great success in medical image segmentation. However, existing methods still struggle to segment complex, low-contrast anatomical structures accurately at low computational cost. To address this problem, we propose ECT-NAS, a method that automatically searches for Efficient CNN-Transformer architectures for medical image segmentation over a multi-scale search space. To better capture global context within this search space, we carefully design a lightweight Transformer with local-global attention. Finally, we propose an efficient resource-constrained search strategy that jointly optimizes model accuracy and cost (parameters/FLOPs). We evaluate ECT-NAS through extensive experiments on the Synapse multi-organ, CHAOS, and ACDC datasets, showing that our approach achieves competitive performance against other segmentation methods with fewer parameters and lower FLOPs.
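To make the accuracy/cost trade-off concrete, a resource-constrained search objective can be sketched as an accuracy score penalized by parameter count and FLOPs. This is a minimal illustration only, not the authors' implementation: the candidate architectures, their scores, and the trade-off coefficients `lam` and `mu` below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    dice: float      # validation Dice score in [0, 1] (made-up number)
    params_m: float  # parameters, in millions (made-up number)
    flops_g: float   # FLOPs, in billions (made-up number)

def reward(c: Candidate, lam: float = 0.05, mu: float = 0.02) -> float:
    """Accuracy minus weighted resource penalties.

    lam and mu are hypothetical trade-off coefficients, not values
    taken from the paper.
    """
    return c.dice - lam * c.params_m - mu * c.flops_g

def search(pool: list[Candidate]) -> Candidate:
    """Pick the candidate with the best accuracy/cost trade-off."""
    return max(pool, key=reward)

pool = [
    Candidate("heavy", dice=0.82, params_m=40.0, flops_g=30.0),
    Candidate("light", dice=0.80, params_m=8.0, flops_g=6.0),
]
print(search(pool).name)  # the lighter model wins once cost is penalized
```

Under this scoring, a slightly less accurate but much cheaper architecture is preferred, which mirrors the stated goal of competitive segmentation quality at fewer parameters and lower FLOPs.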