We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach that incorporates the temporal information directly into the output of the last attention layer. Our results indicated that using trainable PEs led to significant improvements in transformer performance and training times. Our proposed PE on attention can be trained faster than a transformer with traditional non-trainable PEs, while achieving competitive results when transferred to other datasets.
Comment: In Proceedings of the 40th International Conference on Machine Learning (ICML), Workshop on Machine Learning for Astrophysics, PMLR 202, 2023, Honolulu, Hawaii, USA
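The abstract does not spell out the architecture, so the following is only a minimal sketch of the core idea as stated: a trainable temporal encoding injected into the output of the final attention layer rather than into the input embeddings. All names here (PEOnAttentionOutput, d_model, n_harmonics, the Fourier-feature parameterization) are illustrative assumptions, not the paper's actual API.

```python
import torch
import torch.nn as nn


class PEOnAttentionOutput(nn.Module):
    """Sketch: trainable temporal PE added to the last attention layer's
    output. The Fourier-feature parameterization is an assumption made
    for illustration; the paper's exact encoding may differ."""

    def __init__(self, d_model: int, n_harmonics: int = 16):
        super().__init__()
        # Trainable frequencies map irregular observation times to a Fourier basis.
        self.freqs = nn.Parameter(torch.randn(n_harmonics))
        # Project the 2 * n_harmonics sin/cos features to the model dimension.
        self.proj = nn.Linear(2 * n_harmonics, d_model)

    def forward(self, attn_out: torch.Tensor, times: torch.Tensor) -> torch.Tensor:
        # attn_out: (batch, seq_len, d_model) -- output of the last attention layer
        # times:    (batch, seq_len)          -- observation times (irregular cadence)
        phases = times.unsqueeze(-1) * self.freqs                       # (B, L, H)
        feats = torch.cat([torch.sin(phases), torch.cos(phases)], -1)  # (B, L, 2H)
        # Inject temporal information at the output instead of the input.
        return attn_out + self.proj(feats)


# Usage with dummy shapes: a batch of 2 light curves, 100 observations each.
pe = PEOnAttentionOutput(d_model=64)
out = pe(torch.randn(2, 100, 64), torch.rand(2, 100) * 1000.0)
```

Because the frequencies are trainable parameters, this encoding adapts to the cadence and magnitude statistics of the target survey, which is consistent with the abstract's finding that trainable PEs improve both performance and training time.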