We give an overview of optimal circuits for implementing linear permutations on FPGAs using only RAM banks and switches. Linear means that the permutation acts as a linear map on the bit representation of the indices, as is the case for most permutations arising in digital signal processing algorithms, including those in fast Fourier transforms, Viterbi decoders, and sorting networks. Additionally, we assume that the data to be permuted is streamed, i.e., input in chunks over several cycles. The circuits are obtained from a suitable factorization of the bit matrix representing the permutation and achieve the minimal number of switches possible.
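
To make the notion of a linear permutation concrete, the following is a minimal software sketch, not a model of the RAM-and-switch circuits themselves. It assumes that indices are written most significant bit first and that the element at index i moves to the index whose bit vector is P·i over GF(2); both conventions, as well as all function names and the two example matrices, are illustrative assumptions rather than definitions taken from the text.

```python
def index_to_bits(i, n):
    """Return the n-bit representation of i, most significant bit first (assumed convention)."""
    return [(i >> (n - 1 - k)) & 1 for k in range(n)]


def bits_to_index(bits):
    """Inverse of index_to_bits: rebuild the integer from MSB-first bits."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | b
    return idx


def apply_bit_matrix(P, i):
    """Map index i to the index whose bit vector is P * bits(i) over GF(2)."""
    n = len(P)
    x = index_to_bits(i, n)
    y = [sum(P[r][c] & x[c] for c in range(n)) % 2 for r in range(n)]
    return bits_to_index(y)


def linear_permutation(P, data):
    """Move data[i] to position P*i; P must be an invertible n x n bit matrix."""
    n = len(P)
    assert len(data) == 1 << n, "data length must be 2**n"
    out = [None] * len(data)
    for i, value in enumerate(data):
        out[apply_bit_matrix(P, i)] = value
    return out


# Bit reversal on 8 elements: the anti-diagonal bit matrix (illustrative).
BIT_REVERSAL = [[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]]

# An invertible bit matrix that is not a permutation matrix (illustrative):
# the top output bit is the XOR of the two top input bits.
XOR_EXAMPLE = [[1, 1, 0],
               [0, 1, 0],
               [0, 0, 1]]

if __name__ == "__main__":
    print(linear_permutation(BIT_REVERSAL, list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
    print(linear_permutation(XOR_EXAMPLE, list(range(8))))   # [0, 1, 6, 7, 4, 5, 2, 3]
```

The first matrix only reorders the index bits, which yields the bit reversal familiar from fast Fourier transforms; the second shows that a linear permutation may also combine index bits by XOR, provided the bit matrix stays invertible.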