In this paper, we revisit lightweight FPGA implementations of SHA-3 and improve upon the state of the art by applying a new optimization technique, based on a shallow pipeline, to the slice-oriented architecture. As a result, the area of the implementation is reduced by almost one quarter (23%) compared to the smallest previously reported implementation for Virtex-5 FPGAs. The proposed design also improves the throughput-area ratio by 59%. For Virtex-6 FPGAs, the improvements are even greater, with a throughput-area ratio increase of over 150% over previously reported results for this FPGA. Furthermore, we evaluate several additional implementation trade-offs. First, we provide the maximum number of pipeline stages for lightweight architectures that process several slices in parallel, and for variants of SHA-3 with only 800 and 400 bits of internal state. Second, we evaluate several hardware interfaces. This evaluation shows that the hardware interface may have a significant impact on both area consumption and throughput.