Application-specific hardware acceleration of computation-intensive kernels can often deliver significant performance and power-efficiency improvements over general-purpose software, but incorporating such accelerators into existing software systems is difficult and costly. Designing hardware accelerators and modifying legacy software to use them are already complex tasks. Furthermore, identifying a suitable kernel for acceleration is complicated by the PCIe-attached architecture of commodity FPGA cards: the bandwidth and latency overheads of PCIe must be taken into account when selecting a kernel. For example, small blocking kernels may not benefit from acceleration because of PCIe latency. As a remedy, we present FarSlayer, a high-level source-to-source compiler for end-to-end acceleration of legacy software. FarSlayer analyzes existing software code and emits an accelerated version of it, in which the kernel to accelerate is selected automatically with data movement over PCIe taken into account. Specifically, FarSlayer identifies kernels that can be called asynchronously, hiding the PCIe latency, and that also have high operational intensity, keeping bandwidth requirements low. The entire process is automatic, so the programmer need not understand the existing code or reason about hardware development. We apply FarSlayer to multiple existing scientific computing software systems and show that it automatically achieves significant performance improvements.
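To make the selection criterion concrete, here is a minimal Python sketch of one possible cost model. It is not FarSlayer's actual implementation. The `Kernel` record, the platform constants (`PCIE_BYTES_PER_S`, `PCIE_LATENCY_S`, `FPGA_OPS_PER_S`), the `min_speedup` threshold and all numeric values are hypothetical. The model assumes that PCIe transfers overlap with FPGA compute, so the slower of the two dominates. It also assumes that a blocking call additionally exposes one PCIe round-trip latency while an asynchronous call hides it. Operational intensity (operations per byte moved over PCIe) determines whether an offloaded kernel is bound by accelerator compute or by PCIe bandwidth.

```python
from dataclasses import dataclass

# Hypothetical platform constants (illustrative values, not measurements).
PCIE_BYTES_PER_S = 12e9   # sustained PCIe bandwidth
PCIE_LATENCY_S = 2e-6     # round-trip latency exposed by one blocking invocation
FPGA_OPS_PER_S = 200e9    # peak accelerator throughput


@dataclass
class Kernel:
    """Hypothetical profile of one candidate kernel."""
    name: str
    ops: float            # operations per invocation
    bytes_moved: float    # bytes sent and received over PCIe per invocation
    cpu_ops_per_s: float  # software throughput on the host CPU
    asynchronous: bool    # True if the caller can continue while the kernel runs


def operational_intensity(k: Kernel) -> float:
    """Operations performed per byte moved over PCIe."""
    return k.ops / k.bytes_moved


def software_time(k: Kernel) -> float:
    """Time of one invocation in software."""
    return k.ops / k.cpu_ops_per_s


def offload_time(k: Kernel) -> float:
    """Effective cost of one accelerated invocation under this simple model."""
    compute = k.ops / FPGA_OPS_PER_S
    transfer = k.bytes_moved / PCIE_BYTES_PER_S
    # Assume streamed transfers overlap with compute, so the slower one dominates.
    t = max(compute, transfer)
    if not k.asynchronous:
        # A blocking call cannot hide the PCIe round trip.
        t += PCIE_LATENCY_S
    return t


def select_kernels(kernels: list[Kernel], min_speedup: float = 2.0) -> list[str]:
    """Return names of kernels worth offloading, best estimated speedup first."""
    scored = [(software_time(k) / offload_time(k), k.name) for k in kernels]
    chosen = [(s, name) for s, name in scored if s >= min_speedup]
    chosen.sort(reverse=True)
    return [name for _, name in chosen]


# Ridge point: below this intensity an offloaded kernel is PCIe-bandwidth bound.
RIDGE_OPS_PER_BYTE = FPGA_OPS_PER_S / PCIE_BYTES_PER_S

# Hypothetical candidate kernels.
candidates = [
    Kernel("small_blocking", ops=1e3, bytes_moved=8e3, cpu_ops_per_s=1e9, asynchronous=False),
    Kernel("stream_copy", ops=1e8, bytes_moved=1e9, cpu_ops_per_s=1e9, asynchronous=True),
    Kernel("stencil_async", ops=1e10, bytes_moved=1e8, cpu_ops_per_s=1e9, asynchronous=True),
]

if __name__ == "__main__":
    for k in candidates:
        bound = "compute" if operational_intensity(k) >= RIDGE_OPS_PER_BYTE else "PCIe"
        print(f"{k.name}: intensity={operational_intensity(k):.2f} ops/B, {bound}-bound")
    print(select_kernels(candidates))
```

Under these made-up parameters, the small blocking kernel loses to software because its exposed PCIe latency dominates. The low-intensity `stream_copy` kernel is limited by PCIe bandwidth. Only the asynchronous, high-intensity kernel clears the speedup threshold. This mirrors the two properties the abstract names: asynchronous invocation and high operational intensity.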