Echosounders are used by fisheries and ocean observatories, but classifying species of interest in multifrequency echograms requires significant manual effort. This article investigates modified U-Net convolutional neural networks for the pixel-level classification of biological and physical data in echogram images, accurately classifying herring and salmon schools, bubbles, and the sea surface. Data were collected on the coast of British Columbia, Canada, over two years using an Acoustic Zooplankton and Fish Profiler at four frequencies (67, 125, 200, and 455 kHz). In addition, simulated data (water depth and solar elevation angle) provide spatial and temporal context to improve the quality of predictions. Redundancy is built into the model by using a tiling strategy during training and classification. During training on a limited set of annotated data, translational augmentation helps the U-Nets learn robust features that enable application to alternate deployment configurations (lower sampling rates or alternate water depths). To ensure broad applicability, these networks were trained to classify echograms with noise left intact. The best-performing model classifies the herring, salmon, and bubble classes with $F_1$ scores of 93.0%, 87.3%, and 86.5%, respectively. The results remain accurate even when multiple classes are in close proximity, thus retaining biological data that would otherwise be discarded due to surface bubble noise.
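As a rough illustration of how a tiling strategy can build redundancy into classification, the sketch below averages per-class scores over overlapping tiles before taking the most likely class at each pixel. It is not the authors' implementation: the tile size, stride, array layout (channels, depth, time), and the names `tile_starts`, `predict_tiled`, and `predict_fn` are all assumptions made for this example, and `predict_fn` stands in for a trained U-Net.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    assert length >= tile, "echogram must be at least one tile wide"
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # extra tile flush with the far edge
    return starts


def predict_tiled(echogram, predict_fn, n_classes, tile=64, stride=32):
    """Classify an echogram by averaging scores over overlapping tiles.

    echogram:   array of shape (channels, depth, time); layout is assumed
    predict_fn: hypothetical model stand-in that maps a patch of shape
                (channels, tile, tile) to scores (n_classes, tile, tile)
    Returns an integer label map of shape (depth, time).
    """
    _, depth, time = echogram.shape
    scores = np.zeros((n_classes, depth, time))
    counts = np.zeros((depth, time))
    for r in tile_starts(depth, tile, stride):
        for c in tile_starts(time, tile, stride):
            patch = echogram[:, r:r + tile, c:c + tile]
            scores[:, r:r + tile, c:c + tile] += predict_fn(patch)
            counts[r:r + tile, c:c + tile] += 1  # how many tiles saw each pixel
    # Every pixel is covered at least once, so counts is never zero.
    return (scores / counts).argmax(axis=0)
```

With a stride smaller than the tile size, most pixels are predicted several times in different positions within a tile, so a poor prediction near one tile's edge can be outvoted by predictions from neighbouring tiles.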