Summary: With the growing availability of computational resources, interest in learning models of dynamical systems has grown rapidly across many diverse disciplines. As a result, objective functions for model estimation have been developed independently across fields such as fluids, control, and machine learning. Theoretical justifications for these objectives, however, have lagged behind. In this dissertation, we provide a unifying theoretical framework for some of the most popular of these objectives: dynamic mode decomposition (DMD), single-rollout Markov parameter estimation, sparse identification of nonlinear dynamics (SINDy), and multiple shooting.

In this framework, we model a general dynamical system as a hidden Markov model and derive a marginal likelihood that can be used for estimation. The key difference between this and most existing likelihood estimators is that, rather than modeling only the estimation error in the output of the system, we additionally model the error in the dynamics through the inclusion of a process noise term. This process noise term not only provides the flexibility needed to generalize many existing objectives, but also yields three significant advantages in the marginal likelihood. First, it generates an explicit regularization term that arises directly from the model formulation, without the need to place heuristic priors on the parameters. Because this regularization term acts on the output, rather than the parameters, of the model, it applies to any parameterization of the dynamics. Second, the process noise term smooths the marginal likelihood optimization surface without discounting the information in the data through tempering methods or abbreviated simulation lengths.
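The state-space structure described above can be illustrated with a minimal simulation. This is only a sketch under generic assumptions (the function names, the linear example dynamics, and the noise covariances are illustrative, not taken from the dissertation): the state evolves as x_{k+1} = f(x_k) + w_k, where the process noise w_k models error in the dynamics themselves, while the measurement noise v_k models error in the observed output y_k = g(x_k) + v_k.

```python
import numpy as np

def simulate_hmm(f, g, x0, steps, Q, R, rng):
    """Simulate a hidden Markov model with additive Gaussian process
    noise (covariance Q) and measurement noise (covariance R)."""
    x = np.asarray(x0, dtype=float)
    xs, ys = [], []
    for _ in range(steps):
        # process noise: error in the dynamics, not just the output
        x = f(x) + rng.multivariate_normal(np.zeros(x.shape[0]), Q)
        # measurement noise: error in the observed output
        y = g(x) + rng.multivariate_normal(np.zeros(g(x).shape[0]), R)
        xs.append(x.copy())
        ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])  # illustrative stable linear dynamics
f = lambda x: A @ x
g = lambda x: x[:1]                     # observe only the first state
xs, ys = simulate_hmm(f, g, [1.0, 0.0], 50,
                      Q=0.01 * np.eye(2), R=np.array([[0.1]]), rng=rng)
```

Setting Q to zero recovers the pure output-error setting that most existing likelihood estimators assume.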
Lastly, estimating the process noise term can quantify the uncertainty of the estimated model without necessarily requiring expensive Markov chain Monte Carlo (MCMC) sampling.

To evaluate the proposed marginal likelihood, we present an efficient recursive algorithm for linear-Gaussian models and an approximation to this algorithm for all remaining models. We discuss how the approximate algorithm simplifies when the noise is additive Gaussian, and derive further simplifications for arbitrary additive/multiplicative noise. Next, we provide theoretical results proving that the considered objectives are special cases of a posterior built on the proposed marginal likelihood. These results uncover the sets of assumptions needed to transform the negative log posterior into each of the objective functions that we consider. We then present numerical experiments comparing the (approximate) marginal likelihood to each of the considered objectives on a variety of systems, including linear, chaotic, partial differential equation, limit cycle, and Hamiltonian systems. Additionally, we include a novel comparison of Hamiltonian estimation using symplectic and non-symplectic dynamics propagators. This comparison uses uncertainty quantification, both through MCMC sampling and through process noise covariance estimation, to show that embedding the symplectic propagator in the objective delivers more precise estimates than embedding the non-symplectic one. Overall, the results of this dissertation demonstrate that the marginal likelihood produces more accurate estimates than comparable objective functions on problems with high uncertainty, whether in the form of measurement noise, measurement sparsity, or limited model expressiveness.
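For linear-Gaussian models, the recursive marginal-likelihood evaluation mentioned above can be sketched with the classical Kalman-filter prediction-error decomposition. This is a generic sketch, not the dissertation's implementation; the function name, the time-indexing convention (predict before each observation), and the example model are illustrative assumptions.

```python
import numpy as np

def lg_marginal_loglik(A, C, Q, R, m0, P0, ys):
    """Marginal log-likelihood of observations ys under the
    linear-Gaussian model x_{k+1} = A x_k + w_k, y_k = C x_k + v_k,
    computed recursively via the Kalman filter: each observation
    contributes a Gaussian innovation term, so the likelihood
    factorizes and never requires sampling."""
    m, P = np.asarray(m0, float), np.asarray(P0, float)
    ll = 0.0
    for y in ys:
        # predict: propagate mean/covariance, inflating by process noise Q
        m = A @ m
        P = A @ P @ A.T + Q
        # innovation and its covariance
        e = y - C @ m
        S = C @ P @ C.T + R
        ll += -0.5 * (e @ np.linalg.solve(S, e)
                      + np.linalg.slogdet(2 * np.pi * S)[1])
        # update: condition the state estimate on the observation
        K = P @ C.T @ np.linalg.inv(S)
        m = m + K @ e
        P = P - K @ S @ K.T
    return ll

# illustrative scalar example
ys = [np.array([0.0]), np.array([0.1])]
ll = lg_marginal_loglik(np.eye(1), np.eye(1), 0.1 * np.eye(1),
                        0.1 * np.eye(1), np.zeros(1), np.eye(1), ys)
```

Note how Q enters the predicted covariance P at every step: this is where the process noise term both regularizes and smooths the resulting likelihood surface, and setting Q = 0 collapses the recursion to a pure simulation-error likelihood.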