In the context of Industry 4.0, automated manufacturing and the automation of the associated supply chain are of central importance. Unbalanced production systems, however, often make an automated supply chain difficult to manage. Operating the supply chain efficiently requires understanding the production-system states that directly affect the material flow schedule. One possible approach is to apply a Markov Decision Process (MDP) model, which requires knowledge of the system states, the actions that lead to state transitions, the rewards associated with those states, and the current state of the system. Using a case study, the presentation illustrates the development of an MDP model for controlling an automated guided vehicle (AGV) that travels between automated production cells. The developed model provides a system-wide framework for optimization based on reinforcement learning (RL).
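To make the MDP ingredients named above concrete (states, actions, transitions, rewards), the following is a minimal, purely illustrative sketch and not the model developed in the case study. It assumes a toy system with two production cells, hypothetical production probabilities, buffer capacities, travel costs and blocking penalties, and it solves the resulting MDP with standard value iteration rather than reinforcement learning. All names and numbers (`PRODUCE_PROB`, `CAPACITY`, `transitions`, `value_iteration`) are assumptions introduced only for illustration.

```python
from itertools import product

CELLS = ("A", "B")                     # two hypothetical production cells
CAPACITY = 2                           # assumed output-buffer size per cell
PRODUCE_PROB = {"A": 0.8, "B": 0.3}    # assumed, deliberately unbalanced rates
GAMMA = 0.9                            # discount factor

# State: (cell where the vehicle is, parts waiting at A, parts waiting at B)
STATES = [(loc, a, b) for loc in CELLS
          for a in range(CAPACITY + 1) for b in range(CAPACITY + 1)]
ACTIONS = CELLS                        # action = cell the vehicle serves next


def transitions(state, action):
    """Yield (probability, next_state, reward) for one decision step."""
    loc, a, b = state
    levels = {"A": a, "B": b}
    base_reward = -1.0 if action != loc else 0.0   # assumed travel cost
    levels[action] = 0                             # vehicle empties that buffer
    # Each cell independently finishes a part (1) or not (0) during the step.
    for made in product((0, 1), repeat=len(CELLS)):
        prob, reward, nxt = 1.0, base_reward, dict(levels)
        for cell, finished in zip(CELLS, made):
            p = PRODUCE_PROB[cell]
            prob *= p if finished else 1.0 - p
            if finished:
                if nxt[cell] == CAPACITY:
                    reward -= 5.0                  # assumed penalty: cell blocked
                else:
                    nxt[cell] += 1
        yield prob, (action, nxt["A"], nxt["B"]), reward


def q_value(values, state, action):
    """Expected discounted return of taking `action` in `state`."""
    return sum(p * (r + GAMMA * values[s2])
               for p, s2, r in transitions(state, action))


def value_iteration(tol=1e-6):
    """Return (state values, greedy policy) for the toy MDP."""
    values = {s: 0.0 for s in STATES}
    while True:
        new = {s: max(q_value(values, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new[s] - values[s]) for s in STATES)
        values = new
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(values, s, a))
              for s in STATES}
    return values, policy
```

Calling `values, policy = value_iteration()` yields a mapping from each state to the cell the vehicle should serve next under these assumed costs. In an RL setting, the same state, action and reward definitions would be kept, but the transition probabilities would be learned from interaction instead of being given.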