Embedded systems have become increasingly complex in recent years, with performance approaching that of desktop computing systems. However, embedded systems must adhere to stricter design constraints (e.g., area and energy constraints) than desktop computing systems. Architectural specialization can help meet these stringent constraints by introducing configurable hardware that is tuned at runtime to optimize a design goal (e.g., minimum energy, minimum execution time) for an application. However, traditional approaches (i.e., exhaustive and heuristic searches) often take considerable time to explore a large design space composed of different configurable parameters (e.g., cache size, associativity) and parameter values (e.g., 4 kB, 2-way). In addition, the presence of application phases (i.e., intervals of repeating execution behavior) allows for finer-grained tuning, at the cost of even greater exploration overhead. In this paper, we apply machine learning to reduce or eliminate the design space exploration overhead associated with finding the best set of configurable parameter values for configurable L1 instruction and data caches. Our prediction methodology uses artificial neural networks (ANNs) that take the execution statistics of an application phase as input and output a best cache configuration (i.e., combination of configurable parameter values) for the instruction and data caches. Our results show an average energy degradation of less than 5% for the instruction and data caches, with an average phase misclassification rate of 20% and 20% fewer cache switches than when the best cache configuration is chosen for every application phase.
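To make the prediction methodology concrete, the following is a minimal sketch of the per-phase inference step: enumerating the configurable cache design space and running one forward pass of a small ANN that maps a phase's execution statistics to an index into that space. The parameter values, feature set, network shape, and weights below are illustrative assumptions for exposition, not the configuration or trained network described in the paper.

```python
from itertools import product
import math
import random

# Hypothetical configurable parameter values (assumed for illustration,
# not taken from the paper's actual design space):
CACHE_SIZES_KB = [2, 4, 8]    # total cache size
ASSOCIATIVITIES = [1, 2, 4]   # number of ways
LINE_SIZES_B = [16, 32, 64]   # cache line size

# Enumerate the per-cache design space; each tuple is one candidate
# cache configuration (i.e., one combination of parameter values).
CONFIGS = list(product(CACHE_SIZES_KB, ASSOCIATIVITIES, LINE_SIZES_B))


def predict_config(phase_stats, W1, b1, W2, b2):
    """One forward pass of a tiny single-hidden-layer ANN.

    phase_stats: per-phase execution statistics (e.g., miss rate, IPC);
                 the exact feature set here is an assumption.
    Returns the index of the predicted best configuration in CONFIGS.
    """
    # Hidden layer with tanh activation.
    h = [math.tanh(sum(w * x for w, x in zip(row, phase_stats)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: one score per candidate configuration; the
    # highest-scoring configuration is the predicted best.
    scores = [sum(w * x for w, x in zip(row, h)) + b
              for row, b in zip(W2, b2)]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage with randomly initialized (untrained) weights, for shape only.
random.seed(0)
n_in, n_hid = 4, 8
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [random.uniform(-1, 1) for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(len(CONFIGS))]
b2 = [random.uniform(-1, 1) for _ in range(len(CONFIGS))]

# Hypothetical phase statistics: [miss rate, IPC, branch rate, load fraction].
idx = predict_config([0.12, 0.9, 0.05, 0.3], W1, b1, W2, b2)
best = CONFIGS[idx]
```

In practice, separate networks (or outputs) would be trained for the instruction and data caches, with weights learned offline from phases whose best configurations were found by exhaustive search.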