Without proper observation of the energy demand of the receiving terminals, the retailer may be obliged to purchase additional energy from the real-time market and risks losing profit. This paper proposes two combinatorial multi-armed bandit (CMAB) strategies for a green cloud radio access network (C-RAN) with simultaneous wireless information and power transfer, under the assumption that the central processor has no initial knowledge of the forthcoming energy demand or renewable energy supply. The proposed strategies aim to find the set of optimal sizes of the energy packages to be purchased from the day-ahead market by observing the instantaneous energy demand and learning from the behaviour of cooperative energy trading, so that the total cost of the retailer is minimized. Two novel iterative algorithms, namely ForCMAB energy trading and RevCMAB energy trading, are introduced to search for the optimal set of energy packages in ascending and descending order of package sizes, respectively. Simulation results indicate that the CMAB approach in the proposed strategies offers a significant advantage in reducing the overall energy cost of the retailer compared with schemes without learning-based optimization.