Identifying the same commodity entity across multi-source e-commerce data, which is heterogeneous, autonomous, independent, diverse, and inconsistent, is a major open challenge. By analyzing the data characteristics of different platforms, we first establish an index model based on commodity attribute/value pairs and then construct a global schema of commodity attribute values. This yields a unified model of commodity information and improves the quality and efficiency of processing commodity data. Next, a hierarchical probability model measures the similarity between commodity records, which completes commodity entity recognition. Finally, the output is normalized so that records referring to the same commodity are grouped into one set and ranked. We conduct experiments on the Hadoop platform using data from three B2C e-commerce sources and compare the framework with traditional methods and products. The experimental results demonstrate the feasibility, accuracy, and efficiency of the framework.
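To make the pipeline concrete, the following is a minimal single-machine sketch, not the framework described above. It builds an inverted index over normalized attribute/value pairs, uses shared values to generate candidate pairs, scores each pair with a flat Fellegi–Sunter-style sum of log-likelihood ratios (a simpler stand-in for the hierarchical probability model), and groups matches with union-find before ranking the groups. The attribute names, the per-attribute probabilities in `ATTR_PROBS`, the `threshold` value, and the record ids are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical per-attribute probabilities (assumed, not from the paper):
# m = P(values agree | same commodity), u = P(values agree | different commodities)
ATTR_PROBS = {
    "brand": (0.95, 0.10),
    "model": (0.90, 0.01),
    "color": (0.80, 0.20),
}

def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so equivalent values compare equal."""
    return " ".join(value.lower().split())

def build_index(records):
    """Inverted index: (attribute, normalized value) -> set of record ids."""
    index = defaultdict(set)
    for rid, attrs in records.items():
        for attr, value in attrs.items():
            index[(attr, normalize(value))].add(rid)
    return index

def candidate_pairs(index):
    """Only records sharing at least one attribute/value are compared."""
    pairs = set()
    for rids in index.values():
        ordered = sorted(rids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs

def match_score(a, b):
    """Sum of log-likelihood ratios over attributes present in both records."""
    score = 0.0
    for attr, (m, u) in ATTR_PROBS.items():
        if attr in a and attr in b:
            if normalize(a[attr]) == normalize(b[attr]):
                score += math.log(m / u)
            else:
                score += math.log((1 - m) / (1 - u))
    return score

def group_matches(records, threshold=3.0):
    """Union-find over matched pairs; groups sorted by size, then by ids."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidate_pairs(build_index(records)):
        if match_score(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

# Hypothetical records from three sources.
records = {
    "jd:1": {"brand": "Apple", "model": "iPhone 13", "color": "Black"},
    "tmall:7": {"brand": "apple", "model": "iphone  13", "color": "Black"},
    "amazon:3": {"brand": "Samsung", "model": "Galaxy S21", "color": "Black"},
}
print(group_matches(records))  # [['jd:1', 'tmall:7'], ['amazon:3']]
```

Indexing on attribute/value pairs keeps the number of pairwise comparisons well below the full quadratic set, which is also what makes such a step straightforward to distribute (for example, one reducer per index key) on a platform like Hadoop.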