Species identification with partial DNA sequences has proved effective for a range of organisms. A DNA barcode is a short genetic marker in an organism's DNA that is used to identify the species to which the organism belongs. In this work, we analyze the effectiveness of supervised machine learning methods for classifying species using DNA barcodes. We choose specimens from phylogenetically diverse species belonging to the animal, plant, and fungus kingdoms. We consider six supervised machine learning methods: simple logistic function, random forest, PART, instance-based k-nearest neighbor, attribute-based classifier, and bagging. Results on various datasets show that the classification performance of the selected methods is encouraging, with an average accuracy of 93.66%. This is a relative improvement of about 6% (5.29 percentage points) over state-of-the-art DNA barcode classification methods, which achieve 88.37% accuracy on average.
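As an illustration of the instance-based k-nearest neighbor idea named above, the following is a minimal sketch and not the pipeline used in this work. The abstract does not state how sequences are encoded, so the k-mer count features, the cosine distance, the function names, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a DNA sequence (assumed feature encoding)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b):
    """Cosine distance between two k-mer count profiles (assumed metric)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, query, k_neighbors=1, k=3):
    """Predict a label by majority vote among the nearest training sequences."""
    q = kmer_profile(query, k)
    ranked = sorted(
        train,
        key=lambda item: cosine_distance(kmer_profile(item[0], k), q),
    )
    votes = Counter(label for _, label in ranked[:k_neighbors])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences and labels for illustration only (not real barcodes).
train = [
    ("ACGTACGTACGTACGT", "species_A"),
    ("ACGTACGAACGTACGT", "species_A"),
    ("TTGGCCTTGGCCTTGG", "species_B"),
    ("TTGGCCTTGGACTTGG", "species_B"),
]

prediction = knn_predict(train, "ACGTACGTACGAACGT")
print(prediction)
```

In this sketch each query is assigned the label of its closest training sequence. A realistic evaluation would use real barcode datasets and cross-validation, which are outside the scope of this example.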
Notice of Violation of IEEE Publication Principles
"Species Identification using Partial DNA Sequence: A Machine Learning Approach" by Tasnim Kabir, Abida Sanjana Shemonti, and Atif Hasan Rahman, in 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), October 2018, 235-242.
After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles. This paper copied text from the paper cited below. The original text was copied without attribution (including appropriate references to the original author(s) and/or paper title) and without permission.
"Species Identification Using Part of DNA Sequence: Evidence from Machine Learning Algorithms" by Taha Alhersh, Brahim Belhaouari Samir, Hamada R. H. Al-Absi, Abdullah Alorainy, and Belloui Bouzid, in the Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies.