Large Block Matching Characters for Dehydrin Classification
- Resource Type
- Conference
- Authors
- Ashlock, Daniel; Gillis, Sierra; Saunders, Amanda; Riley, Andrew
- Source
- 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-8, Jul 2019
- Subject
- Bioengineering
Computing and Processing
DNA
Proteins
Image classification
Markov processes
Automata
Climate change
Databases
DNA classification
evolved features
representation
- Language
Dehydrins are a class of modular, disordered stress proteins found in plants. They are typically defined by the presence of three motifs called the Y-, K-, and S-segments. Their disordered structure and relatively unconstrained sequence form make identifying dehydrins a difficult problem. Identification of stress proteins is part of the effort to secure the food supply in the face of climate change. In this study we used a feature-finding method based on block-matching characters, built on the do-what's-possible representation, to distinguish dehydrins from synthetic sequences with the same fourth-order statistics. Good separation is achieved, which sets the stage for attempting to locate additional dehydrins and dehydrin-like proteins. The minimum block-matching size is found to be a critical parameter.
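
The abstract does not say how the synthetic sequences were generated. One common way to build decoy sequences that share local statistics with real ones is to fit a Markov model to the real sequences and sample from it. The Python sketch below assumes that "fourth-order statistics" can be approximated by an order-4 Markov chain; the function names `fit_markov` and `sample_sequence` are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def fit_markov(sequences, order=4):
    """Count which symbol follows each length-`order` context.

    Assumption: an order-4 chain stands in for the "fourth-order
    statistics" mentioned in the abstract.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = seq[i:i + order]
            counts[context][seq[i + order]] += 1
    return counts


def sample_sequence(counts, seed, length, rng, order=4):
    """Extend `seed` one symbol at a time using the fitted counts.

    `seed` should be at least `order` symbols long. Sampling stops
    early if the current context was never seen with a successor.
    """
    out = list(seed)
    while len(out) < length:
        context = "".join(out[-order:])
        followers = counts.get(context)
        if not followers:
            break
        symbols, weights = zip(*followers.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    # Toy input only; real use would pass known dehydrin sequences.
    training = ["ACGTACGTTACGATACGT", "ACGTTACGGTACGTACGA"]
    model = fit_markov(training, order=4)
    print(sample_sequence(model, "ACGT", 30, random.Random(1)))
```

The sketch works on any string alphabet, so it applies equally to nucleotide or amino-acid sequences. A classifier that separates real sequences from such decoys has to rely on structure beyond short-range composition, which is the kind of test the abstract describes.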