Zhe Yang, Juan Wang*, Jia Yang, Zhi Qi and Jiahao He Pages 1-9 (9)
Protein recognition is essential to the study of biology. To identify the functional proteins of Elymus nutans, we sequenced its transcriptomes in Inner Mongolia, China, and then used BLAST to annotate their functions. In addition, we used machine learning methods to recognize proteins that BLAST could not annotate, focusing on identifying proteins with binding functions. Features were extracted by four algorithms and selected with a mutual information estimator, and three types of classifiers were constructed based on the K-nearest neighbor and gradient boosting algorithms. The results show that there are 848 proteins with ATP binding function, 113 with heme binding function, 315 with zinc-ion binding function, 135 with GTP binding function and 21 with ADP binding function. Furthermore, we successfully predicted the functions of 10 special protein sequences whose function annotations could not be obtained by sequence alignment against seven well-known protein databases. Of these, seven sequences have ATP binding function, one has heme binding function, one has zinc-ion binding function and the remaining one has GTP binding function.
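The abstract does not name the four feature-extraction algorithms or the specific mutual information estimator. As a rough illustration of the general pipeline it describes (extract sequence features, rank them by mutual information with the class label, classify with K-nearest neighbors), here is a minimal Python sketch under explicit assumptions: amino acid composition stands in for the unnamed feature extractors, a simple histogram (plug-in) mutual information estimate stands in for the authors' estimator, and the gradient boosting classifier is omitted. All function names and the toy sequences are hypothetical and are not the authors' code or data.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other characters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(seq):
    """Assumed feature extractor: fraction of each standard residue in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def mutual_information(values, labels, bins=4):
    """Plug-in MI estimate (nats) between one feature and the labels,
    after equal-width binning. A stand-in, not the authors' estimator."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature -> everything in bin 0
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def select_top_k(X, y, k):
    """Indices of the k features with the highest estimated MI with y."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data for illustration only; not sequences from the study.
train_seqs = ["GGGKSTGGGK", "GKSGKTGGSK", "HHCCHHCCHA", "CHHCACHHCC"]
train_labels = ["ATP", "ATP", "zinc", "zinc"]
X = [aac_features(s) for s in train_seqs]
keep = select_top_k(X, train_labels, k=5)
X_sel = [[row[j] for j in keep] for row in X]
query = [aac_features("GGKSGKTGSG")[j] for j in keep]
print(knn_predict(X_sel, train_labels, query, k=3))  # → ATP
```

Binning plus a plug-in estimate keeps the sketch dependency-free; in practice a dedicated estimator (for example a nearest-neighbor-based one) and a real feature set would replace these stand-ins.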
Protein, binding function, machine learning, feature, ATP, GTP.
School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021; School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021; State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, Inner Mongolia 010021; State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, Inner Mongolia 010021; Class 1, 2018, International Department, Hohhot No.2 High School, Hohhot, Inner Mongolia 010021