ShaoPeng Wang, Yu-Hang Zhang, Ning Zhang, Lei Chen, Tao Huang and Yu-Dong Cai. Pages 1-12 (12)
Lantibiotics, which are usually produced by Gram-positive bacteria, are regarded as a special type of bacteriocin. Lantibiotics have unsaturated amino acid residues formed by lanthionine (Lan) and β-methyllanthionine (MeLan) residues as ring structures in the peptide. Lan and MeLan are derived from serine and threonine residues and are essential for preventing the growth of other similar strains. In this pioneering work, we propose a machine learning method to recognize and predict the Lan and MeLan residues in the protein sequences of lantibiotics. We adopted maximal relevance minimal redundancy (mRMR) and incremental feature selection (IFS) to select optimal features, and random forest (RF) to build classifiers that determine the Lan and MeLan residues. A 10-fold cross-validation test was performed on the classifiers to evaluate their predictive performance. The Matthews correlation coefficient (MCC) values for predicting the Lan and MeLan residues were 0.813 and 0.769, respectively, indicating that the constructed RF classifiers can reliably recognize Lan and MeLan residues in lantibiotic sequences. For comparison, three other methods, Dagging, the nearest neighbor algorithm (NNA) and sequential minimal optimization (SMO), were also used to build classifiers to predict Lan and MeLan residues. The optimal features were also analyzed, and the relationships between these features and their biological importance are discussed. We believe the selected optimal features and the analysis in this work will contribute to a better understanding of the sequence and structural features around the Lan and MeLan residues. They could provide useful information and practical suggestions for experimental and computational methods for exploring the biological features of such special residues in lantibiotics.
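The MCC values reported above summarize a binary confusion matrix in a single number between -1 and 1. As a minimal sketch, assuming residues are labeled 1 (modified, e.g. Lan) or 0 (unmodified), the coefficient can be computed as follows. The function names and the toy labels are hypothetical and serve only as illustration; they are not taken from the paper or its data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention
    (an assumption here, not something stated in the abstract).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = mcc(*confusion_counts(y_true, y_pred))
```

In a 10-fold cross-validation setting, the counts would typically be accumulated over the held-out predictions of all folds before applying `mcc`; whether the authors pooled counts or averaged per-fold values is not stated in the abstract.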
post-translational modifications, lantibiotics, lanthionine, β-methyllanthionine, random forest, maximal relevance minimal redundancy
Shanghai University - College of Life Science, Shanghai; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; Department of Biomedical Engineering, Tianjin Key Lab of Biomedical Engineering Measurement, Tianjin University, Tianjin; College of Information Engineering, Shanghai Maritime University, Shanghai 201306; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031; Shanghai University, Shanghai 200444