Computational Method for Distinguishing Lysine Acetylation, Sumoylation, and Ubiquitination using the Random Forest Algorithm with a Feature Selection Procedure

[Vol. 20, Issue 10]


ShaoPeng Wang, JiaRui Li, Fei Yuan, Lei Chen, Tao Huang and Yu-Dong Cai*
Pages 886-895 (10)


Background: Post-translational modifications (PTMs) on the side chains of conserved lysine (Lys) residues play important roles in myriad cellular processes, such as modifying the structures and activities of histones, protein degradation and turnover, and the regulation of DNA damage responses. To date, several computational methods have been developed to identify different PTMs on Lys residues. However, most of these methods focus on identifying one particular PTM without considering the other types of PTMs.

Method: In this study, we first conducted a computational investigation of three types of PTMs (acetylation, sumoylation, and ubiquitination) simultaneously by analyzing the protein structure and sequence factors surrounding the substrate Lys residues in these PTMs. To fully extract the structural and sequence information around the Lys residues, six types of features were used to encode the peptide segments containing the substrates. Next, a feature selection method, maximum relevance minimum redundancy (mRMR), was applied to obtain two feature lists: the MaxRel feature list and the mRMR feature list. The mRMR feature list was used to extract the optimal features for the random forest algorithm to distinguish the three types of PTMs.
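The two feature lists named above can be illustrated with a minimal sketch. It assumes discrete feature values, mutual information as the relevance and redundancy measure, and one common mRMR formulation (relevance minus mean redundancy with the already selected features); the abstract does not specify these details. The feature names and toy data are hypothetical, and the random forest step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def maxrel_ranking(features, labels):
    """Rank features by relevance to the class label only (MaxRel list)."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(rel, key=lambda name: rel[name], reverse=True)


def mrmr_ranking(features, labels):
    """Greedy mRMR list: relevance minus mean redundancy with selected features."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return rel[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return rel[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 Lys sites, 3 PTM classes, 3 discretized features.
labels = ["ac", "ac", "ac", "ac", "su", "su", "ub", "ub"]
features = {
    "disorder": [0, 0, 0, 0, 1, 1, 1, 1],
    "disorder_copy": [0, 0, 0, 0, 1, 1, 1, 1],   # fully redundant with "disorder"
    "flank_residue": [0, 0, 1, 1, 0, 0, 1, 1],   # weaker but complementary
}

maxrel_list = maxrel_ranking(features, labels)
mrmr_list = mrmr_ranking(features, labels)
```

In this toy case the MaxRel list keeps the redundant copy in second place, while the mRMR list promotes the complementary feature ahead of it, which is the distinction between the two lists.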

Results: The optimal classification model achieved an overall accuracy of 0.989. We also investigated the top-ranked features in the MaxRel feature list to uncover the site preference and residue preference of Lys residues.
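For reference, overall accuracy in a three-class setting is commonly taken as the fraction of all sites whose predicted PTM type matches the true type. The sketch below assumes that definition; the labels and predictions are hypothetical toy values, not data from the study.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels: ac = acetylation, su = sumoylation, ub = ubiquitination.
y_true = ["ac", "ac", "su", "ub", "ub", "su"]
y_pred = ["ac", "su", "su", "ub", "ub", "su"]
accuracy = overall_accuracy(y_true, y_pred)  # 5 of 6 predictions are correct
```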

Conclusion: The results suggest that structural disorder and the preference of flanking residues are the most important attributes for distinguishing the three types of PTMs, a finding consistent with the results reported in previous studies.


Keywords: Post-translational modification, acetylation, sumoylation, ubiquitination, maximum relevance minimum redundancy, random forest, disordered region in protein.


College of Life Science, Shanghai University, Shanghai 200444, College of Life Science, Shanghai University, Shanghai 200444, Department of Science & Technology, Binzhou Medical University Hospital, Binzhou 256603, Shandong, College of Information Engineering, Shanghai Maritime University, Shanghai 201306, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, College of Life Science, Shanghai University, Shanghai 200444
