Abstract
Protein function prediction is one of the most challenging problems in the post-genomic era. The number of newly identified proteins has been increasing exponentially with advances in high-throughput techniques. However, the functional characterization of these new proteins has not kept pace. To fill this gap, a large number of computational methods have been proposed in the literature. Early approaches explored homology relationships to transfer known functions to newly discovered proteins. Nevertheless, these approaches tend to fail when a new protein is considerably different (divergent) from previously known ones. Accordingly, more accurate approaches that use expressive data representations and sophisticated computational techniques are required. With these points in mind, this review provides a comprehensible description of machine learning approaches that are currently applied to protein function prediction problems. We start by defining several problems involved in understanding aspects of protein function and describing how machine learning can be applied to them. We aim to present, in a systematic framework, the role of these techniques in protein function inference, which can be difficult to follow because of the rapid evolution of the field. To this end, we highlight the most representative contributions and recent advances, and provide a categorization and classification of machine learning methods in functional proteomics.
Keywords: protein, gene ontology, machine learning, classification, pattern recognition, high-throughput techniques.