Abstract
In light of growing resistance to current antiviral drugs, the discovery of novel and effective antiviral therapeutic agents remains a pressing scientific challenge. Antiviral peptides (AVPs) are promising therapeutic agents owing to their advantages in potency, efficacy and pharmacokinetic properties. The growing volume of newly discovered peptide sequences in the post-genomic era calls for computational approaches that identify AVPs in a timely and accurate manner. Machine learning (ML) methods such as random forest and support vector machine are robust learning algorithms that have been instrumental in peptide-based drug discovery. This review therefore summarizes the current state of the art in applying ML methods to identify AVPs directly from sequence information. We compare these methods in terms of the characteristics of the datasets used, feature encoding methods, ML algorithms, cross-validation methods and prediction performance. Finally, guidelines for developing robust AVP models are discussed. We anticipate that this review will serve as a useful guide for the design and development of robust predictors of AVPs and related therapeutic peptides.
Keywords: Therapeutic peptides, antiviral peptides, classification, machine learning, feature representation, feature selection.