Support Vector Machines and Kernels for Computational Biology


can model up to d co-occurrences of ℓ-mers (similarly proposed in [32]).

Other sequence kernels

Because of the importance of sequence data and the many ways of modeling it, there are many alternatives to the spectrum and weighted degree kernels. Most closely related to the spectrum kernel are extensions that allow for gaps or mismatches [28]. The feature space of the spectrum kernel and these related kernels is indexed by the set of all ℓ-mers of a given length. An alternative is to restrict attention to a predefined set of motifs [34],[35].
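As a concrete illustration, the following is a minimal Python sketch of the spectrum kernel: each sequence is mapped to its vector of ℓ-mer counts, and the kernel is the scalar product of these sparse count vectors. The function names are ours, and the implementation favors clarity over the fast string-indexing data structures used in practice.

from collections import Counter

def spectrum_features(seq, ell):
    """Count occurrences of every ell-mer in seq (the spectrum feature map)."""
    return Counter(seq[i:i + ell] for i in range(len(seq) - ell + 1))

def spectrum_kernel(x, y, ell):
    """Scalar product of the two ell-mer count vectors."""
    fx, fy = spectrum_features(x, ell), spectrum_features(y, ell)
    # Iterate over the sparser spectrum; absent ell-mers contribute zero.
    if len(fx) > len(fy):
        fx, fy = fy, fx
    return sum(count * fy[kmer] for kmer, count in fx.items())

# Example: 3-mer spectrum kernel between two short DNA sequences.
print(spectrum_kernel("ACGTACGT", "ACGTTT", 3))  # shared ACG and CGT counts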

Sequence similarity has been studied extensively in the bioinformatics community, and local alignment algorithms like BLAST and Smith-Waterman are good at revealing regions of similarity between protein and DNA sequences. The scores produced by these algorithms do not satisfy the positive semidefiniteness condition required of a kernel function, but they can still serve as the basis for highly effective kernels. The simplest way is to represent a sequence by its vector of BLAST/Smith-Waterman scores against a database of sequences [36]; this “empirical kernel map” is a general method for turning any similarity measure into a kernel. An alternative approach modifies the Smith-Waterman algorithm to sum over the space of all local alignments, leading to the local alignment kernel [37].
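The sketch below illustrates the first idea under simple assumptions: a basic Smith-Waterman scorer with arbitrary match/mismatch/gap parameters (real applications would use BLAST or a substitution matrix such as BLOSUM), and a feature map that records a sequence's alignment scores against a fixed database. The kernel between two sequences is then an ordinary scalar product of their score vectors.

import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Basic Smith-Waterman local alignment score with a linear gap penalty."""
    H = np.zeros((len(a) + 1, len(b) + 1))
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0, H[i - 1, j - 1] + sub,
                          H[i - 1, j] + gap, H[i, j - 1] + gap)
    return H.max()

def empirical_map(seq, database):
    """Represent seq by its vector of alignment scores against a database."""
    return np.array([smith_waterman(seq, d) for d in database])

# Toy database of reference sequences; any collection can be used.
db = ["ACGTACGT", "TTTTCCCC", "ACGGGGTA"]
x, y = empirical_map("ACGTAC", db), empirical_map("CGTACG", db)
print(float(np.dot(x, y)))  # valid kernel: scalar product of score vectors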

Probabilistic models, and Hidden Markov Models in particular, are in wide use for sequence analysis. The dependence of the log-likelihood of a sequence on the parameters of the model can be used to embed variable-length sequences in a fixed-dimensional vector space. The so-called Fisher kernel maps a sequence to the gradient of its log-likelihood with respect to the model parameters [38] (see also [39]). The intuition is that this gradient is the direction in which a gradient-based method would update the model to increase the likelihood of the sequence; we thereby characterize a sequence by its effect on the model. Other kernels based on probabilistic models include the Covariance kernel [40] and Marginalized kernels [41].
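To make the Fisher kernel concrete, the toy sketch below uses an i.i.d. multinomial model of nucleotides rather than an HMM: the Fisher score of a sequence is then just its nucleotide counts divided by the model parameters. For simplicity we ignore the simplex constraint on the parameters and omit the whitening by the inverse Fisher information, as is often done in practice.

import numpy as np

ALPHABET = "ACGT"

def fisher_scores(seq, theta):
    """Gradient of log p(seq) with respect to the multinomial parameters.

    For log p(x) = sum_a n_a(x) * log theta_a, the derivative with respect
    to theta_a is n_a(x) / theta_a (simplex constraint ignored for clarity).
    """
    counts = np.array([seq.count(a) for a in ALPHABET], dtype=float)
    return counts / theta

def fisher_kernel(x, y, theta):
    """Plain scalar product of Fisher scores; the full Fisher kernel would
    additionally whiten the scores with the inverse Fisher information."""
    return float(np.dot(fisher_scores(x, theta), fisher_scores(y, theta)))

theta = np.array([0.25, 0.25, 0.25, 0.25])  # assumed uniform background model
print(fisher_kernel("ACGTACGT", "AAACCGT", theta))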

Summary and Further Reading

This tutorial introduced the concept of large margin classification as implemented by SVMs, an idea that is both intuitive and supported by theoretical results in statistical learning theory. The SVM algorithm allows the use of kernels, which are efficient ways of computing scalar products in nonlinear feature spaces. The “kernel trick” is also applicable to other types of data, e.g., sequence data, as we illustrated on the problem of predicting splice sites in C. elegans.

In the rest of this section, we outline issues that we have not covered in this tutorial and provide pointers for further reading. For a comprehensive discussion of SVMs and kernel methods, we refer the reader to recent books on the subject [2],[5],[7].

Normalization

Large margin classifiers are known to be sensitive to the way features are scaled (see, for example, [42] in the context of SVMs). It can therefore be essential to normalize the data. This observation carries over to kernel-based classifiers that use nonlinear kernel functions: the normalization can be performed on the input features or directly at the level of the kernel, as in the sketch below.
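One common scheme is cosine normalization of the kernel, K'(x,y) = K(x,y) / sqrt(K(x,x) K(y,y)), which gives every example unit length in feature space, so that no sequence dominates simply because it is long and has large ℓ-mer counts. A minimal sketch, applied to a precomputed kernel matrix (the example matrix here is made up):

import numpy as np

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix: K'(x,y) = K(x,y)/sqrt(K(x,x)K(y,y))."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

K = np.array([[4.0, 2.0],
              [2.0, 9.0]])
print(normalize_kernel(K))  # unit diagonal; off-diagonals in [-1, 1]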


