Support Vector Machines and Kernels for Computational Biology by Various Authors
Author:Various Authors [Various Authors]
Language: eng
Format: epub
Published: 2015-05-20T16:00:00+00:00
can model up to d co-occurrences of â„“-mers (similarly proposed in [32] ).
Other sequence kernels
Because of the importance of sequence data and the many ways of modeling it, there are many alternatives to the spectrum and weighted degree kernels. Most closely related to the spectrum kernel are extensions allowing for gaps or mismatches [28] . The feature space of the spectrum kernel and these related kernels is the set of all â„“-mers of a given length. An alternative is to restrict attention to a predefined set of motifs [34] ,[35] .
Sequence similarity has been studied extensively in the bioinformatics community, and local alignment algorithms like BLAST and Smith-Waterman are good at revealing regions of similarity between proteins and DNA sequences. The statistics produced by these algorithms do not satisfy the mathematical condition required of a kernel function. But they can still be used as a basis for highly effective kernels. The simplest way is to represent a sequence in terms of its BLAST/Smith-Waterman scores against a database of sequences [36] . This is a general method for using a similarity measure as a kernel. An alternative approach taken was to modify the Smith-Waterman algorithm to consider the space of all local alignments, leading to the local alignment kernel [37] .
Probabilistic models, and Hidden Markov Models in particular, are in wide use for sequence analysis. The dependence of the log-likelihood of a sequence on the parameters of the model can be used to represent a variable-length sequence in a fixed dimensional vector space. The so-called Fisher-kernel uses the sensitivity of the log-likelihood of a sequence with respect to the model parameters as the feature space [38] (see also [39] ). The intuition is that if we were to update the model to increase the likelihood of the data, this is the direction a gradient-based method would take. Thus, we are characterizing a sequence by its effect on the model. Other kernels based on probabilistic models include the Covariance kernel [40] and Marginalized kernels [41] .
Summary and Further Reading
This tutorial introduced the concepts of large margin classification as implemented by SVMs, an idea that is both intuitive and also supported by theoretical results in statistical learning theory. The SVM algorithm allows the use of kernels, which are efficient ways of computing scalar products in nonlinear feature spaces. The “kernel trick†is also applicable to other types of data, e.g., sequence data, which we illustrated in the problem of predicting splice sites in C. elegans .
In the rest of this section, we outline issues that we have not covered in this tutorial and provide pointers for further reading. For a comprehensive discussion of SVMs and kernel methods, we refer the reader to recent books on the subject [2] ,[5] ,[7] .
Normalization
Large margin classifiers are known to be sensitive to the way features are scaled (see, for example [42] , in the context of SVMs). It can therefore be essential to normalize the data. This observation carries over to kernel-based classifiers that use nonlinear kernel functions.
Download
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Algorithms of the Intelligent Web by Haralambos Marmanis;Dmitry Babenko(8305)
Test-Driven Development with Java by Alan Mellor(6766)
Data Augmentation with Python by Duc Haba(6682)
Principles of Data Fabric by Sonia Mezzetta(6430)
Learn Blender Simulations the Right Way by Stephen Pearson(6329)
Microservices with Spring Boot 3 and Spring Cloud by Magnus Larsson(6200)
Hadoop in Practice by Alex Holmes(5964)
Jquery UI in Action : Master the concepts Of Jquery UI: A Step By Step Approach by ANMOL GOYAL(5812)
RPA Solution Architect's Handbook by Sachin Sahgal(5599)
Big Data Analysis with Python by Ivan Marin(5383)
The Infinite Retina by Robert Scoble Irena Cronin(5291)
Life 3.0: Being Human in the Age of Artificial Intelligence by Tegmark Max(5153)
Pretrain Vision and Large Language Models in Python by Emily Webber(4349)
Infrastructure as Code for Beginners by Russ McKendrick(4112)
Functional Programming in JavaScript by Mantyla Dan(4041)
The Age of Surveillance Capitalism by Shoshana Zuboff(3961)
WordPress Plugin Development Cookbook by Yannick Lefebvre(3827)
Embracing Microservices Design by Ovais Mehboob Ahmed Khan Nabil Siddiqui and Timothy Oleson(3629)
Applied Machine Learning for Healthcare and Life Sciences Using AWS by Ujjwal Ratan(3602)
