Learning Kernel Classifiers by Ralf Herbrich
Chapter 4: Mathematical Models of Learning
inf_{h∈ℋ} R[h] (see also Lee et al. (1998) for tighter results in the special case of convex hypothesis spaces). The VC and PAC analysis revealed that, for the case of learning, the growth function of a hypothesis space is an appropriate a-priori measure of its complexity. As the growth function is very difficult to compute, it is often characterized by a one-integer summary known as the VC dimension (see Theorem 4.10 and Sontag (1998) for an excellent survey of the VC dimension). The first proof of this theorem is due to Vapnik and Chervonenkis (1971); it was discovered independently by Sauer (1972) and Shelah (1972), the former crediting Erdős with posing it as a conjecture.
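The practical content of the theorem is that a finite VC dimension caps the growth function by a polynomial in the sample size. The following minimal Python sketch evaluates this cap; the helper name sauer_bound and the numbers are purely illustrative.

    from math import comb

    def sauer_bound(m, d):
        # Vapnik-Chervonenkis-Sauer-Shelah lemma: a hypothesis space of
        # VC dimension d realises at most sum_{i=0}^{d} C(m, i) distinct
        # labellings of m points, which is polynomial in m rather than 2^m.
        return sum(comb(m, i) for i in range(d + 1))

    # Linear classifiers in the plane have VC dimension 3; on ten points
    # they realise at most 176 of the 1024 conceivable labellings.
    print(sauer_bound(10, 3), 2 ** 10)  # 176 1024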
In order to make the VC dimension a variable of the learning algorithm itself, two conceptually different approaches were presented. By defining an a-priori structuring of the hypothesis space, sometimes also referred to as a decomposition of the hypothesis space (Shawe-Taylor et al. 1998), it is possible to provide guarantees for the generalization error with high confidence by sharing the confidence among the different hypothesis spaces. This principle, known as structural risk minimization, is due to Vapnik and Chervonenkis (1974).
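In code, structural risk minimization amounts to computing a separate bound for each hypothesis space of the decomposition at a correspondingly reduced confidence, and returning the space with the best resulting guarantee. The sketch below assumes, purely for illustration, a VC-type bound of the form sqrt((8/m)(d·ln(2em/d) + ln(4/δ))) and a uniform sharing of the confidence; the function names and the numbers are hypothetical.

    import math

    def vc_bound(m, d, delta):
        # One common form of a VC-type bound on the deviation between
        # expected and training risk; exact constants vary between
        # statements of the result.
        return math.sqrt((8.0 / m) * (d * math.log(2.0 * math.e * m / d)
                                      + math.log(4.0 / delta)))

    def srm_select(train_errors, vc_dims, m, delta):
        # Share the confidence delta uniformly among the k hypothesis
        # spaces H_1 ⊆ H_2 ⊆ ... of the decomposition, then pick the
        # space whose guarantee (training error plus bound) is smallest.
        k = len(vc_dims)
        guarantees = [err + vc_bound(m, d, delta / k)
                      for err, d in zip(train_errors, vc_dims)]
        best = min(range(k), key=guarantees.__getitem__)
        return best, guarantees[best]

    # Hypothetical training errors and VC dimensions of three nested spaces.
    best, eps = srm_select([0.20, 0.05, 0.01], [3, 10, 50], m=1000, delta=0.05)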
A more promising approach is to define an effective complexity via a luckiness function, which encodes some prior hope about the learning problem given by the unknown P_Z. This framework, also termed the luckiness framework, is due to Shawe-Taylor et al. (1998). For more details on the related problem of conditional confidence intervals the interested reader is referred to Brownie and Kiefer (1977), Casella (1988), Berger (1985) and Kiefer (1977). All examples given in Section 4.3 are taken from Shawe-Taylor et al. (1998).

The luckiness framework is most advantageous if we refine what is required from a learning algorithm: a learning algorithm 𝒜 is given a training sample z ∈ 𝒵^m and a confidence δ ∈ (0, 1], and is then required to return a hypothesis 𝒜(z) ∈ ℋ together with an accuracy ε such that in at least 1 − δ of the learning trials the expected risk of 𝒜(z) is less than or equal to the given ε. Y. Freund called such learning algorithms self bounding learning algorithms (Freund 1998). Although, without making explicit assumptions on P_Z, all learning algorithms might be equally good, a self bounding learning algorithm is able to tell the practitioner when its implicit assumptions are met. Obviously, a self bounding learning algorithm can only be constructed if a theoretically justified generalization error bound is available.
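The contract of a self bounding learning algorithm is easy to state as an interface. The sketch below is only schematic: fit stands for any hypothetical empirical risk minimization routine, and the returned accuracy reuses the illustrative VC-type bound from above, so it is by no means the tightest guarantee one could compute.

    import math

    def self_bounding_learner(z, delta, fit, vc_dim):
        # Contract: with probability at least 1 - delta over the draw of
        # the training sample z, the expected risk of the returned
        # hypothesis h is at most the returned accuracy eps.
        h, train_error = fit(z)  # hypothetical learning routine
        m = len(z)
        eps = train_error + math.sqrt(
            (8.0 / m) * (vc_dim * math.log(2.0 * math.e * m / vc_dim)
                         + math.log(4.0 / delta)))
        return h, eps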
In the last section of this chapter we presented a PAC analysis for the particular hypothesis space of linear classifiers, making extensive use of the margin as a data-dependent complexity measure. In Theorem 4.25 we showed that the margin, that is, the minimum real-valued output of a linear classifier before thresholding, allows us to replace the coarse application of the union bound over the worst-case diversity of the binary-valued function class by a union bound over the number of equivalence classes witnessed by the observed margin. The proof of this result can also be found in Shawe-Taylor and Cristianini (1998, Theorem 6.
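The quantity at the heart of this analysis is easy to compute: the margin of a linear classifier on a sample is the minimum real-valued output before thresholding, normalized by the length of the weight vector. A minimal NumPy sketch, with purely illustrative data:

    import numpy as np

    def margin(w, X, y):
        # Minimum of y_i * <w, x_i> over the sample, divided by ||w||:
        # positive if and only if w separates the sample, in which case it
        # equals the distance of the closest point to the hyperplane.
        return np.min(y * (X @ w)) / np.linalg.norm(w)

    # Two points per class in the plane; w separates them with margin sqrt(2).
    X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    print(margin(np.array([1.0, 1.0]), X, y))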