Algorithms to Live By: The Computer Science of Human Decisions by Christian Brian & Griffiths Tom

Algorithms to Live By: The Computer Science of Human Decisions by Christian Brian & Griffiths Tom

Author:Christian, Brian & Griffiths, Tom [Christian, Brian & Griffiths, Tom]
Language: eng
Format: epub
Tags: Science, Psychology, Self Help, Business, Philosophy
ISBN: 9780007547982
Goodreads: 38593249
Publisher: William Collins
Published: 2016-04-19T07:00:00+00:00


The Three-Part Tradeoff

At once it struck me what quality went to form a Man of Achievement, especially in Literature, and which Shakespeare possessed so enormously—I mean Negative Capability, that is, when a man is capable of being in uncertainties, mysteries, doubts, without any irritable reaching after fact and reason.

—JOHN KEATS

There is no such thing as absolute certainty, but there is assurance sufficient for the purposes of human life.

—JOHN STUART MILL

Computer science is often a matter of negotiating tradeoffs. In our discussion of sorting in chapter 3, for instance, we noted the tradeoff between time spent up front on sorting versus the time spent later on searching. And in the discussion of caching in chapter 4, we explored the tradeoff of taking up extra space—caches for caches for caches—to save time.

Time and space are at the root of the most familiar tradeoffs in computer science, but recent work on randomized algorithms shows that there’s also another variable to consider: certainty. As Harvard’s Michael Mitzenmacher puts it, “What we’re going to do is come up with an answer which saves you in time and space and trades off this third dimension: error probability.” Asked for his favorite example of this tradeoff into uncertainty, he doesn’t hesitate. “A colleague just said that there should be a drinking game that every time this term appears on one of my slides, you have to take a drink. Have you ever heard of Bloom filters?”

To understand the idea behind a Bloom filter, Mitzenmacher says, consider a search engine like Google, trying to crawl the entire web and index every possible URL. The web is comprised of well over a trillion distinct URLs, and the average URL weighs in at about seventy-seven characters long. When the search engine looks at some URL, how can it check whether that page has already been processed? Just storing a list of all the URLs that have been visited would take a huge amount of space, and repeatedly searching that list (even if it were fully sorted) could prove a nightmare. In fact, it could well be that the cure is worse than the disease: in other words, checking every time to make sure that we’re not reindexing a page might be more time-consuming than just indexing the occasional page twice.

But what if we only needed to be mostly sure this URL was new to us? That’s where the Bloom filter comes in. Named for its inventor, Burton H. Bloom, a Bloom filter works much like the Rabin-Miller primality test: the URL is entered into a set of equations that esssentially check for “witnesses” to its novelty. (Rather than proclaim “n is not prime,” these equations say “I have not seen n before.”) If you’re willing to tolerate an error rate of just 1% or 2%, storing your findings in a probabilistic data structure like a Bloom filter will save you significant amounts of both time and space. And the usefulness of such filters is not confined to search engines: Bloom



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.