Big Data Analytics: A Practical Guide for Managers by Kim H. Pries

Author: Kim H. Pries
Language: eng
Format: azw3, pdf
Publisher: Auerbach Publications
Published: 2015-02-11T22:00:00+00:00


P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}


It looks fearsome, but it is not. Bayes's theorem lets us calculate a posterior probability using prior information, which we are more likely to have. Put another way, using imperfect information that we do have, we can home in on a more accurate probability that we cannot discern directly. In the way that Bayes's theorem is usually discussed, repeated applications of the theorem should drive the posterior distribution to more closely resemble the "real" distribution. More fundamentally, Bayes's theorem does with probabilities what golf clubs do with golf balls: it moves the ball closer to the hole (well, one of us is a golfer who is not always so good at moving the ball in the right direction). With each stroke, the golf ball should be in a better location than it was previously. With Bayes's theorem, each application to the same problem should bring the calculated probability more in line with the underlying reality.
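To make the repeated-stroke idea concrete, here is a minimal Python sketch (not from the book) of iterative Bayesian updating: we estimate the bias of a coin with a Beta prior and watch the posterior mean move toward an assumed true bias of 0.7 as batches of flips arrive. The bias, batch size, and seed are illustrative assumptions, not figures from the text.

    # A minimal sketch (not from the book) of repeated Bayesian updating.
    # We estimate a coin's bias with a Beta prior and watch the posterior
    # mean move toward the assumed true bias with every batch of flips.
    import random

    true_bias = 0.7          # illustrative "real" probability of heads
    alpha, beta = 1.0, 1.0   # Beta(1, 1) prior: no strong opinion yet

    random.seed(42)
    for batch in range(1, 6):
        flips = [random.random() < true_bias for _ in range(20)]
        heads = sum(flips)
        tails = len(flips) - heads

        # With a Beta prior, Bayes's theorem reduces to counting:
        # the posterior is Beta(alpha + heads, beta + tails).
        alpha += heads
        beta += tails

        print(f"after batch {batch}: posterior mean = {alpha / (alpha + beta):.3f}")

Each batch plays the role of another stroke: the estimate should settle nearer to 0.7 as the evidence accumulates.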

Bayes's theorem is often unfairly treated as merely a way of creating a quantitative façade over subjective speculation. A false dichotomy exists between frequentist statistics and Bayesian analysis. Frequentist statistics are effectively those we use most commonly with our firms' data sets; for example, we can roll up the data set and see that 17% of our customers spent on average more than $200 per month in our store last year and that our Cleveland office has the best employee retention rate. The hypothesis testing we have discussed is a frequentist technique. Bayesian analysis is something different, and as we will see using one of John Ioannidis's examples, it can shed much light on how strong a conclusion we can draw from our hypothesis test.

Dr. Ioannidis posits a thought experiment using research involving gene polymorphisms related to schizophrenia. He argues that, out of 100,000 polymorphisms, it is realistic to expect we can tie 10 of them to the risk of this disease. That means there is a 10/100,000, or 1/10,000, chance that any one of these polymorphisms indicates heightened susceptibility. Our study has 60% statistical power (also called the detection rate, or sensitivity); in other words, it detects 60% of the true associations. This also means there is a 40% (100% − 60%) Type II error rate, in which the test fails to reject the null hypothesis when it should be rejected. Our significance level is .05. As we have discussed, this means that approximately 5% of the time we can expect the test to incorrectly reject the null hypothesis.

Let us plug in some numbers to calculate the probability that our rejected null hypothesis was correctly rejected. First, we calculate the numerator, the part of the equation above the line. This value is the probability that any given polymorphism is indicative of a heightened schizophrenia risk multiplied by the statistical power, or detection rate. When we calculate, we get 0.01% multiplied by 60%, or 0.006%.
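The denominator of Bayes's theorem then adds the false-positive branch: the remaining 99,990 of the 100,000 polymorphisms have no real link to the disease, yet each can still trigger a rejection roughly 5% of the time. A short Python sketch of the complete calculation follows, using only the figures given above; the final number is our own arithmetic under those stated assumptions.

    # A sketch of the full Bayes calculation using the figures above.
    prior = 1 / 10_000   # chance a given polymorphism is truly associated
    power = 0.60         # detection rate: P(reject H0 | true association)
    alpha = 0.05         # significance level: P(reject H0 | no association)

    # Numerator: P(true association) * P(detection | true association)
    numerator = prior * power                      # 0.00006, i.e., 0.006%

    # Denominator adds the false-positive branch from the 9,999 out of
    # every 10,000 polymorphisms that have no real link to the disease.
    denominator = numerator + (1 - prior) * alpha  # about 0.050055

    ppv = numerator / denominator                  # P(correct | rejection)
    print(f"P(rejection was correct) = {ppv:.4f}") # roughly 0.0012

Under these assumptions, only about 0.12% of rejected nulls correspond to real associations, which is why the prior matters so much when interpreting a "significant" result.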


