Avoiding Data Pitfalls by Ben Jones
Author: Ben Jones
Language: eng
Format: epub
ISBN: 9781119278177
Publisher: Wiley
Published: 2019-11-19
FIGURE 5.14 Cumulative salary.
FIGURE 5.15 Cumulative heights of players.
These are all very different types of variation. But the pitfall – and it's a “humdinger” as my mom likes to say – is that we almost always think that “average” means “typical” for each one of them. It doesn't.
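A quick way to see the gap between “average” and “typical” is to compare the mean and the median of a lopsided data set. The sketch below uses a handful of made-up salaries (not the league data from the figures), so treat the numbers as purely illustrative.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars: five modest values, one very large.
salaries = [50, 55, 60, 65, 70, 900]

average = mean(salaries)    # pulled upward by the single large salary
typical = median(salaries)  # the middle of the sorted values

# Count how many people earn less than the "average" salary.
below_average = sum(1 for s in salaries if s < average)

print(f"mean:   {average}")
print(f"median: {typical}")
print(f"{below_average} of {len(salaries)} salaries are below the mean")
```

In this made-up group, five of the six salaries sit below the mean, so the mean describes almost no one in it.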
Pitfall 4B: Inferential Infernos
When we have data about all of the members of a population, such as the football league player data in the previous section, there's no need to make any inferences about the difference between groups within that population, because we're dealing with all of the data.
We don't have to infer which team has the tallest average player height, for example, because we can just compute average height for each team and then sort the teams in descending order. This is descriptive statistics, and we saw how even that activity can be tricky. (It was Pittsburgh, by the way.)
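In code, this kind of descriptive ranking amounts to a group, an average, and a sort. Here is a minimal sketch in Python; the team names and heights are hypothetical placeholders, not the league data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (team, height in inches) records for every player in the population.
players = [
    ("Team A", 74), ("Team A", 76), ("Team A", 72),
    ("Team B", 75), ("Team B", 77),
    ("Team C", 73), ("Team C", 71), ("Team C", 72),
]

# Group heights by team.
heights_by_team = defaultdict(list)
for team, height in players:
    heights_by_team[team].append(height)

# Compute each team's average height, then sort in descending order.
team_averages = {team: mean(h) for team, h in heights_by_team.items()}
ranking = sorted(team_averages.items(), key=lambda item: item[1], reverse=True)

for team, avg in ranking:
    print(f"{team}: {avg:.1f} in")
```

Because every player is in the data, the top row of the ranking is simply a fact about this population. No inference is involved.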
Many times, though, when we're working with data, it isn't feasible, practical, or cost-effective to obtain data about every single one of the individual elements of a given population, so we have to collect data from samples, and make inferences about differences between groups. Here's where the trickiness increases by leaps and bounds.
There's a reason the census in the United States is decennial – meaning it only happens once every 10 years: it's very expensive and extremely difficult to attempt to count every single person in every single residential structure in the entire country, and such an undertaking is not without its sources of bias and error. The current budget request for the FY 2020 census is $6.3 billion [6]. That's not a cheap data collection program at all. Worthwhile? Sure. But not cheap.
Since most organizations don't have the resources of the U.S. federal government or billions of dollars of funding to undertake such an exhaustive initiative, they make decisions based on data taken from subsets of the population. A lot. But they don't always do it right.
Making inferences based on data from samples of the population is a particular stretch on the road to data heaven that is absolutely full of pitfalls, one after the other. It very well might be the most treacherous zone of all.
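To get a feel for why samples are treacherous, it can help to simulate one. The sketch below builds a made-up population of heights (the mean of 70 inches, the spread of 3 inches, and the sample size of 50 are all assumptions chosen for illustration), then draws many samples and looks at how much their averages wander.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in inches (illustrative values only).
population = [rng.gauss(70, 3) for _ in range(10_000)]
population_mean = mean(population)

# Draw 1,000 samples of 50 people each, without replacement within a sample.
sample_means = [mean(rng.sample(population, 50)) for _ in range(1_000)]

print(f"population mean:        {population_mean:.2f}")
print(f"lowest sample mean:     {min(sample_means):.2f}")
print(f"highest sample mean:    {max(sample_means):.2f}")
print(f"spread of sample means: {stdev(sample_means):.2f}")
```

No single sample is guaranteed to land on the population mean, and shrinking the sample size would widen the spread further.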
Here are some common examples in everyday life and business that involve using data taken from a subset of the population:
Customer satisfaction: When companies seek to survey their customers, they know that many won't respond to their email survey, so it can be very difficult to get 100% feedback from the entire group of people who have purchased their product or service.
Quality control: When engineers want to test whether products in manufacturing meet specifications, the tests can often be costly, and sometimes even destructive in nature (like determining tensile strength), so it wouldn't make financial or practical sense to test 100% of the parts.
Clinical trials: Researching the efficacy of an experimental drug means researchers need to see whether a group of study participants who used the drug fared any better than another group in the study that took a placebo.
