AI & Data Literacy by Bill Schmarzo
Author: Bill Schmarzo
Language: eng
Format: epub
Publisher: Packt
Published: 2023-11-15T00:00:00+00:00
Understanding probabilities and statistics
Making predictions about likely outcomes is a challenging task. As famously stated by Yogi Berra, "It's tough to make predictions, especially about the future." Accurate predictions rely on a nuanced understanding of probabilities, confidence levels, and confidence intervals.
Probability is a measure of the likelihood that a particular event will occur, typically expressed as a percentage (ranging from 0% to 100%). For example, based on Barry Bonds' 2004 season with the San Francisco Giants, we can estimate the probability of him getting a hit in any given at-bat as 36.2% (equivalent to 36.2 hits for every 100 at-bats).
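As a rough illustration of how such a figure is obtained, the sketch below divides hits by at-bats. The counts used here (135 hits in 373 at-bats) are the commonly cited totals for that season and are not stated in the text above, so treat them as an assumption; the helper name `probability` is hypothetical.

```python
def probability(successes: int, trials: int) -> float:
    """Return the empirical probability: successes divided by trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return successes / trials


# Assumed counts (not given in the text): 135 hits in 373 at-bats.
hits, at_bats = 135, 373
p_hit = probability(hits, at_bats)

# Show the result both as a fraction of 1 and as a percentage.
print(f"P(hit) = {p_hit:.3f} ({p_hit:.1%})")
```

Under those assumed counts, the result rounds to the 36.2% quoted above.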
Understanding probabilities is vital for assessing the likelihood of specific outcomes and for making informed decisions. Probabilities are estimates derived from available data and statistical analysis. They provide a framework for comparing relative likelihoods, but they do not guarantee any particular outcome. To make our predictions more effective, we therefore need statistics.
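A small simulation can make the point that a probability does not guarantee an outcome. The sketch below assumes, purely for illustration, that every at-bat is an independent trial with the same 36.2% chance of a hit, and counts the hits in several simulated runs of 100 at-bats. The function name `simulate_hits` and the seed value are hypothetical choices.

```python
import random


def simulate_hits(p_hit: float, at_bats: int, rng: random.Random) -> int:
    """Count hits in a run of independent at-bats, each with probability p_hit."""
    return sum(rng.random() < p_hit for _ in range(at_bats))


# A fixed seed makes the run repeatable; any seed would do.
rng = random.Random(42)

# Ten simulated runs of 100 at-bats each, all with the same 36.2% probability.
samples = [simulate_hits(0.362, 100, rng) for _ in range(10)]
print(samples)
```

Although the probability is identical in every run, the hit counts differ from run to run and only tend to cluster around 36.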
Statistics is the practice or science of collecting and analyzing numerical data in large quantities, especially to infer proportions in a whole population from those in a representative sample. By leveraging statistical techniques, we can analyze patterns, identify correlations, and uncover valuable insights that enable us to make more accurate and reliable predictions.
When using statistics to help us calculate probabilities and make predictions, we need to understand the statistical concepts of the mean (or average), variance, standard deviation, confidence intervals, and confidence levels. These are basic statistical concepts that everyone needs to understand in order to leverage statistics to make more informed decisions. Let's define these basic concepts:
The mean or average is the sum of a collection of numbers divided by the count of numbers in the collection.
Variance measures how far a set of numbers or observations is spread out from its mean. It is the average of the squared differences between each observation and the mean.
Standard deviation is simply the square root of the variance, which expresses the spread in the same units as the data. A low standard deviation (near zero) means the data points are clustered close to the mean, and a high standard deviation means they are more spread out. The standard deviation is never negative, and it says nothing about whether individual data points lie above or below the mean.
The confidence interval is the range of values you expect your estimate to fall within a certain percentage of the time if you rerun your experiment or re-sample the population in the same way.
The confidence level is the percentage of times you expect an estimate to fall between the upper and lower bounds of the confidence interval.
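The definitions above can be sketched with Python's standard library. The eight-value sample is hypothetical, chosen so the results are round numbers. The confidence interval uses the normal-approximation (Wald) formula for a proportion, which is one common method and not necessarily the one the author has in mind; the counts of 135 hits in 373 at-bats are assumed, as is the function name `proportion_confidence_interval`.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)          # sum of the values divided by their count
variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(f"mean={mean:.2f}, variance={variance:.2f}, std_dev={std_dev:.2f}")


def proportion_confidence_interval(successes, trials, confidence_level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / trials
    # Critical value of the standard normal for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin


# Assumed counts (not given in the text): 135 hits in 373 at-bats.
low, high = proportion_confidence_interval(135, 373, confidence_level=0.95)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Here `pvariance` and `pstdev` treat the data as the whole population; `statistics.variance` and `statistics.stdev` would divide by n - 1 instead, which is the usual choice when the data is a sample. Raising the confidence level widens the interval, and the normal approximation becomes less reliable for small samples or proportions near 0% or 100%.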
