Data Analytics for Absolute Beginners: A Deconstructed Guide to Data Literacy: (Introduction to Data, Data Visualization, Business Intelligence & Machine Learning) by Oliver Theobald

Author:Oliver Theobald [Theobald, Oliver]
Language: eng
Format: epub
Published: 2017-01-06T00:00:00+00:00


REGRESSION ANALYSIS

Regression analysis is a popular statistical technique used to model the relationship between one or more independent variables and a dependent variable. Businesses, for instance, often use regression to predict sales (output) based on a range of input variables, including temperature, social media mentions, historical sales, GDP growth, and inbound tourists.

The objective of regression analysis is to find a line or curve that best describes patterns in the data. Although a single line or curve oversimplifies the data, it provides a useful reference point for making general predictions about future data. The quality of predictions derived from a regression line is reflected in the correlation between the variables, and specifically in the coefficient of correlation. For a straight-line fit with one independent variable, this coefficient is equal to the square root of the coefficient of determination (the proportion of the dependent variable’s variance that the line explains), carrying the sign of the line’s slope. The coefficient of correlation ranges from -1 to 1, with a correlation of 1 describing a perfect positive relationship and a correlation of -1 indicating a perfect negative relationship. A coefficient of 0, meanwhile, means no linear relationship between the variables.
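The relationship between a fitted line and the coefficient of correlation can be sketched in a few lines of plain Python. This is a minimal illustration, not a method taken from the book: the function names and the temperature and sales figures are made up for the example. It fits a least-squares line, computes the coefficient of correlation, and checks that its square matches the share of variance the line explains.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Pearson coefficient of correlation, always between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of the variance in ys that the fitted line explains."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: daily temperature (input) and units sold (output).
temperature = [14, 16, 20, 23, 26, 30]
sales = [215, 325, 332, 445, 522, 614]

slope, intercept = fit_line(temperature, sales)
r = correlation(temperature, sales)
print(f"line: sales = {slope:.2f} * temperature + {intercept:.2f}")
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
print(f"variance explained by the line = {r_squared(temperature, sales):.3f}")
```

The last two printed values agree for a straight-line fit with a single input. With several inputs, the two measures are no longer interchangeable in this simple way.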

A negative correlation means that an increase in the independent variable is associated with a decrease in the dependent variable. For example, a house’s value (dependent variable) tends to fall as the distance to the city (independent variable) increases. Conversely, a positive correlation captures a positive relationship between variables. House value (dependent variable), for instance, generally rises in step with house size (independent variable). In the case of plots C and D, no linear relationship exists between the two variables, making regression analysis a poor choice for interpreting the data.
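The sign of the coefficient can be checked on small samples. The sketch below uses invented house figures (they are not from the book) and a hypothetical `correlation` helper. The third dataset is one possible case of no linear relationship: a symmetric curve, where the two variables are clearly related yet the coefficient comes out at zero.

```python
import math


def correlation(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented figures: house value (in thousands) against distance to the city.
distance_km = [2, 5, 10, 15, 25, 40]
value_by_distance = [950, 820, 700, 610, 480, 390]

# Invented figures: house value (in thousands) against house size.
size_sqm = [60, 85, 110, 140, 180, 220]
value_by_size = [310, 400, 520, 610, 790, 930]

# A symmetric curve: y depends entirely on x, but not along a straight line.
offsets = [-3, -2, -1, 0, 1, 2, 3]
curve = [x ** 2 for x in offsets]

r_distance = correlation(distance_km, value_by_distance)  # negative
r_size = correlation(size_sqm, value_by_size)             # positive
r_curve = correlation(offsets, curve)                     # zero

print(f"distance vs value: {r_distance:.3f}")
print(f"size vs value:     {r_size:.3f}")
print(f"symmetric curve:   {r_curve:.3f}")
```

A coefficient near zero therefore rules out a straight-line relationship only. Plotting the data remains the simplest way to spot a curved pattern that a line would miss.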

Another potential problem is collinearity. This occurs when there’s a strong linear correlation between two independent variables, which makes it difficult for the regression model to separate each variable’s contribution and limits its capacity to predict the dependent variable reliably. An example of collinearity would be using liters of fuel consumed and liters of fuel remaining in the tank as independent variables to predict car mileage. The two independent variables, in this case, are negatively correlated and virtually cancel each other out when included in the same regression model. Instead, it would be better to include one variable and sideline the other. Height and weight are another popular example of two variables that are often highly correlated and can lead to problems with collinearity in the model.
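One simple way to screen for collinearity is to compute the correlation between every pair of independent variables before fitting the model. The sketch below is an illustration under stated assumptions: the 50-liter tank, the sample readings, the `tire_pressure` variable, the `find_collinear_pairs` helper, and the 0.9 cut-off are all hypothetical choices made for the example, not values from the book.

```python
import math
from itertools import combinations


def correlation(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def find_collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for each pair of inputs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = correlation(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Assumed 50-liter tank, so fuel remaining is fully determined by fuel consumed.
TANK_LITERS = 50
fuel_consumed = [5, 12, 20, 28, 35, 44]
fuel_remaining = [TANK_LITERS - liters for liters in fuel_consumed]

features = {
    "fuel_consumed": fuel_consumed,
    "fuel_remaining": fuel_remaining,
    "tire_pressure": [32, 31, 33, 32, 31, 33],  # made-up, weakly related input
}

pairs = find_collinear_pairs(features)
for name_a, name_b, r in pairs:
    print(f"{name_a} and {name_b} are collinear (r = {r:.2f}); keep only one")
```

Here only the two fuel variables are flagged, which matches the advice to include one and sideline the other. A pairwise check like this will not catch collinearity that involves three or more variables at once, so it is a first pass rather than a complete diagnostic.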


