Extrapolation, Outliers vs Influential Observations, & Lurking Variables
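Extrapolation is the use of a fitted relationship to estimate values beyond the range of the given data. Because nothing in the data confirms that the relationship still holds outside that range, extrapolation is risky, and the risk grows the farther the prediction lies from the observed values.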
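The risk can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the data are made up, the true relationship is assumed to be y = sqrt(x), and the names `predict`, `err_inside`, and `err_outside` are hypothetical. A least squares line is fitted to x values between 1 and 10, then used to predict at x = 5 (inside the range) and at x = 100 (far outside it).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the true relationship is curved (y = sqrt(x)),
# but we only observe it for x between 1 and 10.
x = np.linspace(1, 10, 50)
y = np.sqrt(x) + rng.normal(0, 0.05, size=x.size)

# Least squares regression line fitted to the observed range only.
slope, intercept = np.polyfit(x, y, 1)

def predict(x_new):
    """Prediction from the fitted line."""
    return slope * x_new + intercept

# x = 5 is inside the observed range; x = 100 is an extrapolation.
err_inside = abs(predict(5.0) - np.sqrt(5.0))
err_outside = abs(predict(100.0) - np.sqrt(100.0))

print(f"error at x = 5:   {err_inside:.3f}")
print(f"error at x = 100: {err_outside:.3f}")
```

Under these assumptions the line describes the data reasonably well within the observed range, but the prediction error at x = 100 is far larger, because the straight-line pattern does not continue out there.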
Outliers vs. Influential Observations
An outlier can be either influential or non-influential. If the outlier is an influential observation, then it has a big impact on the correlation coefficient, r, and on the least squares regression line: removing it would noticeably change both. When there is a lot of data, a single outlier tends NOT to be influential.
The regression line is affected much more by an outlier when there are only a few data points. When there are thousands of data points, the regression line hardly changes despite the addition of the same outlier.
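The comparison between a small and a large data set can be sketched numerically. The example below is illustrative only: the data are simulated from an assumed line y = 2x + 1 with random noise, the outlier (10, 60) is chosen arbitrarily, and `slope_change` is a hypothetical helper name. It measures how much the least squares slope moves when the same outlier is added to 5 points and to 5000 points.

```python
import numpy as np

def slope_change(n, outlier=(10.0, 60.0), seed=0):
    """Absolute change in the least squares slope after adding one outlier.

    The n base points are simulated (an assumption for illustration):
    x is evenly spaced on [0, 10] and y = 2x + 1 plus normal noise.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, n)
    y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=n)

    # Slope of the regression line without the outlier.
    slope_before = np.polyfit(x, y, 1)[0]

    # Slope after appending the single outlying point.
    x_out = np.append(x, outlier[0])
    y_out = np.append(y, outlier[1])
    slope_after = np.polyfit(x_out, y_out, 1)[0]

    return abs(slope_after - slope_before)

small = slope_change(5)      # only a few data points
large = slope_change(5000)   # thousands of data points

print(f"slope change with 5 points:    {small:.4f}")
print(f"slope change with 5000 points: {large:.4f}")
```

In this sketch the outlier sits at the edge of the x range, where a point has the most pull on the slope. Even so, its effect on the large data set is tiny compared with its effect on the small one.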
Lurking Variables
A lurking variable is a variable that is not among the explanatory or response variables, yet may influence the relationship between those variables. This is important because it reminds us that association does NOT imply causation. Just because there is a relationship between two variables does not mean that one is causing the other; a lurking variable could be the real factor at play.
For example, suppose you are studying the relationship between weight loss and broccoli consumption, and the data show a positive linear relationship between the two. This does not mean that eating broccoli causes weight loss; a much more likely explanation is that a lurking variable exists. People who eat broccoli are more likely to make other healthy lifestyle choices, such as avoiding fast food and working out. So although there is a positive association between weight loss and broccoli consumption, the association alone does not show that one causes the other.
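A simulation can show how a lurking variable produces an association with no direct causal link. The sketch below uses invented data, not real measurements: a hypothetical variable `health` (overall health-consciousness) drives both `broccoli` and `weight_loss`, and by construction neither of those two affects the other. The coefficients 2.0 and 1.5 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical lurking variable: overall health-consciousness.
health = rng.normal(0, 1, size=n)

# Both observed variables depend on health, but not on each other.
broccoli = 2.0 * health + rng.normal(0, 1, size=n)
weight_loss = 1.5 * health + rng.normal(0, 1, size=n)

# Correlation between the two observed variables.
r_raw = np.corrcoef(broccoli, weight_loss)[0, 1]

def residuals(y, x):
    """What is left of y after removing its least squares fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after the linear effect of health is removed from both.
r_adjusted = np.corrcoef(
    residuals(broccoli, health), residuals(weight_loss, health)
)[0, 1]

print(f"r between broccoli and weight loss:   {r_raw:.3f}")
print(f"r after accounting for health habits: {r_adjusted:.3f}")
```

In this made-up setting the raw correlation is strongly positive, yet it nearly vanishes once the lurking variable is accounted for. In a real study the lurking variable is usually unmeasured, which is why an observed association by itself cannot establish causation.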