# 9 Evaluating Model Utility

Linear Regression: A graphical representation of a best fit line for simple linear regression

The results of multiple regression should be viewed with caution.

Learning Objectives

After completion of this session, you will be able to:

• Evaluate the potential drawbacks of multiple regression.

Key Takeaways

Key Points

• You should examine the linear regression of the dependent variable on each independent variable, one at a time, examine the linear regressions between each pair of independent variables, and consider what you know about the subject matter.
• You should probably treat multiple regression as a way of suggesting patterns in your data, rather than rigorous hypothesis testing.
• If independent variables A and B are both correlated with Y, and A and B are highly correlated with each other, only one may contribute significantly to the model, but it would be incorrect to blindly conclude that the variable that was dropped from the model has no significance.

Key Terms

• independent variable: in an equation, any variable whose value is not dependent on any other in the equation
• dependent variable: in an equation, the variable whose value depends on one or more variables in the equation
• multiple regression: regression model used to find an equation that best predicts the Y variable as a linear function of multiple X variables
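In symbols, and assuming the usual least-squares criterion (the coefficient names $b_0, \dots, b_k$, the number of independent variables $k$, and the number of observations $n$ are notation introduced here), the fitted equation has the form

$$
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k,
\qquad
(b_0, \dots, b_k) = \arg\min \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2 .
$$

Each $b_j$ estimates the change in $Y$ per unit change in $X_j$ with the other independent variables held fixed, which is why correlated independent variables can share, and obscure, one another's apparent contribution.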

Multiple regression is useful because it can describe how a dependent variable relates to several independent variables at once, rather than just one; however, its results should not be taken at face value.

It is easy to throw a big data set at a multiple regression and get an impressive-looking output. But many people are skeptical of the usefulness of multiple regression, especially for variable selection, and you should view the results with caution. You should examine the linear regression of the dependent variable on each independent variable, one at a time, examine the linear regressions between each pair of independent variables, and consider what you know about the subject matter. You should probably treat multiple regression as a way of suggesting patterns in your data, rather than rigorous hypothesis testing.

If independent variables A and B are both correlated with Y, and A and B are highly correlated with each other, only one may contribute significantly to the model, but it would be incorrect to blindly conclude that the variable that was dropped from the model has no biological importance. For example, let's say you did a multiple regression on vertical leap in children five to twelve years old, with height, weight, age, and score on a reading test as independent variables. All four independent variables are highly correlated in children, since older children are taller, heavier, and more literate, so it's possible that once you've added weight and age to the model, there is so little variation left to explain that the effect of height is not significant. It would be biologically silly to conclude that height had no influence on vertical leap. Because reading ability is correlated with age, it's possible that it would contribute significantly to the model; this might suggest some interesting follow-up experiments on children all of the same age, but it would be unwise to conclude that there was a real effect of reading ability on vertical leap based solely on the multiple regression.
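One standard way to quantify the overlap described in this example is the variance inflation factor (VIF), a diagnostic this section does not itself name: for independent variable $X_j$, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other independent variables. The sampling variance of $X_j$'s coefficient is larger by that factor than it would be if $X_j$ were uncorrelated with the others, which is how a real effect such as height can end up "not significant." The pure-Python sketch below is illustrative only: the helper names are hypothetical, it solves the normal equations directly (adequate for a few predictors, not numerically robust for large problems), and the data in the usage block are invented.

```python
from statistics import fmean


def ols_fit(y, columns):
    """Least-squares coefficients [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk.

    Builds the normal equations (X'X) b = X'y and solves them by Gauss-Jordan
    elimination with partial pivoting. Adequate for a handful of predictors only.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept column first
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(aug[k][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]  # make the pivot equal to 1
        for k in range(p):
            if k != c:  # eliminate column c from every other row
                factor = aug[k][c]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[c])]
    return [aug[k][p] for k in range(p)]


def r_squared(y, columns):
    """Coefficient of determination for the least-squares fit of y on the given columns."""
    coef = ols_fit(y, columns)
    fitted = [
        coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns))
        for i in range(len(y))
    ]
    ybar = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation(predictors):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on all other predictors."""
    vifs = {}
    for name, x in predictors.items():
        others = [col for other, col in predictors.items() if other != name]
        vifs[name] = 1.0 / (1.0 - r_squared(x, others))
    return vifs


if __name__ == "__main__":
    # Made-up measurements for illustration only; these are not real data.
    children = {
        "age_yr": [5, 6, 7, 8, 9, 10, 11, 12],
        "height_cm": [110, 116, 122, 128, 133, 139, 144, 150],
        "weight_kg": [19, 21, 23, 26, 28, 32, 36, 40],
    }
    for name, vif in variance_inflation(children).items():
        print(f"{name}: VIF = {vif:.1f}")
```

A large VIF for height would not mean height is unimportant; it means the data cannot separate its effect from those of the independent variables it is correlated with.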

Linear Regression: Random data points and their linear regression.