# 7 Multiple Regression Models

Multiple regression is used to find an equation that best predicts the Y variable as a linear function of multiple X variables.

Learning Objectives

After completion of this session, you will be able to:

• Describe how multiple regression can be used to predict an unknown Y value from a corresponding set of X values, or to understand the functional relationships between the dependent and independent variables.

Key Takeaways

Key Points

• One use of multiple regression is prediction or estimation of an unknown Y value corresponding to a set of X values.
• A second use of multiple regression is to try to understand the functional relationships between the dependent and independent variables, to try to see what might be causing the variation in the dependent variable.
• The main null hypothesis of a multiple regression is that there is no relationship between the X variables and the Y variable; i.e., that the fit of the observed Y values to those predicted by the multiple regression equation is no better than what you would expect by chance.

When To Use

You use multiple regression when you have three or more measurement variables. One of the measurement variables is the dependent (Y) variable. The rest of the variables are the independent (X) variables. The purpose of a multiple regression is to find an equation that best predicts the Y variable as a linear function of the X variables.
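In symbols, the equation being sought can be sketched as follows. The notation is an assumption made for illustration and does not come from the text: $\hat{Y}$ is the predicted value of the dependent variable, $a$ is the intercept, and $b_1, \dots, b_k$ are the slopes (partial regression coefficients) for the $k$ X variables.

$$
\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
$$

Assuming "best predicts" is meant in the usual ordinary least-squares sense, $a$ and the $b_j$ are chosen to minimize the sum of squared differences between the observed and predicted values over the $n$ observations:

$$
\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2
$$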

Multiple Regression For Prediction

One use of multiple regression is prediction or estimation of an unknown Y value corresponding to a set of X values. For example, let’s say you’re interested in finding a suitable habitat to reintroduce the rare beach tiger beetle, Cicindela dorsalis dorsalis, which lives on sandy beaches on the Atlantic coast of North America. You’ve gone to a number of beaches that already have the beetles and measured the density of tiger beetles (the dependent variable) and several biotic and abiotic factors, such as wave exposure, sand particle size, beach steepness, and the density of amphipods and other prey organisms. Multiple regression would give you an equation that relates tiger beetle density to a function of all the other variables. Then, if you went to a beach that didn’t have tiger beetles and measured all the independent variables (wave exposure, sand particle size, etc.), you could use the multiple regression equation to predict the density of tiger beetles that could live there if you introduced them.

Figure: Atlantic beach tiger beetle (Cicindela dorsalis dorsalis), the subject of the multiple regression example in this section.
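The prediction workflow in the beetle example can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis: the helper names `fit_ols` and `predict` are hypothetical, only two of the independent variables are used, and every number is made up. The toy densities were generated from an exact rule (density = 1 + 2 × wave exposure + 0.5 × sand size) so that the result is easy to check by hand. Real field data would be noisy.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend a column of 1s for the intercept
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Evaluate the fitted equation at one set of X values."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))


# Hypothetical measurements: (wave exposure, sand particle size) for five beaches.
beaches = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
# Made-up beetle densities, generated from 1 + 2*wave + 0.5*sand with no noise.
density = [4.0, 5.5, 9.0, 10.5, 14.0]

coef = fit_ols(beaches, density)

# A beach without beetles where the same X variables were measured.
new_beach = (2.5, 3.0)
predicted = predict(coef, new_beach)
print(coef, predicted)
```

Because the toy densities follow the generating rule exactly, the fitted coefficients should come out close to 1, 2 and 0.5, and the predicted density for the new beach close to 7.5.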

Multiple Regression For Understanding Causes

A second use of multiple regression is to try to understand the functional relationships between the dependent and independent variables, to try to see what might be causing the variation in the dependent variable. For example, if you did a regression of tiger beetle density on sand particle size by itself, you would probably see a significant relationship. If you did a regression of tiger beetle density on wave exposure by itself, you would probably see a significant relationship. However, sand particle size and wave exposure are correlated; beaches with bigger waves tend to have bigger sand particles. Maybe sand particle size is really important, and the correlation between it and wave exposure is the only reason for a significant regression between wave exposure and beetle density. Multiple regression is a statistical way to try to control for this; it can answer questions like, “If sand particle size (and every other measured variable) were the same, would the regression of beetle density on wave exposure be significant?”
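The idea of "controlling for" sand particle size can be illustrated with a small sketch. It uses a standard result (the Frisch-Waugh-Lovell theorem) rather than anything stated in the text. The theorem says that the wave-exposure coefficient in a multiple regression equals the slope you get by first removing the part of each variable explained by sand size, and then regressing the leftover density on the leftover wave exposure. The data are made up and constructed so that density depends only on sand size, while wave exposure is merely correlated with sand size. The helper names are hypothetical.

```python
def simple_fit(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Made-up data: beaches with bigger sand particles tend to have bigger waves.
sand = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wave = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
# By construction, density depends on sand size only.
density = [3.0 + 2.0 * s for s in sand]

# Regression of density on wave exposure by itself: a clearly nonzero slope.
naive_slope, _ = simple_fit(wave, density)

# Hold sand size "the same" by working with residuals, then regress again.
wave_given_sand = residuals(sand, wave)
density_given_sand = residuals(sand, density)
partial_slope, _ = simple_fit(wave_given_sand, density_given_sand)

print(naive_slope, partial_slope)
```

In this constructed case the slope for wave exposure on its own is well above zero, but it drops to essentially zero once sand size is accounted for. That is the pattern described above, where the correlation with sand size is the only reason wave exposure appears related to beetle density.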

The main null hypothesis of a multiple regression is that there is no relationship between the X variables and the Y variable; in other words, that the fit of the observed Y values to those predicted by the multiple regression equation is no better than what you would expect by chance. There is also a null hypothesis for each X variable: that adding that X variable to the multiple regression does not improve the fit of the multiple regression equation any more than expected by chance.
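The text does not say how these hypotheses are tested. As a sketch of the standard approach, which is an assumption here, the overall null hypothesis is usually assessed with an F statistic built from the coefficient of determination $R^2$, where $n$ is the number of observations, $k$ is the number of X variables, $\hat{Y}_i$ is the predicted value, and $\bar{Y}$ is the mean of the observed Y values:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
$$

Under the null hypothesis, and the usual assumptions of independent, normally distributed errors with equal variance, $F$ follows an F distribution with $k$ and $n - k - 1$ degrees of freedom. The null hypothesis for a single X variable $j$ can be tested in the same spirit by comparing the residual sum of squares of the full equation, $SS_{\text{full}}$, with that of the equation fitted without variable $j$, $SS_{-j}$:

$$
F_j = \frac{SS_{-j} - SS_{\text{full}}}{SS_{\text{full}} / (n - k - 1)}
$$

which is compared against an F distribution with $1$ and $n - k - 1$ degrees of freedom.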