# 21 Reports

Predicted Classification Report

Predicted Classification Section

| Row | Actual | Predicted | Pcnt1 | Pcnt2 | Pcnt3 |
|---|---|---|---|---|---|
| 1 | Setosa | Setosa | 100.0 | 0.0 | 0.0 |
| 2 | Virginica | Virginica | 0.0 | 0.0 | 100.0 |
| 3 | Versicolo | Versicolo | 0.0 | 99.6 | 0.4 |
| 4 | Virginica | Virginica | 0.0 | 0.0 | 100.0 |
| 5 | Virginica | Versicolo | 0.0 | 72.9 | 27.1 |
| 6 | Setosa | Setosa | 100.0 | 0.0 | 0.0 |
| 7 | Virginica | Virginica | 0.0 | 0.0 | 100.0 |
| 8 | Versicolo | Versicolo | 0.0 | 96.0 | 4.0 |

For each row, this report shows the actual group, the predicted group, and the estimated probability (as a percentage) that the row belongs to each group. These columns are defined above in the Misclassified Rows Report.
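As a small illustration, the sketch below predicts each row's group as the one with the largest membership percentage and lists the rows whose prediction differs from the actual group. The mapping of Pcnt1, Pcnt2, and Pcnt3 to Setosa, Versicolo, and Virginica is an assumption inferred from the rows above, and the function names are hypothetical.

```python
# Assumed order of the Pcnt1, Pcnt2, Pcnt3 columns (inferred from the report rows).
GROUPS = ("Setosa", "Versicolo", "Virginica")

# (row, actual group, Pcnt1, Pcnt2, Pcnt3), copied from the report above.
ROWS = [
    (1, "Setosa", 100.0, 0.0, 0.0),
    (2, "Virginica", 0.0, 0.0, 100.0),
    (3, "Versicolo", 0.0, 99.6, 0.4),
    (4, "Virginica", 0.0, 0.0, 100.0),
    (5, "Virginica", 0.0, 72.9, 27.1),
    (6, "Setosa", 100.0, 0.0, 0.0),
    (7, "Virginica", 0.0, 0.0, 100.0),
    (8, "Versicolo", 0.0, 96.0, 4.0),
]


def predicted_group(pcnts):
    """Return the group whose estimated membership percentage is largest."""
    best = max(range(len(pcnts)), key=lambda i: pcnts[i])
    return GROUPS[best]


def misclassified(rows):
    """Return the row numbers whose predicted group differs from the actual group."""
    return [row for row, actual, *pcnts in rows if predicted_group(pcnts) != actual]


print(misclassified(ROWS))  # [5]
```

Only row 5 is misclassified: a Virginica that is assigned to Versicolo with an estimated probability of 72.9%.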

Canonical Variate Analysis Report

This report provides a canonical correlation analysis of the discriminant problem. Recall that canonical correlation analysis is used to study the correlation between two sets of variables. Here, the two sets are defined as follows. The independent variables make up the first set. The second set is generated from the group variable by creating an indicator variable for each group except the last one.
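The second set of variables can be sketched as follows: one 0/1 indicator per group, with the last group left out. Which group counts as "last" is an assumption here (the sketch uses sorted order); NCSS may order the groups differently. The function name is hypothetical.

```python
def group_indicators(groups):
    """Build K-1 indicator columns from a group variable.

    The last group (in sorted order, an assumption of this sketch) gets no column.
    Returns (names of the kept groups, list of indicator rows).
    """
    levels = sorted(set(groups))
    kept = levels[:-1]
    rows = [[1 if g == level else 0 for level in kept] for g in groups]
    return kept, rows


names, rows = group_indicators(["Setosa", "Versicolo", "Virginica", "Setosa"])
print(names)  # ['Setosa', 'Versicolo']
print(rows)   # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

Dropping one indicator avoids redundancy: with K groups, the K-th indicator is always one minus the sum of the others.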

Inv(W)B Eigenvalue

The eigenvalues of the matrix W⁻¹B, where W and B are the within-group and between-group sums of squares and cross-products matrices. Each eigenvalue indicates how much of the total explained variation is accounted for by the corresponding discriminant function: the first discriminant function corresponds to the first eigenvalue, and so on. The number of eigenvalues is the smaller of the number of independent variables and K − 1, where K is the number of groups.
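The sketch below shows one way these eigenvalues can be computed with NumPy: build W and B from the data, then take the eigenvalues of W⁻¹B. The data are random and purely illustrative, and the function names are hypothetical. With four variables and three groups, only min(4, 3 − 1) = 2 eigenvalues are nonzero.

```python
import numpy as np


def within_between(X, groups):
    """Return the within-group (W) and between-group (B) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        centered = Xg - mean_g
        W += centered.T @ centered
        d = (mean_g - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)
    return W, B


def discriminant_eigenvalues(X, groups):
    """Eigenvalues of inv(W) @ B, largest first."""
    W, B = within_between(X, groups)
    # solve(W, B) equals inv(W) @ B without forming the inverse explicitly.
    eig = np.linalg.eigvals(np.linalg.solve(W, B)).real
    return np.sort(eig)[::-1]


# Illustrative random data: 3 groups of 20 rows, 4 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(20, 4)) for m in (0.0, 1.0, 2.5)])
groups = np.repeat(["A", "B", "C"], 20)
print(discriminant_eigenvalues(X, groups))  # two positive values, two near zero
```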

Ind’l Prcnt

This eigenvalue expressed as a percentage of the sum of all the eigenvalues.

Total Prcnt

The cumulative percent of this and all previous eigenvalues.
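The two percentage columns follow directly from the eigenvalues. This sketch uses hypothetical eigenvalues of 3.0 and 1.0 (not values from the iris example).

```python
from itertools import accumulate


def eigenvalue_percents(eigenvalues):
    """Return (individual %, cumulative %) for eigenvalues listed in report order."""
    total = sum(eigenvalues)
    individual = [100.0 * ev / total for ev in eigenvalues]
    cumulative = list(accumulate(individual))
    return individual, cumulative


print(eigenvalue_percents([3.0, 1.0]))  # ([75.0, 25.0], [75.0, 100.0])
```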

Canon Corr

The canonical correlation coefficient.

Canon Corr2

The square of the canonical correlation. This is similar to R-Squared in multiple regression.
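Each squared canonical correlation is related to the corresponding eigenvalue λ of W⁻¹B by r² = λ / (1 + λ). A minimal sketch, using a hypothetical eigenvalue of 3.0:

```python
import math


def canonical_correlation(eigenvalue):
    """Return (r, r squared) for an eigenvalue of inv(W) @ B."""
    r_squared = eigenvalue / (1.0 + eigenvalue)
    return math.sqrt(r_squared), r_squared


r, r2 = canonical_correlation(3.0)
print(r2, round(r, 4))  # r squared is 0.75, r is about 0.866
```

Because λ / (1 + λ) increases with λ, the canonical correlations are listed in the same order as the eigenvalues.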

F-Value

The value of the approximate F-ratio for testing the significance of the Wilks’ lambda corresponding to this row and those below it. Hence, in this example, the first F-value tests the significance of both the first and second canonical correlations, while the second F-value tests the significance of the second correlation only.

Num DF

The numerator degrees of freedom for this F-test.

Denom DF

The denominator degrees of freedom for this F-test.
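The manual does not say which F approximation is used. A common choice is Rao's approximation, sketched below under that assumption (the function name is hypothetical). Here p is the number of independent variables, q = K − 1, and error_df = N − K.

```python
import math


def rao_f(wilks, p, q, error_df):
    """Rao's F approximation for Wilks' lambda.

    wilks    -- Wilks' lambda being tested
    p        -- number of independent variables
    q        -- hypothesis degrees of freedom (K - 1 for K groups)
    error_df -- error degrees of freedom (N - K)
    Returns (F, numerator df, denominator df).
    """
    w = error_df + q - (p + q + 1) / 2.0
    denom = p * p + q * q - 5
    t = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    df1 = p * q
    df2 = w * t - (p * q - 2) / 2.0
    root = wilks ** (1.0 / t)
    f_value = (1.0 - root) / root * df2 / df1
    return f_value, df1, df2


# Four variables, three groups, 150 rows: F has 8 and 288 degrees of freedom.
print(rao_f(0.5, 4, 2, 147)[1:])  # (8, 288.0)
```

For the later rows, one common convention replaces p and q by p − k and q − k when testing the roots after the first k, keeping error_df unchanged. If SciPy is available, the Prob Level could then be obtained as `scipy.stats.f.sf(f_value, df1, df2)`. With a single variable the formula reduces to the usual one-way ANOVA F-ratio.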

Prob Level

The significance level of the F-test. This is the area under the F-distribution to the right of the F-value. Usually, a value less than 0.05 is considered significant.

Wilks’ Lambda

The value of Wilks’ lambda for this row. This Wilks’ lambda is used to test the significance of the discriminant function corresponding to this row and those below it. Recall that Wilks’ lambda is a multivariate generalization of 1 − R², so smaller values indicate better discrimination. The F-value above is an approximate test of this Wilks’ lambda.
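Each row's Wilks' lambda can be written in terms of the eigenvalues: it is the product of 1 / (1 + λ) over the eigenvalue in that row and all eigenvalues below it. A sketch using hypothetical eigenvalues of 3.0 and 1.0:

```python
def sequential_wilks(eigenvalues):
    """Wilks' lambda for each row: product of 1/(1 + eigenvalue) over this and later rows."""
    result = []
    for i in range(len(eigenvalues)):
        lam = 1.0
        for ev in eigenvalues[i:]:
            lam /= 1.0 + ev
        result.append(lam)
    return result


print(sequential_wilks([3.0, 1.0]))  # [0.125, 0.5]
```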

Canonical Coefficients Report

 Canonical Coefficients Section Variable Canonical Variate Variate1 Variate2 Constant -2.105106 6.661473 Sepal Length -0.082938 -0.002410 Sepal Width -0.153447 -0.216452 Petal Length 0.220121 0.093192 Petal Width 0.281046 -0.283919

This report gives the coefficients used to create the canonical scores. Each canonical score is a weighted sum of the independent variables plus the constant term, and these coefficients are the weights.
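To illustrate, the sketch below applies the coefficients from the Canonical Coefficients Section to a single observation. The observation is hypothetical, chosen only for illustration, and its values are assumed to be in the same units as the data used to fit the coefficients.

```python
# Coefficients copied from the Canonical Coefficients Section above.
COEFFICIENTS = {
    "Variate1": {"Constant": -2.105106, "Sepal Length": -0.082938,
                 "Sepal Width": -0.153447, "Petal Length": 0.220121,
                 "Petal Width": 0.281046},
    "Variate2": {"Constant": 6.661473, "Sepal Length": -0.002410,
                 "Sepal Width": -0.216452, "Petal Length": 0.093192,
                 "Petal Width": -0.283919},
}


def canonical_scores(row):
    """Constant plus the weighted sum of the variables, for each canonical variate."""
    return {
        name: coefs["Constant"] + sum(coefs[var] * value for var, value in row.items())
        for name, coefs in COEFFICIENTS.items()
    }


# Hypothetical observation (illustrative values only).
row = {"Sepal Length": 50, "Sepal Width": 34, "Petal Length": 15, "Petal Width": 2}
print(canonical_scores(row))  # Variate1 about -7.6053, Variate2 about 0.0116
```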

Canonical Variates at Group Means Report

This report gives the results of applying the canonical coefficients to the means of each of the groups.

Std. Canonical Coefficients Report

Variable-Variate Correlations Report

This report gives the loadings (correlations) of the variables on the canonical variates. That is, each entry is the correlation between the canonical variate and the independent variable. This report can help you interpret a particular canonical variate.
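A loading, as described here, is the Pearson correlation between an independent variable and the canonical scores. The sketch below computes it over all rows, following the description above; some programs instead report pooled within-group correlations, and we have not confirmed which NCSS uses. The data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def loadings(columns, scores):
    """Correlation of each independent variable with one canonical variate."""
    return {name: pearson(values, scores) for name, values in columns.items()}


# Hypothetical data: x1 rises with the scores, x2 falls.
columns = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [4.0, 3.0, 2.0, 1.0]}
scores = [2.0, 4.0, 6.0, 8.0]
print(loadings(columns, scores))  # x1 about 1.0, x2 about -1.0
```

A large positive or negative loading suggests that the variate mainly reflects that variable.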

Linear Discriminant Scores Report

This report gives the individual values of the linear discriminant scores. Note that this information may be stored on the database using the Data Storage options.

Regression Scores Report

This report gives the individual values of the predicted scores based on the regression coefficients. Even though these values are predicting indicator variables, it is possible for a value to be less than zero or greater than one. Note that this information may be stored on the database using the Data Storage options.
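Why can a regression score fall outside zero and one? Least squares fits a straight line, and a line is not bounded. The sketch below fits a 0/1 indicator on a single hypothetical predictor; the fitted values at the ends of the range go below zero and above one.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


x = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]  # indicator: 1 = member of the group
intercept, slope = fit_line(x, y)
predictions = [intercept + slope * xi for xi in x]
print(predictions)  # roughly [-0.1, 0.3, 0.7, 1.1]
```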

Canonical Scores Report

This report gives the scores of the canonical variates for each row. Note that this information may be stored on the database using the Data Storage options.

Scores Plot(s)

You may select plots of the linear discriminant scores, regression scores, or canonical scores to aid in your interpretation. These plots are usually used to give a visual impression of how well the discriminant functions are classifying the data. (Several charts are displayed on the output. Only one of these is displayed here.) This chart plots the values of the first and second linear discriminant scores. By looking at this plot you can see what the classification rule would be. Also, it is obvious from this plot that the first two linear-discriminant functions are necessary in discriminating among the varieties of iris since the groups can be separated along diagonal lines.

Example 2 – Automatic Variable Selection (Brief Report)

The tutorial we have just concluded was based on all four of the independent variables. A common task in discriminant analysis is variable selection. Often you have a large pool of possible independent variables from which you want to select a smaller set (up to about eight variables) which will do almost as well at discriminating as the complete set. NCSS provides an automatic procedure for doing this, which will be described next.

The automatic variable selection is run by changing the Variable Selection option to Stepwise. The program then conducts a stepwise variable selection. It first finds the best discriminator and then the second best. Once it has found two, it checks whether the discrimination would be almost as good if one of them were removed. This process of adding the best remaining variable and then checking whether one of the active variables can be removed continues until no remaining variable has an F-value whose probability level is smaller than the Probability Enter value.
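The stepping process can be sketched as follows. For simplicity, this sketch swaps NCSS's probability thresholds for fixed F-to-enter and F-to-remove cutoffs. It uses the standard partial F for Wilks' lambda, which we assume is equivalent to what NCSS computes. The `wilks` callable, the toy lambda values, and all names are hypothetical.

```python
def partial_f(lam_small, lam_large, n_rows, n_groups, n_small):
    """Partial F for going from a set of n_small variables to that set plus one more."""
    ratio = lam_large / lam_small  # partial Wilks' lambda
    return (1.0 - ratio) / ratio * (n_rows - n_groups - n_small) / (n_groups - 1)


def stepwise(candidates, wilks, n_rows, n_groups,
             f_enter=4.0, f_remove=3.9, max_steps=50):
    """Add the best remaining variable, then check whether one can be removed."""
    active = []
    for _ in range(max_steps):
        inactive = [v for v in candidates if v not in active]
        if not inactive:
            break
        current = wilks(active)
        f_add = {v: partial_f(current, wilks(active + [v]), n_rows, n_groups, len(active))
                 for v in inactive}
        best = max(f_add, key=f_add.get)
        if f_add[best] < f_enter:
            break  # no remaining variable is worth entering
        active.append(best)
        if len(active) >= 2:
            lam_full = wilks(active)
            f_drop = {v: partial_f(wilks([u for u in active if u != v]), lam_full,
                                   n_rows, n_groups, len(active) - 1)
                      for v in active}
            worst = min(f_drop, key=f_drop.get)
            if f_drop[worst] < f_remove:
                active.remove(worst)
    return active


# Toy example: each variable multiplies lambda by a fixed (hypothetical) factor.
FACTORS = {"A": 0.1, "B": 0.5, "C": 0.98}


def toy_wilks(subset):
    lam = 1.0
    for v in subset:
        lam *= FACTORS[v]
    return lam


print(stepwise(["A", "B", "C"], toy_wilks, n_rows=150, n_groups=3))  # ['A', 'B']
```

Setting f_remove below f_enter, as here, makes it less likely that a variable is repeatedly entered and removed. The max_steps cap guards against such cycles.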

An alternative procedure is to use the Multivariate Variable Selection procedure described elsewhere in this manual. If you have more than two groups, you must create a set of dummy (indicator) variables, one for each group. You ignore the last dummy variable, so if there are K groups, you analyze K-1 dummy variables. The Multivariate Variable Selection program will always find a subset of your independent variables that is at least as good as (and usually better than) the one found by the stepwise procedure described in this section. Once a subset of independent variables has been found, it can then be analyzed using the Discriminant Analysis program described here.

Once the variable selection has been made, the program provides the reports that were described in the previous tutorial. Note that two report formats may be called for during the variable selection phase: brief and verbose. We will now provide an example of each type of report.

You may follow along here by making the appropriate entries or load the completed template Example 2 by clicking on Open Example Template from the File menu of the Discriminant Analysis window.

1 Open the Fisher dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file Fisher.NCSS.
• Click Open.

2 Open the Discriminant Analysis window.

• Using the Analysis menu or the Procedure Navigator, find and select the Discriminant Analysis procedure.

3 Specify the variables.

• On the Discriminant Analysis window, select the Variables tab.
• Double-click in the Y: Group Variable box. This will bring up the variable selection window.
• Select Iris from the list of variables and then click Ok. “Iris” will appear in the Y: Group Variable box.
• Double-click in the X’s: Independent Variables text box. This will bring up the variable selection window.
• Select SepalLength through PetalWidth from the list of variables and then click Ok. “SepalLength-PetalWidth” will appear in the X’s: Independent Variables box.
• Enter Stepwise in the Variable Selection box.

4 Specify the reports.

• Select the Reports tab.
• Uncheck all reports and plots. We will only view the Variable Selection Report.
• Enter Labels in the Variable Names box.
• Enter Value Labels in the Value Labels box.
• Enter Brief in the Output box.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

Variable-Selection Summary Report

This report shows what action was taken at each step.

Iteration

This gives the number of this step.

Action This Step

This tells what action (if any) was taken during this step. “Entered” means that the variable was entered into the set of active variables. “Removed” means that the variable was removed from the set of active variables.

Pct Chg In Lambda

This is the percentage decrease in lambda that resulted from this step. Note that Wilks’ lambda is analogous to 1 – R-Squared in multiple regression, so we want to decrease Wilks’ lambda to improve our model. For example, going from iteration 2 to iteration 3 decreases lambda from 0.036884 to 0.024976, which is a 32.29% decrease.
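The percentage in this example can be checked directly. The change is divided by the earlier value of lambda:

```python
def pct_decrease(old, new):
    """Percentage decrease from old to new."""
    return 100.0 * (old - new) / old


print(round(pct_decrease(0.036884, 0.024976), 2))  # 32.29
```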

F-Value

This is the F-ratio for testing the significance of this variable. If the variable was “Entered,” this tests the hypothesis that the variable should be added. If the variable was “Removed,” this tests whether the variable should be removed.

Prob Level

The significance level of the above F-Value.

Wilks’ Lambda

The multivariate analog of 1 – R-Squared. Wilks’ lambda reduces to 1 – R-Squared in the two-group case. It is interpreted in the opposite direction from R-Squared and ranges from zero to one. Values near one imply low predictability, while values near zero imply high predictability. Note that this Wilks’ lambda value corresponds to the currently active variables.
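The two-group relationship can be verified numerically. In the sketch below, Wilks' lambda is computed as det(W) / det(T), where W is the within-group and T the total sums of squares and cross-products matrix. It is then compared with 1 − R², where R² comes from regressing a 0/1 group indicator on the independent variables. The data are random and illustrative, and the function names are hypothetical.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda as det(W) / det(T)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        c = Xg - Xg.mean(axis=0)
        W += c.T @ c
    return np.linalg.det(W) / np.linalg.det(T)


def r_squared(X, y):
    """R-squared from an ordinary least squares fit of y on X, with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 3)), rng.normal(1.0, 1.0, (15, 3))])
groups = np.repeat([0, 1], 15)
print(wilks_lambda(X, groups), 1.0 - r_squared(X, groups))  # the two values agree
```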

Example 3 – Automatic Variable Selection (Verbose Report)

We will now rerun this example with the “verbose” option. We assume that the Fisher dataset is available and you are in the Discriminant Analysis procedure.

You may follow along here by making the appropriate entries or load the completed template Example 3 by clicking on Open Example Template from the File menu of the Discriminant Analysis window.

1 Open the Fisher dataset.

• From the File menu of the NCSS Data window, select Open Example Data.
• Click on the file Fisher.NCSS.
• Click Open.

2 Open the Discriminant Analysis window.

• Using the Analysis menu or the Procedure Navigator, find and select the Discriminant Analysis procedure.

3 Specify the variables.

• On the Discriminant Analysis window, select the Variables tab.
• Double-click in the Y: Group Variable box. This will bring up the variable selection window.
• Select Iris from the list of variables and then click Ok. “Iris” will appear in the Y: Group Variable box.
• Double-click in the X’s: Independent Variables text box. This will bring up the variable selection window.
• Select SepalLength through PetalWidth from the list of variables and then click Ok. “SepalLength-PetalWidth” will appear in the X’s: Independent Variables box.
• Enter Stepwise in the Variable Selection box.

4 Specify the reports.

• Select the Reports tab.
• Uncheck all reports and plots. We will only view the Variable Selection Report.
• Enter Verbose in the Output box.
• Enter Labels in the Variable Names box.
• Enter Value Labels in the Value Labels box.

5 Run the procedure.

• From the Run menu, select Run Procedure. Alternatively, just click the green Run button.

Variable-Selection Detail Report

This report shows the details of each step.

Step

This gives the number of this step (iteration).

Status

This tells whether the variable is “in” or “out” of the set of active variables.

Pct Chg In Lambda

This is the percentage decrease in lambda that would result if the status of this variable were reversed.

F-Value

This is the F-ratio for testing the significance of changing the status of this variable.

Prob Level

The significance level of the above F-Value.

R-Squared Other X’s

This is the R-Squared that would result if this variable were regressed on the other independent variables that are active (status = “In”). This provides a check for multicollinearity in the active independent variables.
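This R-Squared can be sketched as follows: regress one active variable on the other active variables and report the resulting R-Squared. A value near one means that variable is almost a linear combination of the others. In the illustrative data below, the third column is the exact sum of the first two. The function name and data are hypothetical.

```python
import numpy as np


def r2_other_xs(X, j):
    """R-squared from regressing column j of X on the remaining columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
X = np.column_stack([x1, x2, x1 + x2])  # third column is exactly x1 + x2
print(r2_other_xs(X, 2))  # essentially 1.0: perfect multicollinearity
```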

Overall Wilks’ Lambda

This is the value of Wilks’ lambda for all active independent variables. A value near zero indicates an accurate model; a value near one indicates a poor model. 