In this video, you will learn to run a simple linear
regression analysis using the Linear Regression
task in SAS Studio.
Let’s say that we want to examine the
relationship between body fat percentage and
age.
That is, we want to know if older people tend to
have a larger body fat percentage than do
younger people.
One way to address this question would be to
use the Linear Regression task in SAS Studio.
We begin by clicking on Tasks in the panel on
the left side of the window. The bottom category
of tasks is labeled Statistics.
If you do not already see the contents of the
Statistics category, double-click on Statistics to
expand the category.
Next, double-click on Linear Regression. The
task menu appears on the right. The default
name for the task is Linear Regression 1.
Now that we have selected a task, we do not
need to keep the Tasks pane open.
Click on the Maximize View button to close the
Tasks pane.
Before an analysis can be run, the Linear
Regression task requires a data set, a
dependent variable, and an explanatory variable.
Click on the icon next to the field for the data
set name. Select a data set for the analysis by
double-clicking on the name of the library
and then double-clicking on the name of the
data set. In this demonstration, we will use the
bodyfat2 data set from the libsas library.
To specify the dependent variable, click on the +
sign next to Dependent variable and then click
on PCTBODYFAT2.
To specify the explanatory variable, click on the
+ sign next to Explanatory variables and then
double-click on Age.
The code window on the right now contains SAS
syntax for running a simple linear regression
with the SAS Studio defaults.
Because SAS Studio generates code for our analysis
and updates that code as we specify
options,
the interactive code generator can serve as a
guide for anyone who wishes to learn SAS
programming.
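SAS Studio writes SAS code for this model, and what follows is not that code. It is a minimal Python sketch of the ordinary least-squares fit behind a simple linear regression. It uses hypothetical age and body fat values, not the bodyfat2 data. The fit chooses the intercept b0 and slope b1 in y = b0 + b1*x that minimize the sum of squared residuals.

```python
# Minimal ordinary least-squares fit for one explanatory variable.
# The data are hypothetical toy values, not the bodyfat2 data set.

def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


age = [20, 30, 40, 50, 60]                     # hypothetical explanatory variable
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]  # hypothetical dependent variable

b0, b1 = fit_simple_ols(age, pct_body_fat)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # prints: intercept = 13.20, slope = 0.12
```

The slope is the cross-product of x and y divided by the sum of squares of x, both taken around their means. Because of that, the fitted line always passes through the point (mean of x, mean of y).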
Before we run the analysis, let’s examine the
Methods, Options, and Output tabs.
The Methods tab lets us set the
confidence level used to construct confidence
intervals,
remove the intercept from the
model, and request automated
model selection.
We’re not going to make any changes to this
tab.
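We leave the Methods tab unchanged, but it helps to see what removing the intercept does. The Python sketch below uses hypothetical toy data, not bodyfat2, and is not how SAS implements the option. It compares the usual least-squares slope with the no-intercept slope, sum(x*y) / sum(x*x), which forces the fitted line through the origin.

```python
# Compare a least-squares slope with and without an intercept.
# Hypothetical toy data; not the bodyfat2 data set.

def slope_with_intercept(x, y):
    """Least-squares slope for the model y = b0 + b1*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Least-squares slope for the model y = b1*x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

print(f"with intercept: {slope_with_intercept(age, pct_body_fat):.4f}")  # 0.1200
print(f"through origin: {slope_through_origin(age, pct_body_fat):.4f}")  # 0.4133
```

Here the no-intercept slope is more than three times the usual one. The line is forced to predict zero body fat at age zero, so it has to tilt more steeply to reach the data. Removing the intercept is usually sensible only when that constraint is meaningful.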
The Statistics section of the Options tab allows
us to request additional statistics, such as
standardized regression coefficients,
an analysis of influence, a test for
heteroscedasticity, and more.
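A standardized regression coefficient expresses the slope in standard-deviation units: b1 * s_x / s_y, where s_x and s_y are the sample standard deviations of the explanatory and dependent variables. With only one explanatory variable, this equals the Pearson correlation between the two variables. Here is a minimal Python check using hypothetical toy values and the standard statistics module.

```python
import math
import statistics

# Hypothetical toy data; not the bodyfat2 data set.
age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

mean_x = statistics.fmean(age)
mean_y = statistics.fmean(pct_body_fat)
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in pct_body_fat)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, pct_body_fat))

slope = sxy / sxx
# Standardized slope: the raw slope rescaled by the ratio of sample standard deviations
std_slope = slope * statistics.stdev(age) / statistics.stdev(pct_body_fat)
# Pearson correlation between the explanatory and dependent variables
r = sxy / math.sqrt(sxx * syy)

print(f"standardized slope = {std_slope:.4f}, correlation = {r:.4f}")
```

Because the standardized slope does not depend on the units of measurement, it is most useful for comparing effects when a model has several explanatory variables.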
The Plots section of the Options tab allows us to
choose which plots are produced.
By default, the Linear Regression task produces
diagnostic plots, scatter plots of the residuals
against each explanatory variable,
and a scatter plot of the observed values against
the predicted values.
The Output tab allows users to request that new
data sets be created from the analysis.
We can create data sets for the parameter
estimates or for observationwise statistics.
Though these options are useful, we are not
going to use them for this video.
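Observationwise statistics are values computed for every row, such as the predicted value and the residual (observed minus predicted). This Python sketch builds a small table of that kind from hypothetical toy data. The column names are illustrative assumptions, not the variable names SAS writes to its output data set.

```python
# Per-observation predicted values and residuals for a simple linear regression.
# Hypothetical toy data; column names are illustrative only.

age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(pct_body_fat) / n
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, pct_body_fat))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

rows = []
for x, y in zip(age, pct_body_fat):
    predicted = intercept + slope * x
    rows.append({"age": x, "observed": y, "predicted": predicted, "residual": y - predicted})

for row in rows:
    print(f"{row['age']:>3} {row['observed']:6.1f} {row['predicted']:6.2f} {row['residual']:6.2f}")
```

When the model includes an intercept, least-squares residuals sum to zero apart from rounding. That makes a quick sanity check on any table like this.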
Now that we have looked at the Methods,
Options, and Output tabs, let’s run our program
by clicking on the Run button.
The right side of the screen now displays the
results. In order to better see our results,
we can click on the Code/Results button in the
top left of the screen. This maximizes the
Code/Results window.
To return to the previous view, simply
click on Split. The window now shows the
code for our analysis rather than the results.
To return to the results, click on the Results tab.
Alternatively, we could click on the Log tab to
see the log.
The output begins by telling us the number of
observations read from the data set, 252,
and the number of observations used in the
analysis, also 252. These numbers would differ if
the analysis variables contained missing data.
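The two counts differ when some observations have a missing value for the dependent or explanatory variable, because those observations are left out of the fit. Here is a Python sketch of that bookkeeping. It uses made-up records, with None standing in for a SAS missing value.

```python
# Count observations read versus observations usable in a regression.
# None stands in for a missing value; the records are hypothetical.

records = [
    {"age": 23, "pct_body_fat": 12.3},
    {"age": 41, "pct_body_fat": None},   # missing dependent value
    {"age": None, "pct_body_fat": 25.1}, # missing explanatory value
    {"age": 58, "pct_body_fat": 28.7},
]
analysis_vars = ["age", "pct_body_fat"]

n_read = len(records)
# Keep an observation only if every analysis variable is nonmissing
used = [rec for rec in records if all(rec[v] is not None for v in analysis_vars)]
n_used = len(used)

print(f"Number of observations read: {n_read}")  # 4
print(f"Number of observations used: {n_used}")  # 2
```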
The next table in the output, labeled Analysis of
Variance, provides an overall test of our model.
The F test is significant, indicating that the model
with Age explains more of the variation in body fat
percentage than a model with only an intercept, so
Age is a significant predictor of body fat percentage.
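The Analysis of Variance table splits the total variation in the dependent variable into a model part and an error part, then compares their mean squares. In a simple linear regression, the model has 1 degree of freedom and the error has n - 2. The Python sketch below computes these pieces from hypothetical toy data. Converting F to a p-value needs the F distribution, which the Python standard library does not provide, so that step is omitted.

```python
# Analysis-of-variance quantities for a simple linear regression.
# Hypothetical toy data; not the bodyfat2 data set.

age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(pct_body_fat) / n
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, pct_body_fat))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in age]

ss_total = sum((y - mean_y) ** 2 for y in pct_body_fat)  # corrected total
ss_error = sum((y - p) ** 2 for y, p in zip(pct_body_fat, predicted))
ss_model = ss_total - ss_error

df_model, df_error = 1, n - 2
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

print(f"SS model = {ss_model:.2f}, SS error = {ss_error:.2f}, F = {f_value:.2f}")
```

For comparison, the 0.05 critical value of F with 1 and 3 degrees of freedom is about 10.13. This five-point toy fit would therefore not be significant, which shows that significance depends on sample size as well as on the strength of the relationship.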
The third table in the output provides measures
of fit for the model. It contains the root mean
square error,
the overall mean of the dependent variable, the
coefficient of variation,
the coefficient of determination, also called the
R-square value, and the adjusted R-square.
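Each measure of fit can be computed from the residuals. The Python sketch below uses hypothetical toy data and p = 2 parameters (intercept and slope):

- Root MSE is the square root of SSE/(n - p).
- The coefficient of variation is 100 times Root MSE divided by the dependent mean.
- R-square is 1 - SSE/SST.
- Adjusted R-square scales the unexplained fraction by (n - 1)/(n - p).

The printed labels are meant to resemble the SAS table, not reproduce it exactly.

```python
import math

# Fit statistics for a simple linear regression, computed from first principles.
# Hypothetical toy data; not the bodyfat2 data set.

age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

n, p = len(age), 2             # p = number of parameters (intercept and slope)
mean_x = sum(age) / n
mean_y = sum(pct_body_fat) / n  # overall mean of the dependent variable
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, pct_body_fat))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, pct_body_fat))
ss_total = sum((y - mean_y) ** 2 for y in pct_body_fat)

root_mse = math.sqrt(ss_error / (n - p))  # root mean square error
coeff_var = 100 * root_mse / mean_y       # coefficient of variation, in percent
r_square = 1 - ss_error / ss_total        # coefficient of determination
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - p)

print(f"Root MSE {root_mse:.4f}  Dependent Mean {mean_y:.4f}  Coeff Var {coeff_var:.4f}")
print(f"R-Square {r_square:.4f}  Adj R-Sq {adj_r_square:.4f}")
```

Adjusted R-square is always at or below R-square. The gap is largest when the sample is small relative to the number of parameters, as in this five-point example.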
The next table in the output provides estimates
for the parameters in the model. The estimated
coefficient for Age is positive and statistically
significant: body fat percentage tends to increase
with age.
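Each parameter estimate is tested with a t statistic, which is the estimate divided by its standard error. This statistic has n - 2 degrees of freedom in a simple linear regression. The standard errors are:

- slope: sqrt(MSE / Sxx), where Sxx is the sum of squared deviations of the explanatory variable from its mean;
- intercept: sqrt(MSE * (1/n + mean_x^2 / Sxx)).

Here is a minimal Python sketch with hypothetical toy data.

```python
import math

# t statistics for the parameter estimates of a simple linear regression.
# Hypothetical toy data; not the bodyfat2 data set.

age = [20, 30, 40, 50, 60]
pct_body_fat = [14.0, 18.0, 20.0, 18.0, 20.0]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(pct_body_fat) / n
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, pct_body_fat))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Mean squared error uses n - 2 degrees of freedom (two estimated parameters)
mse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, pct_body_fat)) / (n - 2)

se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"Intercept: estimate {intercept:.4f}, SE {se_intercept:.4f}, t {t_intercept:.4f}")
print(f"Age:       estimate {slope:.4f}, SE {se_slope:.4f}, t {t_slope:.4f}")
```

The sign of the slope estimate gives the direction of the relationship. With a single explanatory variable, the square of the slope's t value equals the F value in the Analysis of Variance table.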
Next comes a scatter plot of the observed values of
the dependent variable against the predicted values.
Following the scatter plot, we have a panel
containing a set of graphs commonly used for
model diagnostics.
Finally, we have a scatter plot of the residuals
against the explanatory variable, Age.
This concludes our video on performing simple
linear regression using SAS Studio.