WEBVTT
1
00:00:02.900 --> 00:00:12.000
In this video, you will learn to run a simple linear
regression analysis using the Linear Regression
task in SAS Studio.
2
00:00:12.000 --> 00:00:19.533
Let’s say that we want to examine the
relationship between body fat percentage and
age.
3
00:00:19.533 --> 00:00:26.799
That is, we want to know if older people tend to
have a larger body fat percentage than do
younger people.
4
00:00:26.800 --> 00:00:33.400
One way to address this question would be to
use the Linear Regression task in SAS Studio.
5
00:00:33.400 --> 00:00:42.733
We begin by clicking on Tasks in the panel on
the left side of the window. The bottom category
of tasks is labeled Statistics.
6
00:00:42.733 --> 00:00:51.399
If you do not already see the contents of the
Statistics category, double-click on Statistics to
expand the category.
7
00:00:51.400 --> 00:01:04.100
Next, double-click on Linear Regression. The
task menu appears on the right. The default
name for the task is Linear Regression 1.
8
00:01:04.100 --> 00:01:09.100
Now that we have selected a task, we do not
need to keep the Tasks pane open.
9
00:01:09.100 --> 00:01:16.800
Click on the Maximize View button to close the
Tasks pane.
10
00:01:16.800 --> 00:01:23.566
Before an analysis can be run, the Linear
Regression task requires a data set, a
dependent variable, and an explanatory variable.
11
00:01:23.566 --> 00:01:31.399
Click on the icon next to the field for the data
set name. Select a data set for the analysis by
double-clicking on the name of the library
12
00:01:31.400 --> 00:01:41.333
and then double-clicking on the name of the
data set. In this demonstration, we will use the
bodyfat2 data set from the libsas library.
13
00:01:41.333 --> 00:01:50.999
To specify the dependent variable, click on the +
sign next to Dependent variable and then click
on PCTBODYFAT2.
14
00:01:51.000 --> 00:02:00.266
To specify the explanatory variable, click on the
+ sign next to Explanatory variables and then
double-click on Age.
15
00:02:00.266 --> 00:02:08.566
The code window on the right now contains SAS
syntax for running a simple linear regression
with the SAS Studio defaults.
16
00:02:08.566 --> 00:02:15.399
Because SAS generates code for our analysis
and then updates the code as we specify
options,
17
00:02:15.400 --> 00:02:21.300
the interactive code generator can serve as a
guide for those who wish to learn SAS
programming.
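NOTE
A minimal sketch of the kind of SAS syntax behind a simple linear regression of PCTBODYFAT2 on Age. This is an assumption about the general shape, not a copy of the code window: the code that the task generates typically carries extra options (for example, plot requests) and may differ by SAS Studio version. It assumes that the libsas library is already assigned.
```sas
/* Simple linear regression: one dependent variable, one explanatory variable */
proc reg data=libsas.bodyfat2;
   model PctBodyFat2 = Age;   /* dependent variable = explanatory variable */
run;
quit;                         /* PROC REG is interactive, so QUIT ends it */
```
SAS variable names are not case sensitive, so PctBodyFat2 and PCTBODYFAT2 refer to the same variable.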
18
00:02:21.300 --> 00:02:28.932
Before we run the analysis, let’s examine the
Methods, Options, and Output tabs.
20
00:02:28.933 --> 00:02:35.166
The Methods tab gives us the option to set the
confidence level for constructing confidence
intervals,
21
00:02:35.166 --> 00:02:40.766
allows us to remove the intercept from the
model, and allows us to request automated
model selection.
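NOTE
For those following along in code, the Methods tab settings described here have likely counterparts in PROC REG. The sketch below is an assumption about that mapping, not the code the task generates: ALPHA= sets the significance level used for confidence intervals, CLB requests confidence limits for the parameter estimates, NOINT removes the intercept, and SELECTION= requests automated model selection.
```sas
/* 90% confidence limits for the parameter estimates */
proc reg data=libsas.bodyfat2 alpha=0.10;
   model PctBodyFat2 = Age / clb;
run;
quit;
/* Same model without an intercept */
proc reg data=libsas.bodyfat2;
   model PctBodyFat2 = Age / noint;
run;
quit;
/* Stepwise selection; only meaningful with several candidate variables */
proc reg data=libsas.bodyfat2;
   model PctBodyFat2 = Age / selection=stepwise;
run;
quit;
```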
22
00:02:40.766 --> 00:02:45.633
We’re not going to make any changes to this
tab.
24
00:02:45.633 --> 00:02:53.299
The Statistics section of the Options tab allows
us to request additional statistics, such as
standardized regression coefficients,
25
00:02:53.300 --> 00:02:59.866
an analysis of influence, a test for
heteroscedasticity, and more.
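NOTE
The additional statistics mentioned here can also be requested as MODEL statement options in PROC REG. This is a hedged sketch of plausible equivalents, not the task's generated code: STB adds standardized regression coefficients, INFLUENCE prints influence diagnostics for each observation, and SPEC runs a specification test that covers heteroscedasticity.
```sas
proc reg data=libsas.bodyfat2;
   /* stb: standardized estimates; influence: influence diagnostics;
      spec: specification test that includes heteroscedasticity */
   model PctBodyFat2 = Age / stb influence spec;
run;
quit;
```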
26
00:02:59.866 --> 00:03:04.866
The Plots section of the Options tab allows us to
choose the plots to be output.
27
00:03:04.866 --> 00:03:12.700
By default, the Linear Regression task outputs
diagnostic plots, scatter plots of the residuals
with each explanatory variable,
29
00:03:12.700 --> 00:03:17.366
and a scatter plot of the observed values with
the predicted values.
30
00:03:17.366 --> 00:03:22.799
The Output tab allows users to request that new
data sets be created from the analysis.
31
00:03:22.800 --> 00:03:28.200
We can create data sets for the parameter
estimates or for observationwise statistics.
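NOTE
In PROC REG, output data sets of this kind are usually created with the OUTEST= option and the OUTPUT statement. The sketch below is illustrative only: the data set names work.reg_est and work.reg_stats and the variable names predicted and residual are hypothetical, chosen for this example.
```sas
/* outest= stores the parameter estimates in a data set */
proc reg data=libsas.bodyfat2 outest=work.reg_est;
   model PctBodyFat2 = Age;
   /* Observationwise statistics: predicted values and residuals */
   output out=work.reg_stats p=predicted r=residual;
run;
quit;
```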
32
00:03:28.200 --> 00:03:33.066
Though these options are useful, we are not
going to use them for this video.
33
00:03:33.066 --> 00:03:43.532
Now that we have looked at the Methods,
Options, and Output tabs, let’s run our program
by clicking on the Run button.
34
00:03:43.533 --> 00:03:48.666
The right side of the screen now displays the
results. In order to better see our results,
35
00:03:48.666 --> 00:03:57.666
we can click on the Code/Results button in the
top left of the screen. This maximizes the
Code/Results window.
36
00:03:57.666 --> 00:04:06.266
To return to the previous view, we would simply
click on Split. The window now shows the
syntax for our analysis, rather than our results.
38
00:04:06.266 --> 00:04:15.666
To return to the results, click on the Results tab.
Alternatively, we could click on the Log tab to
see the log.
40
00:04:15.666 --> 00:04:21.366
The output begins by telling us the number of
observations read from the data set, 252,
41
00:04:21.366 --> 00:04:31.066
and the number of observations used in the
analysis, also 252. These numbers would differ if
the analysis variables contained missing data.
42
00:04:31.066 --> 00:04:37.066
The next table in the output, labeled Analysis of
Variance, provides an overall test of our model.
43
00:04:37.066 --> 00:04:44.866
The test is significant, indicating that our
explanatory variable, Age, is a significant
predictor of body fat percentage.
44
00:04:44.866 --> 00:04:51.899
The third table in the output provides measures
of fit for the model. It contains the root mean
square error,
45
00:04:51.900 --> 00:04:56.533
the overall mean of the dependent variable, the
coefficient of variation,
46
00:04:56.533 --> 00:05:03.433
the coefficient of determination, also called the
R-square value, and the adjusted R-square.
47
00:05:03.433 --> 00:05:13.866
The next table in the output provides estimates
for the parameters in the model. The effect of
Age is both positive and statistically significant.
48
00:05:13.866 --> 00:05:22.332
Next comes a scatter plot of the dependent
variable with the predicted values.
49
00:05:22.333 --> 00:05:32.399
Following the scatter plot, we have a panel
containing a set of graphs commonly used for
model diagnostics.
50
00:05:32.400 --> 00:05:39.533
Finally, we have a scatter plot for the explanatory
variable, Age, with the residuals.
51
00:05:39.533 --> 00:05:46.199
This concludes our video on performing simple
linear regression using SAS Studio.