bowerman_9e_chap_142.pptx

Chapter 14
Simple Linear Regression Analysis

Chapter Outline
14.1 The Simple Linear Regression Model and the Least Squares Point Estimates
14.2 Simple Coefficients of Determination and Correlation
14.3 Model Assumptions and the Standard Error
14.4 Testing the Significance of the Slope and y-Intercept
14.5 Confidence and Prediction Intervals

Chapter Outline Continued
14.6 Testing the Significance of the Population Correlation Coefficient (Optional)
14.7 Residual Analysis

14.1 The Simple Linear Regression Model and the Least Squares Point Estimates
The dependent (or response) variable is the variable we wish to understand or predict
The independent (or predictor) variable is the variable we will use to understand or predict the dependent variable
Regression analysis is a statistical technique that uses observed data to relate the dependent variable to one or more independent variables
The objective is to build a regression model that can describe, predict and control the dependent variable based on the independent variable
LO14-1: Explain the simple linear regression model.

Form of the Simple Linear Regression Model
y = β0 + β1x + ε
μy = β0 + β1x is the mean value of the dependent variable y when the value of the independent variable is x
β0 is the y-intercept; the mean of y when x is zero
β1 is the slope; the change in the mean of y per unit change in x
ε is an error term that describes the effect on y of all factors other than x
LO14-1
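As a rough illustration of the model (a sketch with made-up parameter values, not taken from the text), the simulation below draws many y values at one fixed x and compares their average with β0 + β1x. The values beta0 = 2.0, beta1 = 0.5, sigma = 1.0, the normal error term, and the helper simulate_y are all assumptions for illustration.

```python
import random

def simulate_y(beta0, beta1, x, sigma, rng):
    """Draw one y from y = beta0 + beta1*x + eps, with eps ~ Normal(0, sigma)."""
    return beta0 + beta1 * x + rng.gauss(0.0, sigma)

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = 10.0
rng = random.Random(0)  # fixed seed so the run is repeatable

draws = [simulate_y(beta0, beta1, x, sigma, rng) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
model_mean = beta0 + beta1 * x  # mean of y at this x under the model

print(f"model mean of y at x = {x}: {model_mean}")
print(f"average of simulated y:    {sample_mean:.3f}")
```

Individual draws scatter around the line because of ε, but their average settles near the mean value given by the line.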

Regression Terms
β0 and β1 are called regression parameters
β0 is the y-intercept
β1 is the slope
We do not know the true values of these parameters
So, we must use sample data to estimate them
b0 is the estimate of β0
b1 is the estimate of β1
LO14-1

LO14-1
The Simple Linear Regression Model Illustrated
Figure 14.3

The Least Squares Point Estimates
LO14-2: Find the least squares point estimates of the slope and y-intercept.

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates
LO14-2

Example 14.2 The Tasty Sub Shop Case: The Least Squares Estimates
From last slide,
Σyi = 8,603.1
Σxi = 434.1
Σxi² = 20,757.41
Σxiyi = 403,296.96
Once we have these values, we no longer need the raw data
Calculation of b0 and b1 uses these totals
LO14-2
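The calculation from these totals can be sketched as follows. The sample size n = 10 is read from the worked calculation for this example, and the variable names are chosen here for illustration.

```python
# Totals from Example 14.2 (n = 10 observations).
n = 10
sum_y = 8603.1
sum_x = 434.1
sum_x2 = 20757.41
sum_xy = 403296.96

ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx

b1 = ss_xy / ss_xx                   # slope estimate
b0 = sum_y / n - b1 * sum_x / n      # intercept estimate: y-bar minus b1 times x-bar

print(f"SSxy = {ss_xy:.3f}, SSxx = {ss_xx:.3f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
```

Keeping b1 unrounded when computing b0 is what reproduces the intercept of 183.31 used in this example.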

Example 14.2 The Tasty Sub Shop Case (Slope b1)
LO14-2

Example 14.2 The Tasty Sub Shop Case (y-Intercept b0)
Prediction (x = 20.8)
ŷ = b0 + b1x = 183.31 + (15.596)(20.8)
ŷ = 507.69 (computed from the unrounded estimates; the rounded values shown give 507.71)
Residual is 527.1 – 507.69 = 19.41
LO14-2
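A small sketch of the prediction and residual at x = 20.8. It recomputes the estimates from the Example 14.2 totals and keeps them unrounded; the function name predict is an assumption for illustration.

```python
# Least squares estimates recomputed from the Example 14.2 totals (n = 10),
# kept unrounded so the prediction is not affected by rounding b0 and b1.
n = 10
sum_y, sum_x = 8603.1, 434.1
sum_x2, sum_xy = 20757.41, 403296.96

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

def predict(x):
    """Point prediction from the least squares line."""
    return b0 + b1 * x

x0 = 20.8
y_observed = 527.1            # observed y at x = 20.8, from the slide
y_hat = predict(x0)
residual = y_observed - y_hat  # observed minus predicted

print(f"y-hat = {y_hat:.2f}, residual = {residual:.2f}")
```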
Figure 14.5

14.2 Simple Coefficients of Determination and Correlation
How useful is a particular regression model?

One measure of usefulness is the simple coefficient of determination

It is represented by the symbol r²
LO14-3: Calculate and interpret the simple coefficients of determination and correlation.

The Simple Coefficient of Determination, r²
Total variation is Σ(yi - ȳ)²
Explained variation is Σ(ŷi - ȳ)²
Unexplained variation is Σ(yi - ŷi)²
Total variation is the sum of explained and unexplained variation
Simple coefficient of determination is r² = (explained variation) / (total variation)
r² is the proportion of the total variation that is explained by the regression model
LO14-3
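The three variations and r² can be computed directly from their definitions. The sketch below uses a small hypothetical data set (not from the text); the function name variation_breakdown is an assumption.

```python
def variation_breakdown(x, y):
    """Return (total, explained, unexplained, r_squared) for the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, explained, unexplained, explained / total

# Hypothetical data, not from the text.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, explained, unexplained, r_squared = variation_breakdown(x, y)
print(f"total = {total:.2f}, explained = {explained:.2f}, "
      f"unexplained = {unexplained:.2f}, r^2 = {r_squared:.2f}")
```

For a least squares line, the explained and unexplained variations add up to the total variation, which is why r² always lies between 0 and 1.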

The Simple Correlation Coefficient, r
The simple correlation coefficient between y and x is denoted by r
It is
r = +√r² if b1 is positive
r = −√r² if b1 is negative
where b1 is the slope of the least squares line
The simple correlation coefficient measures the strength of the linear relationship between y and x
LO14-3
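The sign rule can be sketched as below: r takes the sign of the slope b1 and the magnitude √r². The data are hypothetical and the function name simple_correlation is an assumption.

```python
import math

def simple_correlation(x, y):
    """r = +sqrt(r^2) if the slope b1 is positive, -sqrt(r^2) if it is negative."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)   # total variation
    b1 = ss_xy / ss_xx
    explained = b1 * ss_xy                       # explained variation (equals b1^2 * SSxx)
    r_squared = explained / ss_yy
    return math.copysign(math.sqrt(r_squared), b1)

# Hypothetical data, not from the text.
x = [1, 2, 3, 4, 5]
r_up = simple_correlation(x, [2, 4, 5, 4, 5])    # y tends to rise with x
r_down = simple_correlation(x, [5, 4, 5, 4, 2])  # y tends to fall with x
print(f"r (rising) = {r_up:.4f}, r (falling) = {r_down:.4f}")
```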

LO14-3
Different Values of the Correlation Coefficient
Figure 14.8

14.3 Model Assumptions and the Standard Error
Mean of Zero: At any given value of x, the population of potential error term values has a mean equal to zero
Constant Variance Assumption: At any value of x, the population of potential error term values has a variance that does not depend on the value of x
Normality Assumption: At any given value of x, the population of potential error term values has a normal distribution
Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε
LO14-4: Describe the assumptions behind simple linear regression and calculate the standard error.
Figure 14.9

LO14-4
The Mean Square Error and the Standard Error
Sum of squared errors
SSE = Σ(yi - ŷi)²

Mean square error
s² = SSE / (n - 2)
Point estimate of the residual variance σ²

Standard error
s = √(SSE / (n - 2))
Point estimate of the residual standard deviation σ
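A minimal sketch of these three quantities, assuming the usual definitions SSE = Σ(yi - ŷi)², s² = SSE/(n - 2) and s = √s². The observed and fitted values are hypothetical, and the function name standard_error is an assumption.

```python
import math

def standard_error(y, y_hat):
    """Return (SSE, s^2, s) for a simple linear regression with n observations."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    mse = sse / (n - 2)          # divide by n - 2: two parameters (b0, b1) were estimated
    return sse, mse, math.sqrt(mse)

# Hypothetical observed values and their fitted values from the
# least squares line y-hat = 2.2 + 0.6x at x = 1..5; not from the text.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
sse, mse, s = standard_error(y, y_hat)
print(f"SSE = {sse:.2f}, s^2 = {mse:.2f}, s = {s:.4f}")
```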

14.4 Testing the Significance of the Slope and y-Intercept
A regression model is not likely to be useful unless there is a significant relationship between x and y
To test significance, we use the null hypothesis:

H0: β1 = 0

Versus the alternative hypothesis:

Ha: β1 ≠ 0
LO14-5: Test the significance of the slope and y-intercept.

Testing the Significance of the Slope and y-Intercept Continued

LO14-5
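The test statistic for this slide is not reproduced in the text. A common form, assumed here, is t = b1 / s_b1 with s_b1 = s / √SSxx, compared against a t distribution with n - 2 degrees of freedom. The sketch below uses hypothetical data and a critical value taken from a standard t table; the function name slope_t_statistic is an assumption.

```python
import math

def slope_t_statistic(x, y):
    """t = b1 / s_b1, where s_b1 = s / sqrt(SSxx) (assumed standard form)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))      # standard error
    s_b1 = s / math.sqrt(ss_xx)       # standard error of the slope estimate
    return b1 / s_b1

# Hypothetical data, not from the text.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t = slope_t_statistic(x, y)

# Two-sided critical value t_{0.025} with n - 2 = 3 degrees of freedom,
# taken from a standard t table.
t_critical = 3.182
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, reject H0 at alpha = 0.05: {reject_h0}")
```

With only five hypothetical observations the slope is not significant at the 0.05 level, even though the fitted slope is positive.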

An F Test for the Significance of the Slope (Optional)

H0: β1 = 0
Ha: β1 ≠ 0
Reject H0 in favor of Ha at level of significance α if either
F(model) > Fα
p-value < α
Fα is based on one numerator and n - 2 denominator degrees of freedom
LO14-6: Test the significance of a simple linear regression model by using an F test (Optional).

14.5 Confidence and Prediction Intervals
The point on the regression line corresponding to a particular value x0 of the independent variable x is ŷ = b0 + b1x0
It is unlikely that this value will equal the mean value of y when x equals x0
Therefore, we need to place bounds on how far the predicted value might be from the actual value
We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y
LO14-7: Calculate and interpret a confidence interval for a mean value and a prediction interval for an individual value.

Distance Value
Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
The distance value is a measure of the distance between the value x0 of x and x̄
Distance value = 1/n + (x0 - x̄)² / SSxx
Notice that the further x0 is from x̄, the larger the distance value
LO14-7

A Confidence Interval and Prediction Interval
Assume that the regression assumptions hold
The formula for a 100(1 - α)% confidence interval for the mean value of y is
ŷ ± tα/2 s √(distance value)
The formula for a 100(1 - α)% prediction interval for an individual value of y is
ŷ ± tα/2 s √(1 + distance value)
This is based on n - 2 degrees of freedom
LO14-7

Which to Use?
The prediction interval is useful if it is important to predict an individual value of the dependent variable
A confidence interval is useful if it is important to estimate the mean value
The prediction interval will always be wider than the confidence interval
LO14-7

14.6 Testing the Significance of the Population Correlation Coefficient (Optional)
The simple correlation coefficient (r) measures the linear relationship between the observed values of x and y from the sample
The population correlation coefficient (ρ) measures the linear relationship between all possible combinations of observed values of x and y
r is an estimate of ρ
LO14-8: Test hypotheses about the population correlation coefficient (Optional).

Testing ρ
We can test to see if the correlation is significant using the hypotheses
H0: ρ = 0
Ha: ρ ≠ 0
The statistic is t = r √(n - 2) / √(1 - r²)
This test will give the same results as the test for significance of the slope coefficient b1
LO14-8

14.7 Residual Analysis
Checks of regression assumptions are performed by analyzing the regression residuals
Residuals (e) are defined as the difference between the observed value of y and the predicted value of y, e = y - ŷ
Note that e is the point estimate of ε
If the regression assumptions are valid, the population of potential error terms will be normally distributed with mean zero and variance σ²
Different error terms will be statistically independent
LO14-9: Use residual analysis to check the assumptions of simple linear regression.
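A minimal sketch of computing the residuals e = y - ŷ for every observation, using hypothetical data (not from the text). The helper names fit_line and residuals are assumptions.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

def residuals(x, y):
    """Residual e = observed y minus predicted y, for each observation."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data, not from the text.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y)
for xi, ei in zip(x, e):
    print(f"x = {xi}: e = {ei:+.2f}")
```

For a least squares line with an intercept, the residuals sum to zero up to rounding error, so the checks that follow focus on their spread, shape and ordering rather than their average.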
Residual Analysis Continued
If the assumptions hold, the residuals should look as if they were randomly and independently selected from normal populations with mean zero and variance σ²
With any real data, assumptions will not hold exactly
Mild departures do not affect our ability to make statistical inferences
In checking assumptions, we are looking for pronounced departures from the assumptions
So, we only require the residuals to approximately fit the description above
LO14-9

LO14-9
Example 14.9 The QHIC Case: Constructing Residual Plots
Figure 14.18b
Quality Home Improvement Center (QHIC) operates five stores
Studies the relationship between home value and yearly expenditure on home upkeep
Random sample of 40 homeowners
Intercept = –348.3921
Slope = 7.2583

Residual Plots
Residuals versus independent variable
Residuals versus predicted y's
Residuals in time order (if the response is a time series)
LO14-9

Constant Variance Assumptions
To check the validity of the constant variance assumption, examine residual plots against
The x values
The predicted y values
Time (when data is time series)
A pattern that fans out says the variance is increasing rather than staying constant
A pattern that funnels in says the variance is decreasing rather than staying constant
A pattern that is evenly spread within a band says the assumption has been met
LO14-9

LO14-9
Constant Variance Visually
Figure 14.19

Assumption of Correct Functional Form
If the relationship between x and y is something other than a linear one, the residual plot will often suggest a form more appropriate for the model
For example, if there is a curved relationship between x and y, a plot of residuals will often show a curved relationship
LO14-9

Normality Assumption
If the normality assumption holds, a histogram or stem-and-leaf display of residuals should look bell-shaped and symmetric
Another way to check is a normal plot of residuals
Order residuals from smallest to largest
Plot e(i) on the vertical axis against z(i)
z(i) is the point on the horizontal axis under the standard normal curve so that the area under this curve to the left of z(i) is (3i - 1)/(3n + 1)
If the normality assumption holds, the plot should have a straight-line appearance
LO14-9

Independence Assumption
The independence assumption is most likely to be violated by time-series data
If the data is not time series, it can be reordered without affecting it
For time-series data, the time-ordered error terms can be autocorrelated
Positive autocorrelation is when a positive error term in time period i tends to be followed by another positive value in i + k
Negative autocorrelation is when a positive error term tends to be followed by a negative value
Positive autocorrelation produces a cyclical pattern in the error terms over time; negative autocorrelation produces an alternating pattern
LO14-9

LO14-9
Independence Assumption Visually
Figure 14.26 a and b

Estimation/prediction equation
$\hat{y} = b_0 + b_1 x$
Least squares point estimate of the slope β1
$b_1 = \dfrac{SS_{xy}}{SS_{xx}}$, where
$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$
$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$
Least squares point estimate of the y-intercept β0
$b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y} = \dfrac{\sum y_i}{n}$ and $\bar{x} = \dfrac{\sum x_i}{n}$

Example 14.2 calculations (n = 10)
$SS_{xy} = 403{,}296.96 - \dfrac{(434.1)(8{,}603.1)}{10} = 29{,}836.389$
$SS_{xx} = 20{,}757.41 - \dfrac{(434.1)^2}{10} = 1{,}913.129$
$b_1 = \dfrac{29{,}836.389}{1{,}913.129} = 15.596$
$\bar{y} = \dfrac{8{,}603.1}{10} = 860.31$, $\bar{x} = \dfrac{434.1}{10} = 43.41$
$b_0 = 860.31 - (15.596)(43.41) = 183.31$
