Understanding Simple Linear Regression

Statistical Technique in Review

In nursing practice, the ability to predict future events or outcomes is crucial, and researchers calculate and report linear regression results as a basis for making these predictions. Linear regression provides a means to estimate or predict the value of a dependent variable based on the value of one or more independent variables. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework. The linkage between the theoretical statement and the equation is made prior to data collection and analysis. Linear regression is a statistical method of estimating the expected value of one variable, y, given the value of another variable, x. The focus of this exercise is simple linear regression, which involves the use of one independent variable, x, to predict one dependent variable, y.

The regression line developed from simple linear regression is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable; see Figure 14-1). The value represented by the letter a is referred to as the y intercept, or the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the direction and angle of the regression line within the graph. The slope expresses the extent to which y changes for every one-unit change in x. The score on variable y (dependent variable) is predicted from the subject’s known score on variable x (independent variable). The predicted score or estimate is referred to as ŷ (expressed as y-hat) (Cohen, 1988; Grove, Burns, & Gray, 2013; Zar, 2010).



Simple linear regression is an effort to explain the dynamics within a scatterplot (see Exercise 11) by drawing a straight line through the plotted scores. No single regression line can be used to predict, with complete accuracy, every y value from every x value. However, the purpose of the regression equation is to develop the line that allows the highest degree of prediction possible, the line of best fit. The procedure for developing the line of best fit is the method of least squares. If the data were perfectly correlated, all data points would fall along the straight line or line of best fit. In actual studies, not all data points fall on the line of best fit, but the line still provides the best equation for predicting y: for any given value of x, the predicted y value is found at the point on the line that corresponds to that x value.
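The method of least squares named above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from any study cited here: the five (x, y) pairs are hypothetical values invented for the example, and the function name least_squares_line is an assumption introduced for illustration.

```python
# Hypothetical (x, y) pairs invented for illustration; not data from any study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares_line(x, y):
    """Return (b, a): the slope and y intercept of the line of best fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-products of deviations divided by
    # the sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The least-squares line passes through the point (mean_x, mean_y),
    # which fixes the intercept once the slope is known.
    a = mean_y - b * mean_x
    return b, a


b, a = least_squares_line(x, y)
print(f"y-hat = {b:.3f}x + {a:.3f}")
```

The step-by-step hand calculation for study data is covered in Exercise 29; the sketch only shows how the slope and intercept follow from the deviations of x and y around their means.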

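The algebraic equation for the regression line of best fit is y = bx + a, where:

y = the dependent variable (the outcome to be predicted)

x = the independent variable (the predictor)

b = the slope of the line, or the amount y changes for every one-unit change in x, also called the regression coefficient

a = the y intercept (the point where the regression line intersects the y-axis), also called the regression constant (Zar, 2010).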


In Figure 14-2, the x-axis represents Gestational Age in weeks and the y-axis represents Birth Weight in grams. As gestational age increases from 20 weeks to 34 weeks, birth weight also increases. In other words, the slope of the line is positive. This line of best fit can be used to predict the birth weight (dependent variable) for an infant based on his or her gestational age in weeks (independent variable). Figure 14-2 is an example of a line of best fit that was not developed from research data. In addition, the x-axis was started at 22 weeks rather than 0, which is the usual start in a regression figure. Using the formula y = bx + a, the birth weight of a baby born at 28 weeks of gestation is calculated below.








The regression line represents the predicted y for any given value of x. As you can see, some data points fall above the line, and some fall below the line. If we substitute any x value in the regression equation and solve for y, we will obtain a ŷ that will be somewhat different from the actual value. The distance between ŷ and the actual value of y is called the residual, and this represents the degree of error in the regression line. The regression line or the line of best fit for the data points is the unique line that minimizes this error, yielding the smallest sum of squared residuals (Zar, 2010). The step-by-step process for calculating simple linear regression in a study is presented in Exercise 29.
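The idea of a residual can be made concrete with a short Python sketch. The data points below are hypothetical and invented for illustration, and the helper names residuals and sum_squared_residuals are assumptions. For these particular points, the least-squares line has slope b = 1.99 and intercept a = 0.05; the sketch compares it with a nearby line to show that the alternative produces a larger sum of squared residuals.

```python
# Hypothetical (x, y) pairs invented for illustration; not data from any study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def residuals(x, y, b, a):
    """Actual y minus predicted y-hat (bx + a) for each data point."""
    return [yi - (b * xi + a) for xi, yi in zip(x, y)]


def sum_squared_residuals(x, y, b, a):
    """Total squared error of the line y-hat = bx + a for these data."""
    return sum(r ** 2 for r in residuals(x, y, b, a))


# Least-squares line for the hypothetical data above.
best_fit_error = sum_squared_residuals(x, y, b=1.99, a=0.05)

# A plausible-looking alternative line that is not the least-squares line.
other_line_error = sum_squared_residuals(x, y, b=2.0, a=0.0)

print(residuals(x, y, b=1.99, a=0.05))    # some positive, some negative
print(best_fit_error < other_line_error)  # True
```

Positive residuals correspond to points above the line and negative residuals to points below it; for the least-squares line they sum to zero, apart from floating-point rounding.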

Research Article


Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), 927–931.


Medications and other therapies often necessitate knowing a patient’s weight. However, a child may be admitted to a pediatric intensive care unit (PICU) without a known weight, and instability and on-going resuscitation may prevent obtaining this needed weight. Clinicians would benefit from a tool that could accurately estimate a patient’s weight when such information is unavailable. Thus, Flannigan et al. (2014) conducted a retrospective observational study for the purpose of determining “if the revised APLS UK [Advanced Paediatric Life Support United Kingdom] formulae for estimating weight are appropriate for use in the paediatric care population in the United Kingdom” (Flannigan et al., 2014, p. 927). The sample included 10,081 children (5,622 males and 4,459 females), who ranged from term-corrected age to 15 years of age, admitted to the PICU during a 5-year period. Because this was a retrospective study, no geographic location, race, or ethnicity data were collected for the sample. A paired samples t-test was used to compare mean sample weights with the APLS UK formula weight. The “APLS UK formula ‘weight = (0.5 × age in months) + 4’ significantly overestimates the mean weight of children under 1 year admitted to PICU by between 10% [and] 25.4%” (Flannigan et al., 2014, p. 928). Therefore, the researchers concluded that the APLS UK formulas were not appropriate for estimating the weight of children admitted to the PICU.

Relevant Study Results

“Simple linear regression was used to produce novel formulae for the prediction of the mean weight specifically for the PICU population” (Flannigan et al., 2014, p. 927). The three novel formulas are presented in Figures 1, 2, and 3, respectively. The calculations for the new formulas are more complex than those for the APLS UK formulas. “Although a good estimate of mean weight can be obtained by our newly derived formula, reliance on mean weight alone will still result in significant error as the weights of children admitted to PICU in each age and sex [gender] group have a large standard deviation . . . Therefore as soon as possible after admission a weight should be obtained, e.g., using a weight bed” (Flannigan et al., 2014, p. 929).


FIGURE 1  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (0.5 × age in months) + 4” and novel formula “Weight in kg = (0.502 × age in months) + 3.161” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 928.


FIGURE 2  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (2 × age in years) + 8” and novel formula “Weight in kg = (0.176 × age in months) + 7.241” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 928.


FIGURE 3  Comparison of actual weight with weight calculated using APLS formula “Weight in kg = (3 × age in years) + 7” and novel formula “Weight in kg = (0.331 × age in months) − 6.868” Flannigan, C., Bourke, T. W., Sproule, A., Stevenson, M., & Terris, M. (2014). Are APLS formulae for estimating weight appropriate for use in children admitted to PICU? Resuscitation, 85(7), p. 929.
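The formulas quoted in the three figure titles can be written as a short Python sketch, which makes the difference in age units easy to see. The coefficients are taken from the figure titles above; the function names, the dictionary layout, and the choice to pass age in months to every function are assumptions made for illustration, and the example ages are arbitrary. The sketch does not check which age range each figure applies to.

```python
# (b, a) pairs for the novel formulas, keyed by figure number.
# Each novel formula takes age in months: weight_kg = b * age_months + a.
NOVEL_FORMULAS = {
    1: (0.502, 3.161),
    2: (0.176, 7.241),
    3: (0.331, -6.868),
}


def novel_weight_kg(figure, age_months):
    """Predicted weight in kg from the novel formula in the given figure."""
    b, a = NOVEL_FORMULAS[figure]
    return b * age_months + a


def apls_weight_kg(figure, age_months):
    """Weight in kg from the APLS formula quoted in the given figure title."""
    if figure == 1:
        return 0.5 * age_months + 4   # this APLS formula uses age in months
    age_years = age_months / 12       # the other two use age in years
    if figure == 2:
        return 2 * age_years + 8
    if figure == 3:
        return 3 * age_years + 7
    raise ValueError("figure must be 1, 2, or 3")


# Arbitrary example ages, chosen only to show the two estimates side by side.
for figure, age_months in [(1, 6), (2, 36), (3, 120)]:
    print(figure, age_months,
          round(apls_weight_kg(figure, age_months), 3),
          round(novel_weight_kg(figure, age_months), 3))
```

Keeping every input in months and converting to years only inside the APLS formulas for Figures 2 and 3 avoids mixing units when the two estimates are compared.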

Study Questions

1. What are the variables on the x- and y-axes in Figure 1 from the Flannigan et al. (2014) study?

2. What is the name of the type of variable represented by x and y in Figure 1? Is x or y the score to be predicted?

3. What is the purpose of simple linear regression analysis and the regression equation?

4. What is the point where the regression line meets the y-axis called? Is there more than one term for this point and what is the value of x at that point?

5. In the formula y = bx + a, is a or b the slope? What does the slope represent in regression analysis?

6. Using the values a = 3.161 and b = 0.502 with the novel formula in Figure 1, what is the predicted weight in kilograms for a child at 5 months of age? Show your calculations.

7. What are the variables on the x-axis and the y-axis in Figures 2 and 3? Describe these variables and how they might be entered into the regression novel formulas identified in Figures 2 and 3.

8. Using the values a = 7.241 and b = 0.176 with the novel formula in Figure 2, what is the predicted weight in kilograms for a child at 4 years of age? Show your calculations.

9. Does Figure 1 have a positive or negative slope? Provide a rationale for your answer. Discuss the meaning of the slope of Figure 1.

10. According to the study narrative, why are estimated child weights important in a pediatric intensive care (PICU) setting? What are the implications of these findings for practice?

Answers to Study Questions

1. The x variable is age in months, and the y variable is weight in kilograms in Figure 1.

2. x is the independent or predictor variable. y is the dependent variable or the variable that is to be predicted by the independent variable, x.

3. Simple linear regression is conducted to estimate or predict the values of one dependent variable based on the values of one independent variable. Regression analysis is used to calculate a line of best fit based on the relationship between the independent variable x and the dependent variable y. The formula developed with regression analysis can be used to predict the dependent variable (y) values based on values of the independent variable x.

4. The point where the regression line meets the y-axis is called the y intercept and is also represented by a (see Figure 14-1). a is also called the regression constant. At the y intercept, x = 0.

5. b is the slope of the line of best fit (see Figure 14-1). The slope of the line indicates the amount of change in y for each one unit of change in x. b is also called the regression coefficient.

6. Use the following formula to calculate your answer: y = bx + a
y = 0.502 (5) + 3.161 = 2.51 + 3.161 = 5.671 kilograms
Note: Flannigan et al. (2014) expressed the novel formula of weight in kilograms = (0.502 × age in months) + 3.161 in the title of Figure 1.

7. Age in years is displayed on the x-axis and is used for the APLS UK formulas in Figures 2 and 3. Figure 2 includes children 1 to 5 years of age, and Figure 3 includes children 6 to 12 years of age. However, the novel formulas developed by simple linear regression are calculated with age in months. Therefore, the age in years must be converted to age in months before calculating the y values with the novel formulas provided for Figures 2 and 3. For example, the age of a child who is 2 years old would be converted to 24 months (2 × 12 mos./year = 24 mos.). Then the formulas in Figures 2 and 3 could be used to predict y (weight in kilograms) for children of different ages. The y-axis on both Figures 2 and 3 is weight in kilograms (kg).

8. First calculate the child’s age in months, which is 4 × 12 months/year = 48 months.
y = bx + a = 0.176 (48) + 7.241 = 8.448 + 7.241 = 15.689 kilograms
Note that the x value must be expressed as age in months, because Flannigan et al. (2014) expressed the novel formula as weight in kilograms = (0.176 × age in months) + 7.241.

9. Figure 1 has a positive slope since the line extends from the lower left corner to the upper right corner and shows a positive relationship. This line shows that the increase in x (independent variable) is associated with an increase in y (dependent variable). In the Flannigan et al. (2014) study, the independent variable age in months is used to predict the dependent variable of weight in kilograms. As the age in months increases, the weight in kilograms also increases, which is the positive relationship illustrated in Figure 1.

10. According to Flannigan et al. (2014, p. 927), “The gold standard for prescribing therapies to children admitted to Paediatric Intensive Care Units (PICU) requires accurate measurement of the patient’s weight. . . . An accurate weight may not be obtainable immediately because of instability and on-going resuscitation. An accurate tool to aid the critical care team estimate the weight of these children would be a valuable clinical tool.” Accurate patient weights are an important factor in preventing medication errors particularly in pediatric populations. The American Academy of Pediatrics (AAP)’s policy on Prevention of Medication Errors in the Pediatric Inpatient Setting can be obtained from the following website: https://www.aap.org/en-us/advocacy-and-policy/federal-advocacy/Pages/Federal-Advocacy.aspx#SafeandEffectiveDrugsandDevicesforChildren. The Centers for Medicare & Medicaid Services, Partnership for Patients provides multiple links to Adverse Drug Event (ADE) information including some resources specific to pediatrics at http://partnershipforpatients.cms.gov/p4p_resources/tsp-adversedrugevents/tooladversedrugeventsade.html.
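The arithmetic in answers 6 and 8 can be checked with a few lines of Python. This is only a verification sketch: the function name predict and the constant MONTHS_PER_YEAR are assumptions introduced for illustration, while the values of a, b, and x come from the questions themselves.

```python
MONTHS_PER_YEAR = 12


def predict(b, x, a):
    """Regression equation y = bx + a."""
    return b * x + a


# Answer 6: Figure 1 novel formula, child aged 5 months.
answer_6 = predict(b=0.502, x=5, a=3.161)

# Answer 8: Figure 2 novel formula, child aged 4 years.
# The novel formula expects months, so convert the age first.
age_months = 4 * MONTHS_PER_YEAR
answer_8 = predict(b=0.176, x=age_months, a=7.241)

print(round(answer_6, 3))  # 5.671
print(round(answer_8, 3))  # 15.689
```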