How do we know the best fit straight line is best?
To prove that you can do this, we will work through a few sets of data, pick an algorithm from Excel that will fit the points to the line, then measure the distance the points are from that line (Fig. 1).
This first example uses the Gaussian method (you learned it in school (y = mx + b or y = ax + b). Note in Fig. 1 that the points to the line follow a vertical line. This is not the shortest distance from point to line--that would be a diagonal line. We will discuss that a bit later.
For this article, we define the best fit line as that line for which the sum of the absolute (abs) differences is the smallest sum.
Method A Method B 2 5 5 2 11 14 16 13 21 25
Now let's look at set of data and two equations and two lines (Figure 2). Here are the data when we calculate the y' (from y' = mx + b or y' = 0.9482x + 0.5155) on the left and on the right are the data using a power equation for y' (y' = 1.5369* x 0821).The line drawn with the ruler is the solid line; the dashed line is the line for the power curve.
Linear abs Power abs (Straight) x y y' y'-y x y y' y'-y 2 4 2.4 1.6 2 4 2.7 1.3 5 3 5.3 2.3 5 3 5.8 2.8 11 12 10.9 1.1 11 12 11.0 1.0 16 14 15.7 1.7 16 14 15.0 1.0 21 23 20.4 2.6 21 23 18.7 4.3 24 22 23.3 1.3 24 22 20.9 1.1 Sum (abs) 10.4 Sum (abs) 11.4
The power curve is good, but not as good as the Linear--based on the distance the points are from the line. [Admittedly, if we tried one of the other algorithms Excel has available, it might be a better line. You must decide how many tries you will use. Excel 2007 can draw 6 lines to fit data.]
We will study three more cases to illustrate this approach to fitting the points to a line and how you can show whether your line in assessing linearity is straight, or not.
Let's look at an example where the line is not straight (Fig. 3) and the equation that best fits the data. In this case, we used a power algorithm to fit the data (y' = [1.0718x.sup.09911]). The straight, dashed line uses y' = mx +b (y' = 1.056*54 - 2.731)
Here are the data from which the plots were made. You see that the sum of the absolute differences of the power curve is less than the straight line. That is the best fit line is NOT the Gaussian straight line.
Straight Line abs Power abs x y y' y'-y x y y' y'-y 54 57 54.3 2.7 54 57 55.8 1.2 80 81 81.7 0.7 80 81 82.4 1.4 99 105 101.8 3.2 99 105 101.8 3.2 101 100 103.9 3.9 101 100 103.8 3.8 105 108 108.1 0.1 105 108 107.9 0.1 110 109 113.4 4.4 110 109 113.4 4.4 131 133 135.6 2.6 131 133 134.3 1.3 134 145 138.8 6.2 134 145 137.4 7.6 Abs sum 23.8 Abs sum 22.6
Let's go to the next example. Here are the data for two lines followed by the graphs of them. The solid line is the linear line and the dashed line is the power line.
Linear abs Power abs X y y' y'-y X y y' y'-y 50 45 46.1 1.1 50 45 46.4 1.4 60 62 57.5 4.5 60 62 57.4 4.6 70 66 69.0 3.0 70 66 68.6 2.6 80 78 80.4 2.4 80 78 80.0 2.0 90 93 91.8 1.2 90 93 91.7 1.3 100 104 103.2 0.8 100 104 103.6 0.4 Sum Sum (abs) 12.9 (abs) 12.3
This is a close call. The sums are not very different and the lines are quite similar. The power sum is a bit better. Should you encounter a situation like this, we suggest that you document it, as you normally would, including all the equations you test, but remark that you have accepted the straight line with the assumption that the random errors in the method explain the small difference.
Let's look at the reverse now--a set of data that curves and see how well the best fit straight line works compared to another line [Fig. 5].Here is the set of x and y values with the y' and y'-y for a straight line and for a polynomial. Note the sums of the absolute differences.
Straight Line abs Line abs x y y' y'-y X y y' y'-y 1 1 -5.4 6.4 1 1 1.2 0.2 3 3 5.6 2.6 3 3 2.3 0.7 5 9 16.6 7.6 5 9 10 1.0 7 25 27.6 2.6 7 25 24.3 0.7 9 45 38.6 6.4 9 45 45.2 0.2 Abs Abs Sum 25.6 Sum 2.8
Here are the data and lines, for the final example.
Linear abs Polynomial abs x y y' y'-y x y y' y'-y 0 0.2 2.6 2.4 0 0.2 -0.9 1.1 5 4.9 6.4 1.5 5 4.9 5.3 0.4 10 10.4 10.2 0.2 10 10.4 11.2 0.8 20 20.6 17.8 2.8 20 20.6 21.4 0.8 40 38 33.0 5.0 40 38 36.5 1.5 60 44 48.2 4.2 60 44 44.4 0.4 sum (abs)16.1 sum (abs) 5.0
When you look at the two lines, it is obvious that the straight line is not as good as the polynomial. This is not to say that the polynomial line is the best fit, only that it's better than the best fit straight line, which is all we want to know.
When you looked at the data, you saw that the method seems to bend about a concentration above 40 (at least). The statistics for the two lines point to that as well--since the straight line is not the best fit. [You may choose to repeat the experiment or simply not report values above 40 without proper dilution of the sample.]
In the laboratory, the calibration curves are usually linear (straight). When assessing linearity, it is acceptable to use the Gaussian curve as the x values are 'assigned' and assumed to have no error.
Before we leave this discussion of determining whether a straight line is the best fit line for a data set, we want to return to Fig. 1 and the fact that the lines from the points to the line were vertical and not the shortest lines. The shortest lines are diagonal. There are algorithms for finding that straight line, but they are not readily available for Excel and are rarely used in our work.
You now have a tool to use to show mathematically that, within the limits of the number of different equations you can test, whether a linearity study or a method comparison has a linear relationship between the x and y values.
Questions for STEP Participants
AMT strongly encourages you to submit your answers online so that the CE credits can be automatically transferred into your AMTrax account. To do so, go to www.americanmedtech.org, click on the AMT Store on the top navigation bar. Click on STEP Online, then select the article number and purchase the test. Don't forget to log in to receive the discounted member price of $5 (nonmembers pay $15/test).
If you wish to submit answers manually (only available to AMT members), the fee is $10/test. Please submit a copy of this page with your answers marked, along with a completed order form located elsewhere in this publication. Don't forget to include payment.
1. For a Gaussian line we assume the x-values are correct.
2. Gauss measured the shortest distance from the point to his line.
3. I f the line is not ruler-straight the method is probably not linear
4. The reason the dots are not on the line in Fig. 1 is
A. Systematic error
B. Random error
C. Both Systematic and random
5. The method we plot in Fig. 2 is linear.
6. Running each of the calibrators 2 or 3 time would make the curves straight
7. We use the sum of the (y'-y) to determine linearity
8. Fig. 6 has a straight line indicating that the method is linear.
9. Testing a data set with the ruler and another from Excel is sufficient to determine linearity
10. The equation for the Gaussian curve is
A. Y = mx + b
B. Y= mx - b
C. Y= x - bm
D. Y = m - bx
Plotting A Best Fit Line Using Excel
If you are new to creating a chart, go to this web site for a very clear tutorial. http://www.excel-easy.com/data-analysis/ charts.html
After creating a chart in Microsoft Excel, a best fit line can be found as follows:
1. Ensure you are on the worksheet with the chart you add a line to.
2. Move the mouse cursor to any data point and press the left mouse button. All of the data points should now be highlighted. Now, while the mouse cursor is still on any one of the highlighted data points, press the right mouse button, and click on Add Trendline from the menu that appears.
3. From within the "Trendline" window, click on the box with the type of fit you want (e.g., Linear).
4. Click on Options at the top of the "Trendline" window.
5. Pick the line from the 6 options that Excel 2003 provides (Excel 2010 and 2013 give you a choice of more than 20, although not all are useful for straight lines) that you want to fit your data.
6. To determine whether the line you picked is the best, you will need to use the formula that you can also add to the chart (note the add equation as an option) to calculate the y' and the differences as we discussed above. We suggest you test only 2 or 3 as well as the linear one. This is easy to do with the template available at the website.
David Plaut, Plano, TX, Consultant, AMT's Book Reviewer, and frequent speaker at AMT national conventions and regional meetings
Deena Davis, MLS, Point of Care Coordinator for Bozeman Deaconess Hospital, Bozeman, MT
|Printer friendly Cite/link Email Feedback|
|Author:||Plaut, David; Davis, Deena|
|Publication:||Journal of Continuing Education Topics & Issues|
|Date:||Aug 1, 2014|
|Previous Article:||Cases in clinical microbiology.|
|Next Article:||Case twenty nine: the synergistic role of microbiology and pathology.|