Re: Robust fitting of data / Installing R (CRAN) packages
Gordon Haverland
On Thu, 8 Nov 2018 15:58:06 -0700
"Gordon Haverland" <ghaverla@...> wrote: Like a lot of things in statistics,This has nothing to do with comparing distributions, but is an example of what computers can bring you. The mean is a measure of central tendency. It is not the only one. The median is the value which is "half way", 50% is below and 50% is above. For a single moded distribution, the mode is the most common value. For symmetric distributions, the mean median and mode should all be equal. Calculating means (averages, expectations) is the presence of outliers results in answers different than should be found. It turns out the median is a more robust measure of central tendency. If you calculate the median in the presence of some (not a lot) of outliers, you probably do much better than calculating averages. Numerical recipes has a function for doing a median fit of a straight line to data. This is as opposed to a least squares fit. Let's say you have a data set, and you add one point to the data set. And then you fit via least squares and you fit via a median method, and you look at how the parameters of the fitted straight line change as a function of where this extra point is (you are moving this extra point around). You are probing the sensitivity of the calculated parameters to the presence of this extra data point. The values found from least squares, will vary smoothly with the position of this extra data point. The values of the median fit will change discontinously as a function of where this extra point is (there will be jumps in parameters). A reasonable thing to do with any data set, is to calculate the average X and Y of the data, and then make up a new data set where you subtract (<X>,<Y>) from each data point. A least squares fit to this new data will pass through (0,0). Normally we assume that there is no error in X and hence all the error is in Y. 
But if we have a reasonable amount of data, the "error" introduced by shifting the data, by subtracting off the averages of X and Y, should not be large. What we are left with is just calculating the slope of the line that goes through (0,0). Well, there is a way to robustly solve that problem: the Theil-Sen estimator.

https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator

What you do (in theory) is calculate all the two-point slopes possible in the data, sort them, and pick the one in the middle (the median). The number of slopes you have to calculate becomes ridiculously large as the number of data points increases, so there are ways to calculate fewer slopes.

Just in case you wanted to look at robust methods.

Gord

