In statistics, the term robust or robustness refers to how well a statistical model, test, or procedure holds up under the specific conditions of the statistical analysis a study hopes to carry out. When the conditions a model assumes are met, the properties of the model can be established through mathematical proofs.
Many models are based upon ideal situations that do not exist when working with real-world data. A robust model may nevertheless provide correct results even if its conditions are not met exactly.
Robust statistics, therefore, are statistics that yield good performance when data are drawn from a wide range of probability distributions, and that are largely unaffected by outliers or small departures from model assumptions in a given dataset. In other words, the results of a robust statistic are resistant to errors and irregularities in the data.
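To make the idea of resistance to outliers concrete, here is a minimal sketch comparing the sample mean with the sample median when one value is recorded incorrectly. The measurements are invented for illustration, and the mean and median are chosen only as a familiar non-robust and robust pair.

```python
import statistics

# Hypothetical measurements; the values are made up for illustration.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# Same data, but the last value is mistyped (10.2 entered as 102.0).
contaminated = clean[:-1] + [102.0]

# How far does each statistic move because of the single bad value?
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))

print(f"shift in mean:   {mean_shift:.2f}")
print(f"shift in median: {median_shift:.2f}")
```

In this sketch the single outlier pulls the mean far from its original value, while the median does not move, which is the behavior we expect from a robust statistic.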
To see a commonly used robust statistical procedure, one needs to look no further than t-procedures. These include the confidence interval for a population mean with unknown population standard deviation as well as hypothesis tests about the population mean.
The use of t-procedures assumes the following:
- The set of data that we are working with is a simple random sample of the population.
- The population that we have sampled from is normally distributed.
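As an illustration of one t-procedure, the following sketch computes a 95% confidence interval for a population mean from a small sample. The sample values are hypothetical, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom, hardcoded here as an assumption so that the example needs only the standard library.

```python
import math
import statistics

# Hypothetical simple random sample of n = 10 measurements.
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.0]

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Critical value t* for 95% confidence with n - 1 = 9 degrees of freedom,
# taken from a standard t table. It is valid only for df = 9.
T_STAR_95_DF9 = 2.262

# Interval: x_bar +/- t* * s / sqrt(n)
margin = T_STAR_95_DF9 * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"sample mean: {x_bar:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For a different sample size, the critical value would have to be looked up again for the new degrees of freedom.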
In practice with real-life examples, statisticians rarely have a population that is normally distributed, so the question instead becomes, “How robust are our t-procedures?”
In general, the condition that we have a simple random sample is more important than the condition that we have sampled from a normally distributed population. The reason for this is that the central limit theorem ensures that the sampling distribution of the sample mean is approximately normal when the sample is large: the greater our sample size, the closer that sampling distribution is to being normal.
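The effect of the central limit theorem can be checked with a small simulation. The sketch below draws repeated samples from a right-skewed exponential population and measures the skewness of the resulting sample means, which should shrink toward zero (the value for a symmetric, normal-like shape) as the sample size grows. The exponential population, the seed, the number of repetitions, and the helper names `skewness` and `sample_means` are all choices made for this illustration.

```python
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (near 0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size `n` from an exponential population."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewness of the simulated sampling distribution for several sample sizes.
skew_by_n = {n: skewness(sample_means(n, 2000, rng)) for n in (2, 15, 40)}

for n, g in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {g:.2f}")
```

Even though the population itself is strongly skewed, the distribution of sample means becomes noticeably more symmetric as n increases.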
How T-Procedures Function as Robust Statistics
So robustness for t-procedures hinges on sample size and the distribution of our sample. Considerations for this include:
- If the sample size is large, meaning that we have 40 or more observations, then t-procedures can be used even with distributions that are skewed.
- If the sample size is at least 15 but less than 40, then we can use t-procedures for a distribution of any shape, unless there are outliers or a high degree of skewness.
- If the sample size is less than 15, then we can use t-procedures for data that have no outliers, a single peak, and are nearly symmetric.
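The guidelines above can be sketched as a small decision function. The function name `t_procedures_reasonable` and its parameters are hypothetical, and judging whether data have outliers, strong skewness, a single peak, or near symmetry is left to the analyst (for example, from a plot of the data). This is a rule-of-thumb check, not a formal test.

```python
def t_procedures_reasonable(n, has_outliers=False, strongly_skewed=False,
                            single_peak=True, nearly_symmetric=True):
    """Apply the rule-of-thumb guidelines; return True if t-procedures are reasonable."""
    if n >= 40:
        # Large sample: acceptable even for skewed distributions.
        return True
    if n >= 15:
        # Moderate sample: acceptable unless outliers or strong skewness.
        return not (has_outliers or strongly_skewed)
    # Small sample: need no outliers, a single peak, and near symmetry.
    return (not has_outliers) and single_peak and nearly_symmetric

# Example checks for three hypothetical datasets.
print(t_procedures_reasonable(50, strongly_skewed=True))
print(t_procedures_reasonable(25, has_outliers=True))
print(t_procedures_reasonable(10))
```

The thresholds of 15 and 40 come directly from the guidelines; they are conventions rather than sharp mathematical boundaries.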
In most cases, robustness has been established through technical work in mathematical statistics. Fortunately, we do not need to carry out these advanced mathematical calculations ourselves in order to use a procedure properly; we only need to understand the overall guidelines for the robustness of our specific statistical method.
T-procedures function as robust statistics because they typically yield good performance even when the normality condition is not met exactly, provided that the sample size is taken into account when deciding whether to apply the procedure.