As an alternative explanation, consider the following intuition:
When minimizing an error, we must decide how to penalize deviations. The most straightforward approach would be a linearly proportional
penalty function: each deviation from the mean incurs a penalty proportional to its size, so a point twice as far from the mean incurs twice the penalty.
The more common approach is a squared penalty: the penalty grows with the square of the deviation from the mean, so the further an observation lies from the mean, the disproportionately more it is penalized. Under this penalty function, outliers (observations far from the mean) are treated as proportionally more informative than observations near the mean.
To give a visualisation of this, you can simply plot the penalty functions:
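For instance, a minimal sketch in Matlab (the range of deviations is an arbitrary choice, purely for illustration):

```matlab
x = linspace(-3, 3, 200);                 % deviations from the mean
plot(x, abs(x), 'b-', x, x.^2, 'r--');    % linear vs squared penalty
xlabel('deviation from the mean');
ylabel('penalty');
legend('linear penalty |x|', 'squared penalty x^2', 'Location', 'north');
```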
Now especially when estimating regressions (e.g. OLS), different penalty functions will yield different results. With the linearly proportional
penalty function, the regression assigns less weight to outliers than with the squared
penalty function; estimators based on absolute deviations, such as the Median Absolute Deviation (MAD), are therefore known to be more robust. In general, a robust estimator fits most of the data points well but 'ignores' outliers, whereas a least squares fit is pulled more towards the outliers. Here is a visualisation for comparison:
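A small sketch of what such a comparison could look like in Matlab, using robustfit from the Statistics Toolbox on made-up data with a single injected outlier (the data values here are assumptions, chosen only to make the effect visible):

```matlab
rng(1);                                       % for reproducibility
x = (1:10)';
y = 2*x + randn(10, 1);
y(10) = y(10) + 30;                           % inject one large outlier
b_ols    = [ones(10, 1) x] \ y;               % least-squares fit (intercept, slope)
b_robust = robustfit(x, y);                   % robust fit, bisquare weights by default
scatter(x, y, 'filled'); hold on;
plot(x, b_ols(1) + b_ols(2)*x, 'r-');         % pulled towards the outlier
plot(x, b_robust(1) + b_robust(2)*x, 'g--');  % fits the bulk of the data
legend('data', 'least squares', 'robust', 'Location', 'northwest');
hold off;
```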
Now even though OLS is pretty much the standard, different penalty functions are most certainly in use as well. As an example, you can take a look at Matlab's robustfit function, which allows you to choose a different penalty (also called 'weight') function for your regression. The available weight functions include andrews, bisquare, cauchy, fair, huber, logistic, ols, talwar and welsch, and their corresponding expressions can be found on the website as well.
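If you want to try some of those weight functions yourself, the call looks roughly like this (continuing with the x and y from the sketch above; the particular functions picked here are arbitrary):

```matlab
% The third argument of robustfit selects the weight function ('bisquare' is the default).
b_huber  = robustfit(x, y, 'huber');
b_cauchy = robustfit(x, y, 'cauchy');
b_talwar = robustfit(x, y, 'talwar');
```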
I hope that helps you in getting a bit more intuition for penalty functions :)
Update
If you have Matlab, I can recommend playing with Matlab's robustdemo, which was built specifically for comparing ordinary least squares with robust regression.
The demo allows you to drag individual points and immediately see the impact on both ordinary least squares and robust regression (which is perfect for teaching purposes!).
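A minimal way to launch it (it needs the Statistics Toolbox; passing your own data should also be possible, if I remember correctly, but check the documentation):

```matlab
robustdemo           % opens the interactive demo with example data
% robustdemo(x, y)   % supplying your own data to drag around (assumed, verify in the docs)
```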