Continuous data streams (“time series data”) are often smoothed before further processing. Both the running mean filter (also called a moving or rolling mean/average) and the related running median filter are frequently used for this purpose. Both have the disadvantage of “cutting off” peaks. This is a side effect of not trying to approximate the signal as closely as possible: they apply a simple function within each window without taking into account the error this introduces relative to the original signal.

As an alternative to such approaches, a Savitzky-Golay filter can be used. It also slides a window over the signal, but models the data within that window with a low-degree polynomial. In contrast to the running mean/median, it accounts for the approximation error by fitting that polynomial with linear least squares. As a result, peaks are not “simply cut off” but modeled as well as possible, just like the rest of the data.
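To make the idea concrete, here is a minimal base R sketch of the smoothing step. It is only an illustration under simplifying assumptions: `sg_smooth` is a hypothetical helper name (not a function from any package), the window is symmetric with odd width `n`, and the first and last few samples are simply left as `NA`, whereas real implementations typically treat the edges specially.

```r
# Illustrative sketch only: sg_smooth is a hypothetical helper, not a package function.
sg_smooth <- function(x, p = 2, n = 5) {
  stopifnot(n %% 2 == 1, p < n, length(x) >= n)
  h <- (n - 1) / 2                  # half window width
  offsets <- -h:h                   # sample positions relative to the window centre
  # Design matrix with columns offsets^0, offsets^1, ..., offsets^p
  A <- outer(offsets, 0:p, `^`)
  # Least-squares solution: coefficients = (A'A)^{-1} A' y.
  # The fitted value at the centre (offset 0) is the intercept,
  # so the smoothing weights are the first row of (A'A)^{-1} A'.
  w <- solve(crossprod(A), t(A))[1, ]
  out <- rep(NA_real_, length(x))   # edges stay NA in this sketch
  for (i in (h + 1):(length(x) - h)) {
    out[i] <- sum(w * x[(i - h):(i + h)])
  }
  out
}

# A single isolated peak of height 1
x <- c(0, 0, 0, 0, 1, 0, 0, 0, 0)
sg_smooth(x)[5]    # 17/35, about 0.49
mean(x[3:7])       # running mean of width 5 at the peak: 0.2
median(x[3:7])     # running median of width 5 at the peak: 0
```

With the same window width of 5, the quadratic fit keeps roughly half of the peak height, while the running mean keeps a fifth and the running median removes the peak entirely.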

Here’s a simple example in R that compares a Savitzky-Golay filter with a running median on the body temperature column of the beaver1 data set:

library(signal)  # provides sgolay(); note that it masks stats::filter()

temp <- beaver1[, 3]  # body temperature column of the beaver1 data set

matplot(data.frame(
          temp,                                           # original data
          runmed(temp, k = 11),                           # running median, window of 11 samples
          filter(filt = sgolay(p = 5, n = 11), x = temp)  # SG filter: degree 5, window of 11 samples
        ),
        type = 'l', lwd = 2, lty = 1, ylab = '')
legend('topleft', legend = c('original', 'runmed', 'Savitzky–Golay'),
       col = 1:3, lty = 1, lwd = 2)