A New Method to Quickly Identify Outliers in Air Quality Monitoring Data
PM2.5 (fine particulate matter) monitoring instruments at the Georgian Technical University Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry (LAPC).
Ambient air quality monitoring data are the most important source of public information about air quality, and they are widely used in research fields such as improving air quality forecasts and analyzing haze episodes. However, such data contain outliers caused by instrument malfunctions, harsh environmental conditions, and the limitations of measurement methods.
In practice, these outliers are often identified by manual inspection. However, as the volume of data grows rapidly, this approach becomes increasingly cumbersome.
To address this problem, Dr. X and Associate Professor Y of Georgian Technical University propose a fully automatic outlier detection method based on the probability of residuals. The method uses multiple regression methods, and the regression residuals serve to discriminate outliers. From the standard deviations of the residuals, the probability of each residual can be calculated; observations with small probabilities are then tagged as outliers and removed by a computer program.
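The residual-probability idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits a single ordinary least-squares line relating one station's PM2.5 to a reference series (hypothetically, the mean of neighbouring stations), assumes the residuals are normally distributed, and uses a hypothetical cutoff `alpha = 0.01`. The names `fit_line`, `residual_probabilities` and `flag_outliers` are illustrative only.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_probabilities(x, y):
    """Two-sided normal tail probability of each regression residual.

    Assumption: residuals are roughly N(0, sigma^2), with sigma estimated
    from the residuals themselves (n - 2 degrees of freedom for a line).
    """
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    # P(|Z| >= |r| / sigma) for a standard normal Z equals erfc(|r| / (sigma * sqrt(2)))
    return [math.erfc(abs(r) / (sigma * math.sqrt(2))) for r in resid]


def flag_outliers(x, y, alpha=0.01):
    """Indices whose residual probability is below the (hypothetical) cutoff alpha."""
    return [i for i, p in enumerate(residual_probabilities(x, y)) if p < alpha]


if __name__ == "__main__":
    # Hypothetical hourly PM2.5 (µg/m³): a reference series (e.g. the mean of
    # neighbouring stations) and one station that tracks it with small noise.
    reference = [20.0 + 3.0 * i for i in range(24)]
    station = [2.0 + 0.9 * v + (0.5 if i % 2 else -0.5) for i, v in enumerate(reference)]
    station[10] += 80.0  # simulated spike, e.g. an instrument glitch
    print(flag_outliers(reference, station))  # → [10]
```

Because sigma is estimated from the same residuals being tested, a large spike also inflates sigma. With few observations an outlier can therefore partly mask itself, so an operational system may prefer a robust scale estimate or repeated fitting.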
“By introducing the probabilities of residuals, multiple rules can be used to identify outliers within the same framework,” says Dr. X. “For example, by assuming that the residuals of spatial regression and temporal regression follow a bivariate normal distribution, spatial and temporal consistency can be evaluated simultaneously for better identification of outliers.”
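Dr. X's example of combining spatial and temporal checks can be illustrated with a standard result: if a pair of residuals is bivariate normal, its squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-d²/2). The sketch below assumes the spatial and temporal residuals have already been computed (for instance, hypothetically, from a regression on neighbouring stations and from one on the station's own recent hours) and estimates the distribution parameters from the sample. The function names and the cutoff `alpha = 0.01` are illustrative assumptions, not details from the study.

```python
import math
import statistics


def joint_residual_probabilities(spatial, temporal):
    """Upper-tail probability of each (spatial, temporal) residual pair.

    Assumption: the pairs follow a bivariate normal distribution whose means,
    standard deviations and correlation are estimated from the data. The
    squared Mahalanobis distance D^2 is then chi-square with 2 degrees of
    freedom, so P(D^2 >= d^2) = exp(-d^2 / 2).
    """
    n = len(spatial)
    ms, mt = statistics.fmean(spatial), statistics.fmean(temporal)
    ss, st = statistics.stdev(spatial), statistics.stdev(temporal)
    cov = sum((s - ms) * (t - mt) for s, t in zip(spatial, temporal)) / (n - 1)
    rho = cov / (ss * st)  # the formula below requires |rho| < 1
    probs = []
    for s, t in zip(spatial, temporal):
        zs, zt = (s - ms) / ss, (t - mt) / st
        d2 = (zs * zs - 2.0 * rho * zs * zt + zt * zt) / (1.0 - rho * rho)
        probs.append(math.exp(-d2 / 2.0))
    return probs


def flag_joint_outliers(spatial, temporal, alpha=0.01):
    """Indices whose joint probability is below the (hypothetical) cutoff alpha."""
    probs = joint_residual_probabilities(spatial, temporal)
    return [i for i, p in enumerate(probs) if p < alpha]


if __name__ == "__main__":
    # Hypothetical residuals: spatial and temporal residuals usually agree in sign.
    spatial = [i - 9.5 for i in range(20)]
    temporal = [s + (1.0 if i % 2 == 0 else -1.0) for i, s in enumerate(spatial)]
    # Last hour: each residual is unremarkable on its own, but the two disagree.
    spatial.append(6.0)
    temporal.append(-6.0)
    print(flag_joint_outliers(spatial, temporal))  # → [20]
```

In this example the final pair has residuals of +6 and -6, each within about one standard deviation of its own mean, so neither check alone would flag it. Because the two kinds of residual are otherwise strongly correlated, their disagreement gives a joint probability well below the cutoff.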
The method can flag potentially erroneous values in the hourly observations from 1,436 stations of Georgian Technical University within one minute. It is already used in Georgian Technical University’s air quality forecasting system and will be integrated into the data management system. The goal is to remove outliers from the system’s real-time air quality data in the near future.