A Hybrid Approach for Big Data Outlier Detection from Electric Power SCADA System
SCADA (Supervisory Control and Data Acquisition) databases exhibit the three features that characterize big data systems: volume, variety, and velocity. SCADA systems are essential to the safe and secure operation of modern power systems, providing system operators with online information about the power system state. A current research challenge is to process this big data efficiently, since it comprises real-time measurements of hundreds of thousands of heterogeneous physical quantities in the electrical power system. Among the foreseen automation tasks, outlier detection is one of the most important data mining techniques for power systems. However, like other data mining techniques, traditional outlier detection fails when the volume and dimensionality of the data are as high as those observed in a SCADA system. This work aims to circumvent these restrictions by presenting a methodology for SCADA big data that consists of a pre-processing algorithm and a hybrid outlier detection approach. The hybrid approach is assessed using real data from a Brazilian utility company. The results show that the proposed methodology is capable of identifying outliers correlated with important events that affect the system.