Layman Analytics System: A Cloud-Enabled System for Data Analytics Workflow Recommendation
In today’s big data era, an enormous amount of data is available. Layman users lack not only the knowledge and experience in data analytics needed to make sense of these data but also the computational resources for executing the analytics. In this paper, we propose and develop a layman analytics system (LAS), which provides layman users with a scalable and ready-to-use analytics tool that automatically generates analytics workflows for classification tasks. The LAS is designed to benefit from existing open-source data analytics tools by using generic ontological modeling of the analytics operators from these tools, together with adaptive constraint refinement for metadata learning. Moreover, the LAS can be deployed on both public and private clouds to meet the need for scalable computing and easy maintenance. To demonstrate the performance of the LAS, we conducted experiments with 114 data sets obtained from the University of California Irvine Machine Learning Repository. The workflows generated by the LAS were benchmarked against OpenML, where each data set has a range of classification accuracies obtained using classifiers designed and fine-tuned by data experts. The comparison showed that, on 87 of the 114 data sets, the LAS-generated workflows exceeded the 50th percentile of the benchmark accuracies. On 49 of these 87 data sets, the LAS outperformed the 90th percentile of the benchmarks.
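The percentile comparison described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the data set names, the accuracy values, and the helper functions `percentile` and `count_exceeding` are all hypothetical, and the linear-interpolation definition of a percentile is an assumption, since the abstract does not state which definition was used.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation.

    Assumption: the abstract does not say how percentiles were computed;
    linear interpolation between order statistics is one common choice.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def count_exceeding(
    las_accuracy: Dict[str, float],
    benchmark: Dict[str, Sequence[float]],
    q: float,
) -> int:
    """Count data sets where the LAS accuracy is above the q-th percentile
    of the benchmark accuracies recorded for the same data set."""
    return sum(
        1
        for name, accuracy in las_accuracy.items()
        if accuracy > percentile(benchmark[name], q)
    )


# Hypothetical benchmark accuracies per data set (made-up numbers).
benchmark = {
    "toy_a": [0.70, 0.75, 0.80, 0.85, 0.90],
    "toy_b": [0.50, 0.60, 0.70, 0.80, 0.90],
    "toy_c": [0.90, 0.92, 0.94, 0.96, 0.98],
}

# Hypothetical accuracies of the generated workflows (made-up numbers).
las_accuracy = {"toy_a": 0.89, "toy_b": 0.75, "toy_c": 0.91}

above_median = count_exceeding(las_accuracy, benchmark, 50)
above_p90 = count_exceeding(las_accuracy, benchmark, 90)
print(f"above 50th percentile: {above_median} of {len(las_accuracy)}")
print(f"above 90th percentile: {above_p90} of {len(las_accuracy)}")
```

Applied to real data, `benchmark` would hold the expert-obtained accuracies for each of the 114 data sets and `las_accuracy` the accuracy of each generated workflow; the two counts would then correspond to the 87 and 49 reported above.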