Doubly Penalized LASSO for Reconstruction of Biological Networks
Reconstruction of biological and biochemical networks is a crucial step in extracting knowledge and causal information from large biological data sets. This task is particularly challenging when dealing with time-series data from dynamic networks. We have developed a new method, called doubly penalized least absolute shrinkage and selection operator (DPLASSO), for the reconstruction of dynamic biological networks. Our method consists of two components: statistical significance testing of model coefficients and penalized/constrained optimization. First, partial least squares (PLS) regression with statistical significance testing acts as a supervisory-level filter to extract the most informative components of the network from a data set. Then, LASSO with extra weights on the smaller parameters identified in the first layer is employed to retain the main predictors and to set the smallest coefficients to zero. We present two case studies that compare the performance of DPLASSO and LASSO in terms of several metrics, such as sensitivity, specificity, and accuracy. The first case study employs a synthetic data set; it demonstrates that DPLASSO substantially improves network reconstruction in terms of accuracy and specificity. The second case study relies on simulated data sets for the cell division cycle of fission yeast and shows that DPLASSO overall outperforms LASSO and PLS in terms of sensitivity, specificity, and accuracy. Combining PLS with statistical significance testing and LASSO thus provides good performance with respect to several metrics. DPLASSO is well suited for the reconstruction of networks with low and moderate density. For high-density networks, high specificity is desired, making PLS a more suitable approach.
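As a rough illustration of the second layer and of the evaluation metrics, the sketch below fits a LASSO with per-coefficient penalty weights by plain coordinate descent and scores a reconstructed edge list against a known one. It is a minimal sketch under stated assumptions, not the DPLASSO implementation: the first-layer PLS significance test is not reproduced, its outcome is represented by a hypothetical boolean list `significant`, and the weighting rule (weight 1 for significant coefficients, 5 otherwise), the objective scaling, and all function names are illustrative choices that do not come from the text above.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam * sum_j weights[j]*|b_j|.

    X is a list of rows, y a list of responses. Illustrative only.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Covariance of column j with the residual that excludes b[j]
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def edge_metrics(true_edges, pred_edges):
    """Sensitivity, specificity, and accuracy for 0/1 edge indicators."""
    pairs = list(zip(true_edges, pred_edges))
    tp = sum(1 for t, q in pairs if t and q)
    tn = sum(1 for t, q in pairs if not t and not q)
    fp = sum(1 for t, q in pairs if not t and q)
    fn = sum(1 for t, q in pairs if t and not q)

    def ratio(a, b):
        return a / b if b else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy data (made up for illustration): predictor 0 is strong, predictor 1 is weak.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]

# Hypothetical first-layer outcome: only predictor 0 was judged significant.
significant = [True, False]
weights = [1.0 if s else 5.0 for s in significant]  # assumed weighting rule

coef_plain = weighted_lasso(X, y, [1.0, 1.0], lam=0.02)
coef_weighted = weighted_lasso(X, y, weights, lam=0.02)
```

On this toy data the uniformly weighted fit keeps a small nonzero value for the second coefficient, whereas the fit with the larger weight on the non-significant coefficient sets it exactly to zero, which is the qualitative behaviour the second layer is meant to produce. Thresholding the fitted coefficients into 0/1 edge indicators and passing them to `edge_metrics` together with the known network gives the three metrics used in the case studies.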