Interdisciplinary Bio Central, v.2 no.2, 2010, pp. 4.1-4.6

Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

Oh, Hee-Seok (Department of Statistics, Seoul National University); Jang, Dong-Ik (Department of Statistics, Seoul National University); Oh, Seung-Yoon (Interdisciplinary Program in Bioinformatics, Seoul National University); Kim, Hee-Bal (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Abstract

    The most common type of microarray experiment has a simple design in which data are obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test estimates the population variance for each gene simply from the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not giving a high rank to genes only because they have a small sample variance, it rests on the same basic assumption as the ordinary t-test, namely equality of variances across experimental groups. Both the t-test and the empirical Bayes approach suffer from low statistical power because they assume that microarray data follow a normal, unimodal distribution. We propose a method that addresses these problems and is robust to outliers and skewed data, while maintaining the advantages of the classical t-test and modified t-statistics. The resulting data transformation, which brings the data closer to the normality assumption, increases the statistical power of these statistics for identifying DEGs.
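    As a rough illustration of the idea described in the abstract, the following Python sketch computes the classical equal-variance t-statistic for each gene and then recomputes it after a simple robust transformation. The abstract does not specify the authors' procedure, so `huber_pseudo`, its cutoff `c = 1.345`, and the use of the median and scaled MAD are assumptions chosen for illustration only; they are one possible Huber-type clipping step, not the method of the paper.

```python
import numpy as np

def pooled_t(x, y):
    """Classical equal-variance two-sample t-statistic, one value per gene (row)."""
    n1, n2 = x.shape[1], y.shape[1]
    # Pooled sample variance per gene, as in Student's t-test
    s2 = ((n1 - 1) * x.var(axis=1, ddof=1)
          + (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))

def huber_pseudo(x, c=1.345):
    """Hypothetical transformation (an assumption, not the paper's method):
    clip each gene's residuals from its median at c times the scaled MAD."""
    med = np.median(x, axis=1, keepdims=True)
    mad = 1.4826 * np.median(np.abs(x - med), axis=1, keepdims=True)
    clipped = med + np.clip(x - med, -c * mad, c * mad)
    # Leave a gene unchanged when its MAD is zero, to avoid collapsing it
    return np.where(mad > 0, clipped, x)

# One gene, five arrays per group; the last value in group x is an outlier
x = np.array([[5.0, 5.2, 4.9, 5.1, 9.0]])
y = np.array([[6.0, 6.1, 5.9, 6.2, 6.0]])

t_raw = pooled_t(x, y)
t_robust = pooled_t(huber_pseudo(x), huber_pseudo(y))
print(t_raw, t_robust)
```

    In this toy example the single outlier inflates the sample variance of the first group and pulls its mean toward the second group, so the raw t-statistic is small in magnitude; after clipping, the statistic for the same gene is much larger in magnitude. Real analyses would also need multiple-testing control, which this sketch omits.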


  • Keywords

    Microarray; t-test; empirical Bayes; pseudo data

