
Article details

Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 v.21 no.2, 2010, pp. 371-377   Times cited: 1

Simple hypotheses testing for the number of trees in a random forest

Park, Cheol-Yong (Department of Statistics, Keimyung University)
  • Abstract

    In this study, we propose two informal hypothesis tests that may be useful in determining the number of trees in a random forest used for classification. The first test declares a case 'easy' if the hypothesis that the probabilities of the two most popular classes are equal is rejected. The second test declares a case 'hard' if the hypothesis that the relative difference, or margin of victory, between the probabilities of the two most popular classes is greater than or equal to some small number, say 0.05, is rejected. We propose to continue generating trees until all (or all but a small fraction) of the training cases are declared easy or hard. The advantage of combining the second test with the first is that far fewer trees are needed before stopping than with the first test alone, which requires that all (or all but a small fraction) of the training cases be declared easy.
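
The abstract does not state the test statistics, so the following Python sketch is only one possible reading of the stopping rule, not the paper's procedure. It assumes one-sided normal-approximation tests on the vote counts of the two most popular classes, and it assumes the margin of victory is defined as (p1 - p2) / (p1 + p2). The names `classify_case`, `should_stop`, `alpha`, `delta`, and `max_undecided_fraction` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def classify_case(votes, alpha=0.05, delta=0.05):
    """Label one case 'easy', 'hard', or 'undecided' from its per-class vote counts.

    Assumption: conditional on m = n1 + n2, the top-class count n1 is treated
    as Binomial(m, q) with q = p1 / (p1 + p2), and the normal approximation is
    used. This ignores that the top two classes are chosen after seeing the
    votes, which is one reason such a test is informal.
    """
    top = sorted(votes, reverse=True)
    n1, n2 = top[0], top[1]          # votes for the two most popular classes
    m = n1 + n2
    if m == 0:
        return "undecided"
    z_crit = NormalDist().inv_cdf(1 - alpha)

    # Test 1: H0: p1 == p2 (q = 0.5). Rejection means the case is 'easy'.
    z_equal = (n1 - n2) / sqrt(m)
    if z_equal > z_crit:
        return "easy"

    # Test 2: H0: margin >= delta, i.e. q >= (1 + delta) / 2.
    # Rejection (n1 significantly too small) means the case is 'hard'.
    q0 = (1 + delta) / 2
    z_margin = (n1 - m * q0) / sqrt(m * q0 * (1 - q0))
    if z_margin < -z_crit:
        return "hard"

    return "undecided"


def should_stop(vote_table, max_undecided_fraction=0.0, **kwargs):
    """Stop growing trees once all but a small fraction of cases are decided."""
    undecided = sum(
        classify_case(votes, **kwargs) == "undecided" for votes in vote_table
    )
    return undecided <= max_undecided_fraction * len(vote_table)
```

In use, one would recompute the vote table for the training cases after each batch of new trees and call `should_stop` on it. A near-tied case stays undecided at small forest sizes and is declared hard only once enough votes accumulate to rule out a margin of `delta`, which is why the combined rule can stop where the first test alone cannot.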


  • Keywords

    Hypotheses testing, random forest.

  • References (11)

    1. Alam, K. (1971). On selecting the most probable category. Technometrics, 13, 843-850. 
    2. Amaratunga, D., Cabrera, J. and Lee, Y. S. (2008). Enriched random forests. Bioinformatics, 24, 2010-2014. 
    3. Bhandari, S. K. and Ali, M. M. (1994). An asymptotically minimax procedure for selecting the t-best multinomial cells. Journal of Statistical Planning & Inference, 38, 65-74. 
    4. Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140. 
    5. Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. 
    6. Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77-87. 
    7. Hamza, M. and Larocque, D. (2005). An empirical comparison of ensemble methods based on classification trees. Journal of Statistical Computation & Simulation, 75, 629-643. 
    8. Lee, J. W., Lee, J. B., Park, M. and Song, S. H. (2005). An extensive evaluation of recent classification tools applied to microarray data. Computational Statistics & Data Analysis, 48, 869-885. 
    9. Park, C. (2007). A stopping rule for the number of generating trees in a random forest. Journal of the Institute of Natural Sciences, 27, 7-10. 
    10. Ramey, J. T. and Alam, K. (1979). A sequential procedure for selecting the most probable multinomial event. Biometrika, 66, 171-173. 
    11. Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26, 1651-1686. 
  • Works citing this paper (1)

    1. Park, Cheolyong 2016. "A simple diagnostic statistic for determining the size of random forest" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지, 27(4): 855~863     

 Other papers by the author

  • Park, Cheol-Yong (40)

    1. 1998 "A Simple Nonparametric Test of Complete Independence" 한국통계학회 논문집 = Communications of the Korean Statistical Society 5 (2): 411~416    
    2. 1999 "On the Estimation in Regression Models with Multiplicative Errors" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 10 (1): 193~198    
    3. 2000 "A Rao-Robson Chi-Square Test for Multivariate Normality Based on the Mahalanobis Distances" 한국통계학회 논문집 = Communications of the Korean Statistical Society 7 (2): 385~392    
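    4. 2000 "A study of the small-sample distribution of the chi-squared test statistic for two-way contingency tables with given margins and its agreement with the large-sample distribution (주변값이 주어진 이원분할표에 대한 카이제곱 검정통계량의 소표본 분포 및 대표본 분포와의 일치성 연구)" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 11 (1): 83~90    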
    5. 2001 "A Simple Chi-squared Test of Multivariate Normality Based on the Spherical Data" 한국통계학회 논문집 = Communications of the Korean Statistical Society 8 (1): 117~126    
    6. 2002 "Analysis of Students Leaving Their Majors Using Decision Tree" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 13 (2): 157~165    
    7. 2003 "The Rao-Robson Chi-Squared Test for Multivariate Structure" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 14 (4): 1013~1021    
    8. 2004 "A Note on the Simple Chi-Squared Test of Multivariate Normality" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 15 (2): 423~430    
    9. 2005 "A study on tests of spherical symmetry (구형 대칭성 검정에 대한 연구)" 응용통계연구 = The Korean journal of applied statistics 18 (1): 99~113    
    10. 2005 "A Simple Chi-Squared Test of Spherical Symmetry" Journal of the Korean Data & Information Science Society = 한국데이터정보과학회지 16 (2): 227~236    
