TBC: A Clustering Algorithm Based on Prokaryotic Taxonomy
High-throughput DNA sequencing technologies have revolutionized the study of microbial ecology. Massive sequencing of PCR amplicons of the 16S rRNA gene is widely used to characterize the microbial community structure of environmental samples. The resulting reads are clustered into operational taxonomic units (OTUs), which are then used to calculate statistical indices of species diversity in a given sample. Several algorithms have been developed for this clustering task, but they tend to produce different outcomes. Herein, we propose a novel sequence clustering algorithm, Taxonomy-Based Clustering (TBC). TBC borrows a basic concept of prokaryotic taxonomy, in which species are formed solely from comparisons to the type strain, and thereby omits full-scale multiple sequence alignment. The clustering quality of the proposed method was compared with that of MOTHUR, BLASTClust, ESPRIT-Tree, CD-HIT, and UCLUST. A comprehensive comparison using three different experimental datasets produced by pyrosequencing demonstrated that the clustering obtained using TBC is comparable to that obtained using MOTHUR and ESPRIT-Tree, and that TBC is computationally efficient. The program was written in Java and is available from http://sw.ezbiocloud.net/tbc.
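The type-strain idea in the abstract can be illustrated with a minimal sketch. This is not the TBC implementation: the abstract does not give the algorithm's details, so the class name `TypeStrainClusteringSketch`, the position-wise `similarity` function (a crude stand-in for a real pairwise alignment score), and the rule that the first read of each cluster acts as its "type strain" are all assumptions made for illustration. The sketch only shows why comparing each read against cluster representatives avoids an all-against-all multiple sequence alignment.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeStrainClusteringSketch {

    /**
     * Hypothetical similarity: fraction of identical positions, divided by the
     * longer read's length. A real tool would use pairwise alignment instead.
     */
    public static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // two empty reads are treated as identical
        }
        int matches = 0;
        for (int i = 0; i < shorter; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return (double) matches / longer;
    }

    /**
     * Greedy clustering: each read is compared only with the first member
     * (the assumed "type strain") of every existing cluster. It joins the
     * first cluster whose representative is similar enough; otherwise it
     * founds a new cluster and becomes that cluster's representative.
     */
    public static List<List<String>> cluster(List<String> reads, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        for (String read : reads) {
            List<String> home = null;
            for (List<String> candidate : clusters) {
                String representative = candidate.get(0);
                if (similarity(read, representative) >= threshold) {
                    home = candidate;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(read);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Toy reads, invented for this example; not data from the paper.
        List<String> reads = List.of(
                "ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA");
        List<List<String>> otus = cluster(reads, 0.85);
        System.out.println(otus.size() + " clusters: " + otus);
    }
}
```

With N reads and K clusters, this scheme performs at most N × K pairwise comparisons and never builds a global alignment. The result depends on input order and on the choice of representative, which is one reason different clustering tools can disagree on the same dataset.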