Self-organizing Approach for the Human Gut Meta-genome

Jianfeng Zhu1, Songgang Li1, Wei-Mou Zheng*, 1, 2
1 Beijing Genomics Institute, Shenzhen (BGI-SZ), Shenzhen 518083, China
2 Institute of Theoretical Physics, Academia Sinica, Beijing 100190, China

© 2012 Zhu et al.

Open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Beijing Genomics Institute, Shenzhen (BGI-SZ), Shenzhen 518083, China; Tel: 86-10-62541820; Fax: 86-10-62562587; E-mail:


We extend the self-organizing approach for annotating a bacterial genome to the analysis of raw sequencing data from the human gut metagenome, without sequence assembly. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns each segment one of seven ‘phases’: one for noncoding regions, three for direct coding regions, indicating the three possible codon positions of the segment's starting site, and three for reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types, or ‘codon usages’. A set of codon usages can be used to update the phase assignment, and vice versa. Starting from an initial phase assignment or initial codon usage tables, iterating these two updates leads to a converged phase assignment, which gives an annotation of the genome. To extend the approach to a metagenome, we consider a mixture model with a number of genome categories. Illumina Genome Analyzer sequencing data of total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome.
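
The alternating update described above for a single genome can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the segment length of 90, the pseudocount, the hard maximum-likelihood phase assignment, and the averaging of the noncoding score over three frames are all assumptions made for illustration. The sketch also assumes that one table describes the six coding phases (read in the appropriate strand and frame) and the other describes the noncoding phase.

```python
import itertools
import math
import random

BASES = "ACGT"
TRIPLETS = ["".join(t) for t in itertools.product(BASES, repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")
N_PHASES = 7  # 0: noncoding, 1-3: direct coding, 4-6: reverse coding


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def framed_triplets(seg, phase):
    """Triplets of a segment read in the strand and frame of a coding phase (1-6)."""
    strand = seg if phase <= 3 else reverse_complement(seg)
    offset = (phase - 1) % 3
    n = (len(seg) - 2) // 3  # same number of triplets for every offset
    return [strand[offset + 3 * i: offset + 3 * i + 3] for i in range(n)]


def normalize(counts):
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def estimate_tables(segments, phases, pseudo=1.0):
    """Re-estimate the coding and noncoding triplet tables from a phase assignment."""
    coding = dict.fromkeys(TRIPLETS, pseudo)      # pseudocount is an assumption
    noncoding = dict.fromkeys(TRIPLETS, pseudo)
    for seg, ph in zip(segments, phases):
        if ph == 0:
            # Assumption: noncoding segments contribute triplets from all three frames.
            for frame in (1, 2, 3):
                for t in framed_triplets(seg, frame):
                    if t in noncoding:
                        noncoding[t] += 1
        else:
            for t in framed_triplets(seg, ph):
                if t in coding:  # skips triplets with ambiguous bases such as N
                    coding[t] += 1
    return normalize(coding), normalize(noncoding)


def log_likelihood(triplets, table):
    return sum(math.log(table[t]) for t in triplets if t in table)


def assign_phase(seg, coding, noncoding):
    """Pick the phase whose table gives the segment the highest log-likelihood."""
    scores = [sum(log_likelihood(framed_triplets(seg, f), noncoding)
                  for f in (1, 2, 3)) / 3.0]
    scores += [log_likelihood(framed_triplets(seg, ph), coding)
               for ph in range(1, N_PHASES)]
    return max(range(N_PHASES), key=scores.__getitem__)


def self_organize(sequence, seg_len=90, max_iter=50, seed=0):
    """Alternate table estimation and phase assignment until the phases stop changing."""
    segments = [sequence[i:i + seg_len]
                for i in range(0, len(sequence) - seg_len + 1, seg_len)]
    rng = random.Random(seed)
    phases = [rng.randrange(N_PHASES) for _ in segments]  # random initialization
    coding, noncoding = estimate_tables(segments, phases)
    for _ in range(max_iter):
        new_phases = [assign_phase(s, coding, noncoding) for s in segments]
        if new_phases == phases:
            break
        phases = new_phases
        coding, noncoding = estimate_tables(segments, phases)
    return phases, coding, noncoding
```

Calling `self_organize(genome_string)` returns one phase label per segment together with the two estimated tables. A random start is only one possible initialization; the text above equally allows starting from initial codon usage tables.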

Keywords: Human gut meta-genome, codon usages, self-organizing genome annotation.