(Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EU735617 (Greengenes short name: 'archaeal structures and pristine soils China oil contaminated soil Jidong Oilfield clone SC78'), which showed an identity of 99.0% and an HSP coverage of 98.4%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'librari' (3.2%), 'dure' (3.0%), 'bioremedi, broader, chromat, groundwat, microarrai, polylact, sampl, stimul, subsurfac, typic, univers' (2.9%), 'spring' (2.5%) and 'soil' (2.4%) (156 hits in total).
The branches are scaled in terms of the expected number of substitutions per site.
Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates (left) and from 1,000 maximum-parsimony bootstrap replicates (right) if larger than 60%.
Illumina GAii sequencing data (1,096.5 Mb) were assembled with Velvet, and the consensus sequences were shredded into overlapping 1.5 kb fake reads and assembled together with the 454 data.
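The shredding step can be illustrated with a minimal Python sketch. It is not the pipeline used for this genome: the function name `shred` and the 500 bp overlap are assumptions made for illustration only, since the text gives the read length (1.5 kb) but not the overlap.

```python
def shred(sequence, read_len=1500, overlap=500):
    """Cut one consensus sequence into overlapping fixed-length pseudo-reads.

    read_len=1500 follows the 1.5 kb stated in the text; overlap=500 is an
    assumed value, because the source does not report the overlap used.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    if not sequence:
        return []

    step = read_len - overlap  # distance between consecutive read starts
    reads = []
    start = 0
    while True:
        reads.append(sequence[start:start + read_len])
        # Stop once the current read reaches the end of the sequence.
        if start + read_len >= len(sequence):
            break
        start += step
    return reads


if __name__ == "__main__":
    # Toy 4 kb "contig" built from a repeating pattern.
    contig = "".join("ACGT"[i % 4] for i in range(4000))
    for i, read in enumerate(shred(contig)):
        print(f"fake_read_{i}\t{len(read)} bp")
```

The last pseudo-read of each contig may be shorter than 1.5 kb, and contigs shorter than the read length are returned as a single read. A real pipeline might handle these cases differently.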
The 454 draft assembly was based on 178.7 Mb 454 draft data and all of the 454 paired end data.
The Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20.