For example, one particular Salmonella genome (Salmonella enterica subsp. enterica serovar Paratyphi B str. SPB7) exhibits an increased coding density, ratio of short proteins, and number of hypothetical proteins, along with a decreased average protein length. In other cases, subclusters of a particular species form because of potentially erroneous annotations, such as the three Yersinia pestis genomes that cluster separately from other Y. pestis strains because of annotation skews derived from the same pipeline [72]. In still other cases, substrains do not cluster together because their annotations were derived from different pipelines, as with E. coli BL21, where three isolates were sequenced and annotated by three different research groups [73].

Evolutionary events that result in altered annotations in a particular organism are significant and aid our understanding of the biology of not only that organism but also related organisms. Annotation differences that arise from the use of different methods and sources, however, skew these results and the conclusions drawn from them. Researchers are encouraged to update their annotations on archival records to meet the minimal standards and to correct any annotation discrepancies. Systems are being developed at NCBI to check newly submitted genomes for compliance with the minimal standards, and reports will be provided to submitters for quality assurance. Genomic records where the minimal standards cannot be met for real biological reasons will have explanatory comments added to the record.

Pseudogene Identification, Nomenclature, and Annotation

Pseudogene definitions take a variety of forms, and the difficulties in properly defining and labeling pseudogenes stem from the same problem: a negative cannot be experimentally verified [74].
In eukaryotes, pseudogenes are defined as non-functional copies of gene fragments that arise from retrotransposition or genomic duplication, while in prokaryotes they result from the degradation of single-copy or multiple-copy genes, either after duplication or after failed horizontal transfer events [74,75]. A recent analysis of pseudogenes in Salmonella genomes suggests that they are cleared relatively rapidly from a genome, indicating that their presence reflects a recent evolutionary event [76]. Although a clear definition of pseudogenes was not put forth, it was stressed that INSDC expects all genome annotation to reflect the biology as determined by the underlying sequence. The INSDC feature table format provides several exceptions for cases of unusual biology, but these unusual annotations have consequences and serve as flags in genome records (Table 3). A proposal was made to replace the pseudogene qualifier “/pseudo” with two qualifiers, “/pseudogene” and “/nonfunctional”, because /pseudo is not considered to equate 100% to /pseudogene; that request is still being discussed by INSDC.
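As an illustration of how the /pseudo flag can be tallied in a genome record, the following is a minimal sketch, not a description of any INSDC or NCBI tool. It assumes the standard flat-file feature table layout (feature keys starting at column 6, qualifiers at column 22). The function name count_pseudo_features and the sample locus tags are hypothetical, and the parser does not guard against a quoted multi-line qualifier value whose continuation line happens to begin with "/pseudo".

```python
import re
from collections import Counter

# Feature key line: 5 spaces, the key, whitespace, then the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")
# Qualifier line: 21 spaces, then "/name" (optionally followed by "=value").
QUALIFIER = re.compile(r"^ {21}/([A-Za-z_]+)")


def count_pseudo_features(feature_table: str) -> Counter:
    """Count, per feature key, the features that carry a /pseudo qualifier."""
    counts = Counter()
    current_key = None
    flagged = False  # True once the current feature has been counted
    for line in feature_table.splitlines():
        key_match = FEATURE_KEY.match(line)
        if key_match:
            # A new feature begins; reset the per-feature state.
            current_key = key_match.group(1)
            flagged = False
            continue
        qual_match = QUALIFIER.match(line)
        if (
            qual_match
            and current_key is not None
            and not flagged
            and qual_match.group(1) == "pseudo"  # exact name, so /pseudogene is not counted
        ):
            counts[current_key] += 1
            flagged = True
    return counts


# Hypothetical feature table fragment (locus tags are invented for illustration).
Q = " " * 21  # qualifier indentation
SAMPLE = "\n".join([
    "     gene            1..300",
    Q + '/locus_tag="ABC_0001"',
    "     CDS             1..300",
    Q + '/locus_tag="ABC_0001"',
    Q + '/product="hypothetical protein"',
    "     gene            400..700",
    Q + '/locus_tag="ABC_0002"',
    Q + "/pseudo",
    "     CDS             400..700",
    Q + '/locus_tag="ABC_0002"',
    Q + "/pseudo",
    Q + '/note="frameshift"',
])

if __name__ == "__main__":
    for key, n in count_pseudo_features(SAMPLE).items():
        print(key, n)
```

Matching the qualifier name exactly keeps /pseudo distinct from the proposed /pseudogene qualifier, so the same scan could report the two separately if that proposal were adopted.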

