5%). The average overall G + C content for the eight genes across all 20 strains was ca. 42.5% (Additional file 1), slightly higher than the ca. 37.9% G + C content of the entire T. denticola ATCC 35405 genome [18].

Table 4 Summary of G + C content (%), number of polymorphic sites, nucleotide diversity per site, global rate ratios and the number of negatively selected codon sites for each gene selected for MLSA

| Gene | No. of nucleotide sites | G + C (%) | No. (%) of polymorphic sites | Nucleotide diversity (Pi) | Global rate ratio ω (95% CI) | No. of negatively selected sites |
|---|---|---|---|---|---|---|
| flaA | 1050 | 40.7 ± 0.4 | 197 (18.8) | 0.0308 ± 0.0130 | 0.106 (0.080-0.132) | 3 |
| recA | 1245 | 45.7 ± 0.5 | 147 (11.8) | 0.0333 ± 0.0049 | 0.088 (0.065-0.111) | 37 |
| pyrH | 696 | 41.8 ± 0.4 | 128 (18.4) | 0.0331 ± 0.0125 | 0.064 (0.043-0.087) | 11 |
| ppnK | 855 | 40.9 ± 0.5 | 85 (9.9) | 0.0309 ± 0.0026 | 0.082 (0.053-0.110) | 20 |
| dnaN | 1104 | 32.4 ± 0.2 | 98 (8.9) | 0.0261 ± 0.0023 | 0.016 (0.006-0.026) | 25 |
| era | 885 | 42.4 ± 0.4 | 115 (13.0) | 0.0309 ± 0.0044 | 0.096 (0.068-0.123) | 31 |
| radC | 678 | 43.3 ± 0.2 | 76 (11.2) | 0.0275 ± 0.0048 | 0.032 (0.015-0.050) | 19 |
| 16S rRNA | 1497 | 52.4 ± 0.1 | 16 (1.1) | 0.0018 ± 0.0005 | N/A* | N/A* |

* N/A: not applicable; these analyses apply only to protein-encoding genes.

Multiple sequence alignments were constructed separately for each of the eight genes, using sequence data from each of the 20 T. denticola strains. The eight respective sets of gene sequences aligned well, and there were only minor inter-strain differences in gene lengths. The proportion of polymorphic sites differed considerably among the seven protein-encoding genes (see Table 4), being highest in the flaA (18.8%) and pyrH (18.4%) genes and lowest in the dnaN gene (8.9%). However, the 16S rRNA (rrsA/B) genes had by far the lowest proportion of polymorphic sites (1.1%), indicating
a strong conservation of sequence.

Phylogenetic analyses of T. denticola strains using individual gene sequence data

Using data obtained from NCBI GenBank, gene homologues from T. vincentii LA-1 (ATCC 35580) and T. pallidum SS14 were also included in our phylogenetic analyses for comparative purposes (see Additional file 2). Homologues of the flaA, recA, pyrH, ppnK, dnaN, era and radC genes are present in T. vincentii LA-1. The flaA, recA, pyrH, ppnK, dnaN and era genes, but not radC, are present in T. pallidum (e.g. subsp. pallidum strain SS14 [39]). We first determined the most appropriate nucleotide substitution models to use for the analysis of the eight individual gene datasets, as well as the combined multi-gene datasets from each strain (species). Accordingly, the optimal nucleotide substitution models were identified using the Akaike Information Criterion (AIC), as described by Bos and Posada [40]. The results are summarized in Additional file 3.
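
To illustrate how AIC-based model selection works in principle, the following minimal Python sketch ranks candidate substitution models by AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. It is not the procedure or software used in this study (which followed [40]); the function names are hypothetical, and the log-likelihood values are invented purely for illustration and are not results from the present datasets.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float, float]]:
    """Rank models by AIC.

    fits maps a model name to (maximized ln L, number of free parameters).
    Returns (name, AIC, delta AIC relative to the best model), best first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical log-likelihoods for one gene alignment (illustrative only).
# Parameter counts cover the substitution model alone; branch lengths are
# omitted because they add the same constant to every model on a fixed tree.
example_fits = {
    "JC69": (-2450.0, 0),
    "HKY85": (-2400.0, 4),
    "GTR": (-2395.0, 8),
    "GTR+G": (-2380.0, 9),
}

if __name__ == "__main__":
    for name, score, delta in rank_models(example_fits):
        print(f"{name:6s} AIC = {score:8.1f}  delta = {delta:6.1f}")
```

With these invented inputs the most parameter-rich model ranks first, because its gain in likelihood outweighs the penalty of 2 per additional parameter; with real alignments the ranking depends entirely on the fitted likelihoods.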