Source: M. McClelland, Sidney Kimmel Cancer Center, 10835 Altman Row, San Diego CA 92121, USA. Reference: Wong et al, FEMS Microbiology Letters 173: 411-423, 1999. The set contains 416 clones, each with an average insert size of 17 kb of Salmonella DNA. The clones have been ordered by sequencing the Salmonella DNA at the ends, and then by arranging them based on their homology to DNA on the complete physical map of E. coli K-12. In addition, a fully ordered set of clones representing the area between min 91 and 96 on the Salmonella typhimurium chromosome is present (described in Wong et al, J. Bacteriol. 176: 5729-5734, 1994).
The physical map of E. coli K-12 is shown at the National Center for Biotechnology Information (NCBI) web site.
Select "Escherichia coli K-12" from the list of proteobacteria. Next, select the "Protein coding genes" to show the location in bps of each gene in E. coli. The location and span of each clone in Kit 8D is shown using the same locations in bps.
1 Present Address: Molecular Biosciences, Pacific Northwest National Laboratory, Richland, Washington 99352, USA.
Sidney Kimmel Cancer Center
10835 Altman Row
San Diego, CA 92121, USA
Phone: 1 619 450 5990 x 280
FAX: 1 619 550 3998
As part of the ongoing sequencing of the complete Salmonella typhimurium LT2 genome, a partly ordered set of 416 lambda clones has been developed, representing over 90% of the genome. The average insert size is 17 kb. Sequences were obtained from both ends of each clone in this set. A total of over 600 kb of sequence has been deposited in the genome survey sequence section of GenBank. This resource of clones is available from the Salmonella Genome Stock Center. A preliminary comparison with the E. coli K12 genome indicates that there are likely to be many hundred insertion/deletion events, encompassing more than one gene, that distinguish these genomes. Fully 30% of the S. typhimurium sequences have no close homologues in the GenBank database.
Key Words: Sample sequencing, Salmonella, bacteriophage lambda library, comparative genomics.
As part of a project to complete the genomic sequence of Salmonella typhimurium LT2 we have constructed a bacteriophage lambda library. Sequencing the ends of a sample of these lambda clones ensures the correct melding of the genomic sequence by confirming linkage over many kilobases, while also contributing information towards the complete sequence of the genome. This library is also a resource for closing gaps in the sequence.
The M13 clones used in the sequencing project will not be maintained after the end of the project. However, the lambda clones will be maintained as a permanent resource of clones from the genome. This manuscript describes the lambda resource, which is being made available prior to the completion of the sequencing project.
Materials and Methods
Genomic DNA from Salmonella typhimurium LT2 strain AZ1516 was partially digested with Sau3A and the 15 kb-20 kb size class was cloned in a lambda DASHII vector. A total of over 2000 clones were examined for overlap by previously described restriction mapping methods or by deriving radiolabeled riboprobes from one end of an insert in a clone and hybridizing this probe to an array of the clones. The preparation of the library and the methods used to order clones are described in more detail elsewhere.
Phage were prepared using standard procedures [2]. DNA was purified from each of these bacteriophage using the Bio101 quick spin kit (www.bio101.com, La Jolla, CA). Five micrograms of each DNA was sequenced from both ends using a Li-Cor sequencer (www.licor.com/bio/, Lincoln, NE) and two vector primers, bearing different infrared fluors, located in the T3 and T7 promoters flanking the cloning site. This strategy allowed the sequences from both ends of the clones to be obtained from four lanes on a sequencing gel (www.licor.com/bio/Posters/GenSeq97/GSAabs.htm).
Results and Discussion
A total of 416 clones that had minimal or no overlap with each other were selected for sequencing. These clones are estimated to represent well over 90% of the Salmonella genome, after taking overlap into account. An average of about 900 bases of readable sequence was obtained from each successful sequencing reaction. Approximately 600 kb of sequence from 836 reads has been deposited in the genome survey sequence division of the GenBank database (www4.ncbi.nlm.nih.gov/dbGSS/index.html) with accession numbers AF003831-AF003833, AF029406-AF036003, AF075756-AF076018 and AF120033-AF120089.
The sequence data are also available from the sequencing project web site (ftp://genome.wustl.edu/pub/gsc1/sequence/st.louis/bacterial/salmonella/B_TR7095/Lambda-DASHII/). The project also provides a Blast server at http://genome.wustl.edu/gsc/bacterial/bacterial_blast_server.html. This server searches the sequences presented here and the melded M13 sequences from the ongoing sequencing project, currently amounting to over 3 Mb of sequence.
Each sequence from the lambda clones was compared to the complete E. coli K12 genome [3] using BlastN [4] (www.ncbi.nlm.nih.gov/BLAST/). Homologous regions between Salmonella and E. coli are generally about 85% identical at the nucleotide level [5]. Thus, a probability threshold of P < e-50 in BlastN was chosen as the definition of putative orthologues because this generally indicated a more than 80% match spanning at least 400 bases. These data are summarized in Table 1.
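The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: the function name and the example P-values are hypothetical, and "e-50" is read here as 1e-50, as in BLAST output notation.

```python
# Cutoff from the text: P < e-50, interpreted as 1e-50 (an assumption).
ORTHOLOGUE_P_THRESHOLD = 1e-50


def is_putative_orthologue(p_value: float) -> bool:
    """Return True when the best BlastN hit passes the P < 1e-50 cutoff."""
    return p_value < ORTHOLOGUE_P_THRESHOLD


# Hypothetical best-hit P-values for three end sequences (not real data).
example_hits = {"read_A": 1e-120, "read_B": 1e-50, "read_C": 3e-12}

# The comparison is strict, so a hit at exactly 1e-50 does not qualify.
calls = {name: is_putative_orthologue(p) for name, p in example_hits.items()}
```

The strict inequality follows the wording "P < e-50"; a laboratory applying the rule in practice might reasonably choose an inclusive cutoff instead.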
High homology with the E. coli K12 genome was seen for both ends of a clone in 222 cases. Among the clones that matched E. coli at both ends we determined the insert size in 106 cases. Forty of these 106 inserts differed in size by more than 4000 bases from the corresponding apparently orthologous region in E. coli, indicating there may be a relatively large net insertion/deletion event in these clones (marked in bold in column E of Table 1). Nine more clones matched the E. coli K12 genome at both ends but at very widely divergent positions in the E. coli genome. These clones are marked with a "*" in column C of Table 1. Some of these clones may represent true rearrangements between the Salmonella and E. coli genomes, whereas others may indicate paralogous comparisons with sequences that are not adjacent in the E. coli genome. A further 129 clones matched E. coli K12 at only one end, and 65 clones matched E. coli at neither end.
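The bookkeeping in this paragraph can be illustrated with a short Python sketch. The function names and the example sizes are hypothetical; only the 4000-base cutoff and the clone counts (222, 129, 65 and 416) are taken from the text.

```python
from typing import Optional

INDEL_THRESHOLD_BP = 4000  # size-difference cutoff stated in the text


def classify_clone(left_match: bool, right_match: bool) -> str:
    """Bin a clone by how many of its two end sequences match E. coli K12."""
    if left_match and right_match:
        return "both ends"
    if left_match or right_match:
        return "one end"
    return "neither end"


def net_indel(insert_size_bp: int, ecoli_span_bp: int) -> Optional[int]:
    """Return the signed size difference if it exceeds the cutoff, else None.

    A positive value suggests extra DNA in Salmonella relative to E. coli;
    a negative value suggests the reverse.
    """
    diff = insert_size_bp - ecoli_span_bp
    return diff if abs(diff) > INDEL_THRESHOLD_BP else None


# Counts reported in the text; the three bins account for all 416 clones.
reported_counts = {"both ends": 222, "one end": 129, "neither end": 65}
total_clones = sum(reported_counts.values())
```

For example, a hypothetical 17,000-base insert whose ends map 11,000 bases apart in E. coli would be flagged with a net difference of 6,000 bases, whereas one whose ends map 15,000 bases apart would not be flagged.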
Of the 836 sequences, 158 were highly homologous or identical to sequences from various Salmonella strains already in the GenBank database (P < e-50 in BlastN), reflecting the amount of sequence already available from Salmonella genomes (marked in bold italics in columns F and G of Table 1). The 836 S. typhimurium sample sequences were also compared to the rest of the GenBank database and a few sequences shared their best homology with sequences other than E. coli K12 or Salmonella. Homologies with a significance of P < e-9 are indicated in Table 1. Further details of the genes involved are presented in Table 2. In many of these cases there is a close match with E. coli K12 at one end and a close match with a different genome at the other end of the clone. There are cases where bacteriophage or plasmid sequences are the best homologues in the database for one end of a clone. It is possible that these sequences are from previously unknown extrachromosomal phage or plasmids. They are more likely to be from genes that are integrated in the genome of LT2 (such as the FELS prophage [6,7]) but are related to genes found on phage or plasmids in other bacteria.
In 836 sequence reads from around the S. typhimurium genome we detected 259 sequence reads that were not homologous to E. coli K12. This represents about 30% of the sequences. Thus, based on a genome size of about 5 Mb it is estimated that there may be 1.5 Mb of non-homologous sequences present in S. typhimurium and absent in the E. coli K12 genome. In each case, such genes may have been introduced into Salmonella after divergence from the common ancestor with E. coli, or these genes may have been deleted in the E. coli lineage.
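The estimate above is simple proportional scaling, shown here as a worked Python example. The variable names are illustrative; the figures (259 of 836 reads, a genome of about 5 Mb) come from the text.

```python
total_reads = 836
non_homologous_reads = 259
genome_size_mb = 5.0  # approximate genome size assumed in the text

# Fraction of sampled reads with no homologue in E. coli K12.
non_homologous_fraction = non_homologous_reads / total_reads

# Scale the sampled fraction up to the whole genome.
estimated_non_homologous_mb = non_homologous_fraction * genome_size_mb
```

The fraction works out to roughly 0.31, which the text rounds to about 30%, giving an estimate of approximately 1.5 Mb. The scaling assumes that the end sequences are a representative sample of the genome.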
The large number of S. typhimurium sequences that showed little or no homology with the E. coli K12 genome indicates that these two genomes are rather more different than might be suggested by the considerable concordance in their genetic maps. DNA-DNA hybridization studies estimated the amount of non-homologous sequence to be 30 to 40% of these genomes [8-10], which may more accurately reflect the number of regions in these genomes that do not share homology. The proportion of non-homologous sequences observed in the sample we present here (30%) is similar to these DNA-DNA hybridization estimates and is also similar to the proportion of non-homologous sequences we obtained when we compared sample sequences from S. typhi with the complete E. coli K12 genome (38%) [11]. The difference between the 30% and 38% estimates may be attributable to the different lengths of the sequence reads in the two studies, which in turn required different BlastN thresholds for scoring a homologue.
The number of insertion/deletion events that distinguish the S. typhimurium and E. coli genomes must be very high. We noted that 40 clones (38%) of the 106 clones of known size that matched E. coli at both ends showed insertion/deletion events of over 4000 bases [Table 1, column (E)]. These clones represent about 1.8 Mb of the genome (106 x 17 kb), so by extrapolation perhaps there are well over 100 insertion/deletion events of over 4000 base pairs (40 x 5 Mb / 1.8 Mb = 111). This latter estimate is similar to the estimate we obtained for the S. typhi versus E. coli genome, which was determined using a very different approach: the rate of detection of putative junctions between homologous and unique DNA in a set of sample sequences [11].
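The extrapolation in this paragraph can be reproduced step by step. The following Python sketch uses illustrative variable names; all figures are taken from the text.

```python
clones_of_known_size = 106
clones_with_large_indel = 40
average_insert_kb = 17
genome_size_mb = 5.0  # approximate genome size assumed in the text

# Share of sized clones showing a difference of more than 4000 bases.
indel_percent = 100 * clones_with_large_indel / clones_of_known_size

# Genome sampled by the sized clones: 106 x 17 kb, expressed in Mb.
sampled_mb = clones_of_known_size * average_insert_kb / 1000

# Scale the 40 observed events from the sampled region to the whole genome.
estimated_events = clones_with_large_indel * genome_size_mb / sampled_mb
```

The sampled region comes to about 1.8 Mb and the estimate to about 111 events. The calculation treats the 106 clones as non-overlapping and as a representative sample, and each flagged clone as a single event, so the result is best read as a rough figure.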
The sequences we report in this paper and the associated lambda clones have already proved useful as a source of DNA for complementation studies and are a vital component for completion of the Salmonella typhimurium LT2 genome sequence (http://genome.wustl.edu/gsc/bacterial/salmonella.html). We previously published an additional set of restriction mapped clones covering the region from about 4,250,000 to about 4,500,000 in E. coli [1]. End sequences from some of these clones have been determined and are included in Table 1. The resource of over 2000 lambda clones is deposited at the Salmonella stock center (www.ucalgary.ca/~kesander/intro.html).
Acknowledgments: This work was supported by grants AI-34829 and AI-43283 from the United States National Institute of Allergy and Infectious Diseases. We thank Ken Sanderson, Rick Wilson, and the bioinformatics staff at the Genome Sequencing Center of Washington University, St. Louis, for many helpful discussions and for maintaining the web sites.
[1] Wong,K.K., Wong,R.M., Rudd,K.E. and McClelland,M. (1994) High-resolution restriction map for a 240-kilobase region spanning 91 to 96 minutes on the Salmonella typhimurium LT2 chromosome. J. Bacteriol., 176, 5729-5734.
[2] Sambrook,J., Fritsch,E.F. and Maniatis,T. (1998) Molecular Cloning, A Laboratory Manual. Cold Spring Harbor Lab., Cold Spring Harbor.
[3] Blattner,F.R., Plunkett,G., Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F., Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J., Mau,B. and Shao,Y. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453-1474.
[4] Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
[5] Sharp,P.M. (1991) Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol., 33, 23-33.
[6] Affolter,M., Parent-Vaugeois,C. and Anderson,A. (1983) Curing and induction of the Fels 1 and Fels 2 prophages in the Ames mutagen tester strains of Salmonella typhimurium. Mutat. Res., 110, 243-262.
[7] Wong,K.K. and McClelland,M. (1992) A BlnI restriction map of the Salmonella typhimurium LT2 genome. J. Bacteriol., 174, 1656-1661.
[8] Riley,M. and Sanderson,K.E. (1990) Comparative genetics of Escherichia coli and Salmonella typhimurium. In: The Bacterial Chromosome (Drlica,K. and Riley,M., Eds.), pp. 85-95. American Society of Microbiology, Washington, D.C.
[9] Krawiec,S. and Riley,M. (1990) Organization of the bacterial chromosome. Microbiol. Rev., 54, 502-539.
[10] Brenner,D.J. (1984) Enterobacteriaceae. In: Bergey's manual of systematic bacteriology (Krieg,N.R. and Holt,J.G., Eds.), pp. 408-420. Williams & Wilkins, Baltimore.
[11] McClelland,M. and Wilson,R.K. (1998) Comparison of sample sequences of the Salmonella typhi genome to the sequence of the complete Escherichia coli K-12 genome. Infect. Immun., 66, 4305-4312.
[12] Wong,K.K., McClelland,M., Stillwell,L.C., Sisk,E.C., Thurston,S.J. and Saffer,J.D. (1998) Identification and sequence analysis of a 27-kilobase chromosomal fragment containing a Salmonella pathogenicity island located at 92 minutes on the chromosome map of Salmonella enterica serovar typhimurium LT2. Infect. Immun., 66, 3365-3371.