Information on the Williams 82 Physical Map
The initial Williams 82 physical map was based on BACs from two libraries. HICF (high-information-content fingerprinting) was performed by Jan Dvorak and Mingcheng Luo at UC Davis, and contigs were assembled with FPC (FingerPrinted Contigs) by Wes Warren at Washington University. This project was funded by a grant from the United Soybean Board to Randy Shoemaker (USDA-ARS, Iowa State Univ.). Subsequently, a third BAC library was fingerprinted, and FPC contigs were assembled by Will Nelson, resulting in the Williams 82 physical map shown in SoyBase. This second phase of map construction was funded by NSF Grant (0501877) to Scott Jackson (Purdue Univ.), Gary Stacey (Univ. Missouri), Jeffrey Doyle (Cornell Univ.), William Beavis (NCGR), Gregory May (NCGR) and Randy Shoemaker (USDA-ARS, Iowa State Univ.).
The second FPC assembly was performed by Will Nelson (Univ. Arizona).
Scott Jackson, sjackson@purdue.edu (Principal Investigator)
Gary Stacey (Co-Principal Investigator)
Jeffrey Doyle (Co-Principal Investigator)
William Beavis (Co-Principal Investigator)
Gregory May (Co-Principal Investigator)
Randy Shoemaker (Co-Principal Investigator)
BACs were HICF fingerprinted from three BAC libraries. After removing fingerprints that did not meet our quality standards, 134182 BACs were analyzed using FPC.
BAC library | number of BACs |
GM_WBa | 35145 |
GM_WBb | 61379 |
GM_WBc | 37658 |
7435 BACs from the cultivar Forrest were also included in the FPC analysis to allow integration of the Forrest and Williams 82 physical maps. These BACs represent a Minimum Tiling Path (MTP) developed by David Lightfoot et al. (Southern Illinois University).
FPC assembly resulted in 1745 BAC contigs containing 112254 BACs, plus 29363 singletons.
BAC library | BACs in FPC contigs | Singleton BACs |
GM_WBa | 23412 | 11733 |
GM_WBb | 53681 | 7698 |
GM_WBc | 30121 | 7537 |
Forrest MTP | 5040 | 2395 |
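The table also shows how completely each library was incorporated into contigs. The short Python sketch below computes, for each library, the fraction of analyzed BACs that FPC placed in a contig, using only the counts in the table; the function name is ours and is for illustration only.

```python
# Counts copied from the FPC assembly table: (BACs in FPC contigs, singleton BACs)
FPC_COUNTS = {
    "GM_WBa": (23412, 11733),
    "GM_WBb": (53681, 7698),
    "GM_WBc": (30121, 7537),
    "Forrest MTP": (5040, 2395),
}


def fraction_in_contigs(counts):
    """Return, per library, the fraction of analyzed BACs that FPC placed in a contig."""
    return {
        library: in_contig / (in_contig + singletons)
        for library, (in_contig, singletons) in counts.items()
    }


if __name__ == "__main__":
    for library, frac in fraction_in_contigs(FPC_COUNTS).items():
        print(f"{library}: {frac:.1%} of BACs in contigs")
```

With these counts, about 87% of GM_WBb BACs and 80% of GM_WBc BACs were placed in contigs, compared with about two thirds of the GM_WBa and Forrest MTP BACs.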
The genetic and physical maps were aligned by positioning each BAC contig on the genetic map so that it is centered on, or spans, its anchoring genetic markers. As a result, the orientation of an individual contig may be reversed around a marker, and the ends of an FPC contig may be slightly mispositioned. See the Williams 82 sequence map for more precise positioning of the FPC contigs relative to genetic markers. The sizes of the BAC contigs are approximate because they are based on the number of bands in the HICF digests, scaled to match the genetic map. The order of the BACs within a contig is the order provided by FPC.
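The placement rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for SoyBase: each contig is represented only by its HICF band count and the cM positions of its anchoring markers, and `CM_PER_BAND` is a hypothetical scale factor standing in for whatever scaling was used to match the genetic map.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical band-to-cM scale factor; NOT a value taken from the source.
CM_PER_BAND = 0.001


@dataclass
class Contig:
    name: str
    n_bands: int                   # approximate contig length in HICF bands
    anchor_positions: List[float]  # cM positions of the anchoring genetic markers


def place_contig(contig: Contig, cm_per_band: float = CM_PER_BAND) -> Tuple[float, float]:
    """Return (start_cM, end_cM) for drawing a contig on the genetic map.

    The drawn length is the band count scaled to cM. The contig is centered on the
    midpoint of its anchoring markers and widened, if needed, so that it spans all
    of them. Orientation is not inferred, so the contig may be drawn reversed.
    """
    if not contig.anchor_positions:
        raise ValueError(f"{contig.name} has no anchoring markers")
    lo, hi = min(contig.anchor_positions), max(contig.anchor_positions)
    center = (lo + hi) / 2
    half = max(contig.n_bands * cm_per_band, hi - lo) / 2
    return center - half, center + half
```

For example, a contig of 2000 bands anchored by a single marker at 50 cM would be drawn from about 49 to 51 cM under the assumed scale factor; a contig anchored at 10 and 14 cM is drawn across the full 10 to 14 cM interval even if its scaled length is shorter. This is why contig ends and orientation are only approximate on the genetic map.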
Methods used to improve and extend the FPC assembly
Well-to-well contamination in the BAC library plates
Potential contamination was assessed by finding all instances where two or more BACs in a contig came from the same plate in a BAC library. 7556 such cases were found in 824 of the 1983 contigs in the physical map. These fell into three classes:
- 5613 cases where 2 BACs came from the same library plate but were not in adjacent wells.
- 293 cases where the 2 BACs were in adjacent wells.
- 1650 cases where more than 2 BACs in a contig came from the same plate.
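The same-plate search and its three-way classification can be sketched in Python as below. This is an illustration, not the pipeline that produced the counts above, and it assumes that each BAC's library, plate, row and column are already known (here as a hypothetical `Bac` record), that plates have the 384-well layout (rows A to P, columns 1 to 24), and that "adjacent" means wells sharing an edge, with diagonal neighbors excluded.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Bac(NamedTuple):
    # Hypothetical record; real BAC names would need to be parsed into these fields.
    library: str  # e.g. "GM_WBb"
    plate: int
    row: str      # plate row letter, "A".."P" on an assumed 384-well plate
    col: int      # plate column, 1..24


def wells_adjacent(a: Bac, b: Bac) -> bool:
    """True if the two wells share an edge (assumed definition; diagonals excluded)."""
    return abs(ord(a.row) - ord(b.row)) + abs(a.col - b.col) == 1


def classify_same_plate_cases(contigs: Dict[str, List[Bac]]) -> Counter:
    """Count, over all contigs, each library plate contributing 2+ BACs to one contig."""
    counts = Counter()
    for bacs in contigs.values():
        by_plate = defaultdict(list)
        for bac in bacs:
            by_plate[(bac.library, bac.plate)].append(bac)
        for group in by_plate.values():
            if len(group) > 2:
                counts["more_than_2"] += 1
            elif len(group) == 2:
                key = "adjacent" if wells_adjacent(*group) else "non_adjacent"
                counts[key] += 1
    return counts
```

Each (contig, plate) pair is counted once, so the three class counts add up to the total number of cases, as 5613 + 293 + 1650 = 7556 does in the text.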
We performed two tests to help decide whether these potentially contaminated BACs should be removed from the FPC assembly.
1. Direct measurement of well-to-well contamination
Single sequence reads were attempted for both ends of each BAC in the two libraries. Within each library, every BAC-end sequence (BES) was compared to all others using BLASTN (E-value < 1E-199, bit score > 300, and at least 90% of the query sequence covered by the alignment). Instances where BES from adjacent wells matched at this level were recorded and used to build a contamination map for each plate. Because we assume that these BLAST matches indicate actual contamination, we are removing these BACs from the contigs.
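A minimal Python sketch of the filtering step, assuming BLASTN was run with the BLAST+ custom tabular format `-outfmt "6 qseqid sseqid qlen qstart qend evalue bitscore"`. The BES identifier layout parsed by `BES_ID` (library, 4-digit plate, row, 2-digit column, read suffix, e.g. `GM_WBb0012A05.f`) is hypothetical, as is the edge-sharing definition of adjacent wells, and query coverage is computed from the single reported HSP rather than from merged HSPs.

```python
import re
from typing import Iterable, Optional, Set, Tuple

# Hypothetical BES identifier layout (library, 4-digit plate, plate row,
# 2-digit column, read suffix), e.g. "GM_WBb0012A05.f". Real IDs may differ.
BES_ID = re.compile(r"^(?P<lib>GM_WB[a-c])(?P<plate>\d{4})(?P<row>[A-P])(?P<col>\d{2})")

# Thresholds quoted in the text.
MAX_EVALUE = 1e-199
MIN_BITSCORE = 300.0
MIN_QUERY_COVERAGE = 0.90

Well = Tuple[str, int, str, int]  # (library, plate, row, column)


def well_of(bes_id: str) -> Optional[Well]:
    """Parse the plate well encoded in a (hypothetical) BES identifier."""
    m = BES_ID.match(bes_id)
    if m is None:
        return None
    return m["lib"], int(m["plate"]), m["row"], int(m["col"])


def adjacent(a: Well, b: Well) -> bool:
    """Same library and plate, and the wells share an edge (assumed rule)."""
    if a[:2] != b[:2]:
        return False
    return abs(ord(a[2]) - ord(b[2])) + abs(a[3] - b[3]) == 1


def contaminated_pairs(blast_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return BAC pairs in adjacent wells whose BES match passes every threshold.

    Expects BLASTN tabular output produced with
    -outfmt "6 qseqid sseqid qlen qstart qend evalue bitscore".
    """
    pairs = set()
    for line in blast_lines:
        if not line.strip():
            continue
        qid, sid, qlen, qstart, qend, evalue, bits = line.rstrip("\n").split("\t")
        coverage = (abs(int(qend) - int(qstart)) + 1) / int(qlen)
        if float(evalue) >= MAX_EVALUE or float(bits) <= MIN_BITSCORE:
            continue
        if coverage < MIN_QUERY_COVERAGE:
            continue
        wq, ws = well_of(qid), well_of(sid)
        if wq is not None and ws is not None and adjacent(wq, ws):
            # Strip the read suffix so forward/reverse reads map to one BAC name.
            bac_q, bac_s = qid.split(".")[0], sid.split(".")[0]
            pairs.add((min(bac_q, bac_s), max(bac_q, bac_s)))
    return pairs
```

The returned pairs could then be grouped by (library, plate) to draw the per-plate contamination map described above.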
2. Simulations to assess the probability that two BACs from a plate could be in a contig by chance
We simulated sampling of the BAC libraries 10,000 times to generate contigs with different numbers of BACs. The simulation parameters were chosen to closely match those of the actual contigs. As expected, our results show that the probability of two unrelated BACs from the same plate being in a single contig increases with the number of BACs in the contig. As shown below, even for relatively small contigs this probability is substantial. For this reason, we have decided for now to leave BACs that come from the same plate but from non-adjacent wells in the FPC contigs.
BACs in contig | % of simulation runs with no 2 BACs from the same plate | % of simulation runs with 2 or more BACs from the same plate |
2 | 99.67 | 0.33 |
3 | 98.94 | 1.06 |
4 | 98.02 | 1.98 |
5 | 96.42 | 3.58 |
6 | 94.68 | 5.32 |
7 | 92.83 | 7.17 |
8 | 90.9 | 9.1 |
9 | 87.86 | 12.14 |
10 | 85.5 | 14.5 |
12 | 78.36 | 21.64 |
14 | 72.32 | 27.68 |
16 | 64.55 | 35.45 |
18 | 57.86 | 42.14 |
20 | 50.51 | 49.49 |
24 | 36.41 | 63.59 |
28 | 25.06 | 74.94 |
32 | 15.78 | 84.22 |
40 | 5.28 | 94.72 |
48 | 1.4 | 98.6 |
56 | 0.27 | 99.73 |
72 | 0.01 | 99.99 |
88 | 0 | 100 |
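The simulation described above is a form of the birthday problem. The Python sketch below is not the original simulation: it assumes that contig members are unrelated BACs drawn uniformly at random, without replacement, from a library of equally full plates, and the `N_PLATES` and `WELLS` values in the example are hypothetical. It gives both a Monte Carlo estimate and the closed-form probability for that simplified model, so its numbers are not expected to reproduce the table exactly.

```python
import random


def p_shared_plate_exact(n_bacs: int, n_plates: int, wells_per_plate: int) -> float:
    """Probability that at least two of n_bacs distinct BACs, drawn uniformly
    without replacement from n_plates full plates, share a plate."""
    total = n_plates * wells_per_plate
    p_all_distinct = 1.0
    for i in range(n_bacs):
        # the (i+1)-th BAC must avoid the i plates already used
        p_all_distinct *= max(total - i * wells_per_plate, 0) / (total - i)
    return 1.0 - p_all_distinct


def p_shared_plate_simulated(n_bacs: int, n_plates: int, wells_per_plate: int,
                             runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of runs in which two BACs share a plate."""
    rng = random.Random(seed)
    total = n_plates * wells_per_plate
    hits = 0
    for _ in range(runs):
        picks = rng.sample(range(total), n_bacs)
        plates = {bac // wells_per_plate for bac in picks}
        if len(plates) < n_bacs:  # fewer distinct plates than BACs: a plate is shared
            hits += 1
    return hits / runs


if __name__ == "__main__":
    N_PLATES, WELLS = 350, 384  # hypothetical library layout, not the real parameters
    for n in (2, 5, 10, 20, 40):
        print(n, round(p_shared_plate_exact(n, N_PLATES, WELLS), 4),
              p_shared_plate_simulated(n, N_PLATES, WELLS, runs=2000))
```

Under this model the probability grows roughly with the square of the contig size while it is small and then saturates near 1, which matches the qualitative shape of the table.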
Assessing quality of contig assembly using BAC-marker associations
We are using the depth of coverage of markers in contigs to statistically assess the likely correctness of the FPC assembly. To do this, we compare the number of times overlapping BACs are hit by a given marker. Although this analysis relies on the unproven assumption that all BACs that could be hit by a marker were identified, it nevertheless gives us a relative confidence measure for comparing different FPC assemblies. This process will be ongoing as more markers are assigned to BACs.
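One possible reading of this depth-of-coverage check is sketched below in Python; it is our interpretation, not the scoring actually used. It assumes each BAC in a contig has (left, right) coordinates in FPC's CB units, takes a marker's locus to be the region shared by all the BACs it hits, and scores the marker as the fraction of BACs overlapping that locus that the marker actually hits. Under the completeness assumption stated above, a correct assembly should score close to 1.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (left, right) BAC coordinates in FPC CB units


def marker_support(bac_coords: Dict[str, Interval], hit_bacs: List[str]) -> Optional[float]:
    """Fraction of BACs covering a marker's inferred locus that the marker hits.

    The locus is the region shared by every hit BAC in this contig. If the hit
    BACs share no common region, the contig order conflicts with the marker and
    None is returned.
    """
    hits = [bac_coords[b] for b in hit_bacs if b in bac_coords]
    if not hits:
        return None
    left = max(l for l, _ in hits)
    right = min(r for _, r in hits)
    if left > right:
        return None  # hit BACs do not overlap in this assembly
    covering = [b for b, (l, r) in bac_coords.items() if l <= right and r >= left]
    return len(hits) / len(covering)
```

Averaging this score over all markers would give a single number for comparing two FPC assemblies, subject to the same caveat that some true marker hits may simply not have been found yet.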
Anchoring BAC contigs to the genetic map
A number of labs are actively working to anchor BAC contigs to the genetic map:
Lab | Markers used for anchoring |
Jackson (Purdue Univ.) | overgos derived from soybean ESTs, selected genomic sequences and genomic sequence around SSRs |
Shoemaker (USDA-ARS, Iowa State Univ.) | overgos derived from soybean ESTs and selected genomic sequences; SSRs from the Composite Genetic Map and SSRs newly identified in BACs assigned to contigs |
Stacey (Univ. of Missouri) | STSs derived from selected genomic sequences; SSRs from the Composite Genetic Map |