Targeted de novo phasing and long-range assembly by template mutagenesis

Long-range sequencing with a low error rate has been challenging. Sequence assembly and phasing usually require a high-quality reference genome for mapping, which makes it difficult to work on highly variable genomic regions or on regions with no reference genome information. In this study, we describe novel bench protocols and algorithms that address both problems at once: they produce ultra-low-error-rate, haplotype-phased sequence assemblies of regions 10 kb in length on a short-read sequencing platform. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ~50% of cytosines to uracils. Short-read sequencing libraries are made from both mutated and unmutated templates. A conservative de Bruijn graph approach seeds an assembly of the mutated templates, which we then extend by mapping paired-end reads. After using the unmutated sequence library to recover almost all of the mutated bases, we partition the template assemblies into two or more haplotypes. The final haplotype is assembled and corrected for residual template mutations and PCR errors. We obtain per-base error rates below 10^-9. We apply this method to a human family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
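The imprinting step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the study's protocol or software: the function name `mutate_template`, the toy sequence, and the use of a seeded pseudo-random generator are all hypothetical, and converted cytosines are written as T because uracil is read as thymine after amplification and sequencing.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Return a copy of seq in which each C is independently turned into T
    with probability `rate` (hypothetical helper, for illustration only)."""
    if rng is None:
        rng = random.Random()
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def pattern(original, mutated):
    """Positions where a C in the original was converted in the mutated copy."""
    return [i for i, (a, b) in enumerate(zip(original, mutated)) if a != b]


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed so the run is repeatable
    region = "ACGTCCGATCGCCTAGCCAT"  # toy target region (assumed, not real data)
    # Two template molecules copied from the same region receive
    # independent conversion patterns.
    for name in ("template_1", "template_2"):
        mutated = mutate_template(region, rate=0.5, rng=rng)
        print(name, mutated, pattern(region, mutated))
```

With many cytosines per template, two templates are very unlikely to share the same conversion pattern, which is what allows short reads to be assigned back to the template they came from.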

Request Access

To gain access to this dataset, please provide details of your organization and project.

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait, or cancer research projects reporting matched cancer and normal genomes from patients.

Study ID Study Title Study Type
Population Genomics

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF00005745448 1640016242526 30.3 MB
EGAF00005745449 1640016182317 31.6 MB
EGAF00005745450 1640016182317 35.9 MB
EGAF00005745451 1640016182317 37.1 MB
EGAF00005745452 1640016182317 78.0 MB
EGAF00005745453 1640016182317 83.7 MB
EGAF00005745454 1640016301437 148.4 MB
EGAF00005745455 1640016182317 155.6 MB
EGAF00005745456 1640016242526 257.7 MB
EGAF00005745457 1640016242526 269.9 MB
EGAF00005745458 1640016301437 698.3 MB
EGAF00005745459 1640016420304 740.7 MB
EGAF00005745460 1640016182317 40.8 MB
EGAF00005745461 1640016182317 42.7 MB
EGAF00005745462 1640016242526 282.2 MB
EGAF00005745463 1640016242526 295.8 MB
EGAF00005745464 1640016182317 51.9 MB
EGAF00005745465 1640016182317 54.0 MB
EGAF00005745466 1640016182317 73.9 MB
EGAF00005745467 1640016182317 80.1 MB
EGAF00005745468 1640016242526 212.1 MB
EGAF00005745469 1640016301437 228.6 MB
EGAF00005745470 1640016182317 138.6 MB
EGAF00005745471 1640016182317 145.1 MB
EGAF00005745472 1640016242526 56.7 MB
EGAF00005745473 1640016242526 58.9 MB
EGAF00005745474 1640016242526 45.4 MB
EGAF00005745475 1640016182317 47.0 MB
EGAF00005745476 1640016301437 112.1 MB
EGAF00005745477 1640016182317 124.5 MB
EGAF00005745478 1640016242526 83.5 MB
EGAF00005745479 1640016242526 92.0 MB
EGAF00005745480 1640016182317 126.5 MB
EGAF00005745481 1640016242526 136.9 MB
EGAF00005745482 1640016242526 53.8 MB
EGAF00005745483 1640016182317 56.1 MB
EGAF00005745484 1640016182317 49.7 MB
EGAF00005745485 1640016182317 51.5 MB
EGAF00005745486 1640016242526 212.5 MB
EGAF00005745487 1640016301437 223.4 MB
EGAF00005745488 1640016301437 464.2 MB
EGAF00005745489 1640016360325 480.2 MB
EGAF00005745490 1640016182317 171.1 MB
EGAF00005745491 1640016301437 182.7 MB
EGAF00005745492 1640016182317 84.5 MB
EGAF00005745493 1640016182317 93.3 MB
EGAF00005745494 1640016242526 80.3 MB
EGAF00005745495 1640016242526 88.1 MB
48 Files (7.4 GB)