DACs
EGAC00001001450
DAC for Levy Group
This DAC controls 1 dataset
Dataset ID: EGAD00001008444
Description: Long-range sequencing with a low error rate has been challenging. Sequence assembly and phasing usually require a high-quality reference genome for mapping, so working on highly variable genomic regions, or on regions with no reference genome information, is difficult. In this study, we describe novel bench protocols and algorithms that solve both problems at once, producing ultra-low-error-rate, haplotype-phased sequence assemblies of regions 10 kb in length on a short-read sequencing platform. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ~50% of cytosines to uracils. Short-read sequencing libraries are made from both mutated and unmutated templates. A conservative de Bruijn graph approach seeds an assembly of the mutated templates, which we then extend by mapping paired-end reads. We next use the unmutated sequence library to recover almost all of the mutated bases, then partition the template assemblies into two or more haplotypes. The final haplotype is assembled and corrected for residual template mutations and PCR errors. We obtain per-base error rates below 10^-9. We apply this method to a human family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
Technology: Illumina MiSeq
Samples: 4
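
The imprinting step in the description above (each cytosine converted to uracil independently with probability of about 50%) can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the study's actual pipeline: the function names `imprint_template` and `mutation_pattern`, the toy target sequence, and the choice to write converted bases as `U` are all hypothetical, chosen for illustration.

```python
import random
from typing import Optional, Tuple


def imprint_template(seq: str, rate: float = 0.5,
                     rng: Optional[random.Random] = None) -> str:
    """Return a copy of seq in which each C is independently replaced
    by U with probability `rate` (hypothetical model of the imprinting)."""
    if rng is None:
        rng = random.Random()
    return "".join(
        "U" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def mutation_pattern(original: str, mutated: str) -> Tuple[bool, ...]:
    """Return one flag per cytosine in `original`:
    True where that cytosine was converted in `mutated`."""
    return tuple(m == "U" for o, m in zip(original, mutated) if o == "C")


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed, for repeatability only
    target = "ACCGTCCATCGGCTACCGTTCAGC"  # toy sequence, not real data
    for _ in range(4):
        template = imprint_template(target, rng=rng)
        print(template, mutation_pattern(target, template))
```

With n cytosines in a region, this model allows 2^n possible patterns, which suggests why two templates from a region with many cytosines are unlikely to share a pattern, and why a pattern can act as a tag for the template a read came from.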