De novo assembly of 150 Danish genomes reveals rich structural complexity
Most known genetic variation in human genomes has been called from comparison of short reads to the reference genome, an approach biased against finding complex variation. We sequenced 150 individuals from 50 parent-offspring trios with multiple insert-size libraries to very high coverage. We show that each genome could be independently de novo assembled into a small number of high-quality scaffolds (median N50 > 21 Mb), each of quality comparable to long-read assemblies while being very cost-effective. We show that our variant call set from comparing de novo assemblies is far more complete in terms of complex variation than those of previous studies. Importantly, even the complex 4-5 Mb extended MHC region was assembled and resolved into haplotypes, revealing >700 kb of novel sequence in this important region of the genome, and major parts of the Y chromosome, including some palindromes, were assembled with high accuracy. Finally, we show that our variant call set allows for the genotyping of many more complex variants when used as a reference panel for imputation into SNP-chip data or into previously resequenced genomes.
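The scaffold N50 quoted above is a standard contiguity statistic: the length of the shortest scaffold among the longest scaffolds that together cover at least half of the total assembled length. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are hypothetical, and it is not the code used in the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths, not taken from the datasets below.
toy_lengths = [40, 30, 20, 10]
print(n50(toy_lengths))
```

For a real assembly, the input would be the list of scaffold lengths read from the assembly FASTA, and the reported figure would be the median of the per-genome N50 values.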
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data
Dataset ID | Description | Technology | Samples |
---|---|---|---|
EGAD00001003157 | | | 150 |
EGAD00001003186 | | | 68 |
EGAD00001003188 | | | 150 |
EGAD00001003454 | | AB 3730xL Genetic Analyzer | 8 |
EGAD00001003455 | | | 25 |
Publications | Citations |
---|---|
Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature 548: 2017 87-91 | 71 |
Assembly and analysis of 100 full MHC haplotypes from the Danish population. Genome Res 27: 2017 1597-1607 | 14 |
Analysis of 62 hybrid assembled human Y chromosomes exposes rapid structural changes and high rates of gene conversion. PLoS Genet 13: 2017 e1006834 | 22 |
Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios. BMC Bioinformatics 19: 2018 239 | 21 |