Comparison of structural variations from 10X Genomics linked-reads and conventional Illumina short-reads sequencing
Structural variations (SVs) are large genomic rearrangements that can drive many diseases. Conventional short-read whole-genome sequencing (cWGS) allows their identification at base-pair resolution but suffers from a high false discovery rate. cWGS taps into short-range information from short reads, whereas linked-read sequencing (10XWGS) exploits long-range information: it links short reads originating from the same large DNA molecule through a unique barcode captured in a gel bead-in-emulsion (GEM). This mitigates the alignment-based artefacts of cWGS, especially in repetitive regions. However, the false discovery rate of this technology is unclear. In this study, we performed a comprehensive analysis of SVs of different types and sizes predicted by these two technologies. SVs common to both technologies were found to be highly specific by PCR and Sanger sequencing, whereas the validation rate dropped for events called by only one technology. Further, we propose a novel enrichment approach for filtering out false-positive calls from each technology independently. To this end, we trained a machine learning model for each technology and used it to characterise SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach should be valuable for understanding the mechanisms by which SVs drive various diseases.
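The split between SVs "common to both technologies" and those called by only one can be illustrated with a short sketch. The matching rule is not stated here, so this example assumes a commonly used convention: two calls agree if they lie on the same chromosome, have the same SV type, and share at least 50% reciprocal overlap. The `SV` record, `reciprocal_overlap`, `split_by_support`, and the `min_ro` threshold are all hypothetical names chosen for illustration, not the study's actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (half-open interval [start, end))."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return min(overlap/len(a), overlap/len(b)), i.e. overlap / longer length."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def split_by_support(
    cwgs: List[SV], tenx: List[SV], min_ro: float = 0.5  # assumed threshold
) -> Tuple[List[SV], List[SV], List[SV]]:
    """Partition calls into (common, cWGS-only, 10XWGS-only) sets."""
    common: List[SV] = []
    cwgs_only: List[SV] = []
    matched_tenx = set()
    for a in cwgs:
        # Indices of 10XWGS calls of the same type that overlap enough.
        hits = [
            i for i, b in enumerate(tenx)
            if b.svtype == a.svtype and reciprocal_overlap(a, b) >= min_ro
        ]
        if hits:
            common.append(a)
            matched_tenx.update(hits)
        else:
            cwgs_only.append(a)
    tenx_only = [b for i, b in enumerate(tenx) if i not in matched_tenx]
    return common, cwgs_only, tenx_only


if __name__ == "__main__":
    cwgs_calls = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 5000, 5100, "DEL")]
    tenx_calls = [SV("chr1", 1100, 2050, "DEL"), SV("chr3", 10, 500, "INV")]
    common, cwgs_only, tenx_only = split_by_support(cwgs_calls, tenx_calls)
    print(len(common), len(cwgs_only), len(tenx_only))  # prints "1 1 1"
```

Under these assumptions, the two chr1 deletions share 900 bp over a longer length of 1000 bp (reciprocal overlap 0.9), so they count as a common event, while the chr2 and chr3 calls each remain specific to one technology. The abstract reports that common events validated at a high rate, whereas technology-specific events are where a filtering model such as the one described above would be needed.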
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
Dataset ID | Description | Technology | Samples |
---|---|---|---|
EGAD00001005724 |  | Illumina NovaSeq 6000 | 4 |
Publications | Citations |
---|---|
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions. PLoS Comput Biol 16: e1008397 (2020) | 5 |