Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma
Recent advances in throughput and accuracy mean that the Oxford Nanopore Technologies (ONT) PromethION platform is now a viable solution for whole-genome sequencing (WGS). New bioinformatic methods have been developed to take advantage of these long-read data; however, much of the validation of these tools has focussed on calling germline variants (both SNVs and structural variants). Somatic variants are outnumbered many-fold by germline variants, and their detection is further complicated because their frequency varies with tumour purity and subclonality. Here, we evaluate the extent to which Nanopore WGS enables genome-wide detection and analysis of somatic variation. We do this by sequencing tumour and germline genomes for a patient with diffuse large B-cell lymphoma. We examine the capability of currently available tools for calling somatic variants in ONT data by comparing the data with results from 150 bp short-read sequencing of the same samples. We then conduct a detailed analysis of the performance of multiple long-read mappers and structural variant callers for calling large somatic structural variants (SVs) in ONT data. Our protocol achieved yields of up to 96 mapped Gb per PromethION flow cell with average read lengths of ~5 kb. Calling germline SNVs from these data achieved good specificity and sensitivity. However, the results of somatic SNV calling highlight the need for the development of specialized joint calling algorithms. Our analysis of structural variants shows that the comparative performance of different tools varies significantly between SV types, and suggests that long reads are especially advantageous for calling large somatic deletions and duplications. Finally, we highlight the utility of long reads for phasing clinically relevant variants by using the ONT data to confirm that a somatic 1.6 Mb deletion and a p.(Arg249Met) mutation involving TP53 are oriented in trans.
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.
Dataset ID | Description | Technology | Samples |
---|---|---|---|
EGAD00001006204 | | Illumina HiSeq 1500, MinION, PromethION | 5 |
Publications | Citations |
---|---|
Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma. Sci Rep 11: 2021 6408 | 11 |