Institut Curie Neuroblastoma Whole Genome Sequencing Diagnosis Relapse

Neuroblastoma, a clinically heterogeneous pediatric cancer, is characterized by distinct genomic profiles but few recurrent mutations. As neuroblastoma is expected to have a high degree of genetic heterogeneity, studying its clonal evolution with deep-coverage whole-genome sequencing (WGS) of diagnosis and relapse samples should lead to a better understanding of the molecular events associated with relapse.

Samples were included in this study if sufficient DNA from constitutional tissue and from diagnosis and relapse tumors was available for WGS. Whole-genome sequencing was performed on trios (constitutional, diagnosis and relapse DNA) from eight patients using the Illumina HiSeq 2500, producing paired-end (PE) reads of 90x90 bp for six patients and 100x100 bp for two. Expected coverage for sample NB0175 (100x100 bp) was 30X for tumor and constitutional samples. For the seven other patients, expected coverage was 80X for tumor samples with PE 100x100, 100X for the other tumor samples and 50X for all constitutional samples (see table 1).

Following alignment with BWA (Li et al., Oxford J, 2009 Jul) allowing up to 4% mismatches, BAM files were cleaned up according to the Genome Analysis Toolkit (GATK) recommendations (Van der Auwera et al., Current Protocols in Bioinformatics, 2013; picard-1.45, GenomeAnalysisTK-2.2-16). Variant calling was performed in parallel using three variant callers: GenomeAnalysisTK-2.2-16, Samtools-0.1.18 and MuTect-1.1.4 (McKenna et al., Genome Res, 2010; Li et al., Oxford J, 2009 Aug; Cibulskis et al., Nature, 2013). Annovar-v2012-10-23 with cosmic-v64 and dbsnp-v137 was used for annotation, and RefSeq for structural annotation. For GATK and Samtools, single nucleotide variants (SNVs) with a quality under 30, a depth of coverage under 6, or fewer than 2 reads supporting the variant were filtered out. MuTect was run with parameters matching the GATK and Samtools thresholds in order to filter out irrelevant variants.
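The SNV thresholds stated above for GATK and Samtools (quality of at least 30, depth of at least 6, at least 2 supporting reads) can be sketched as a simple predicate. This is only an illustration and not the pipeline actually used for this dataset: the `SnvCall` record, its field names and the `passes_thresholds` helper are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
MIN_QUAL = 30        # calls with quality under 30 are filtered out
MIN_DEPTH = 6        # calls with depth of coverage under 6 are filtered out
MIN_ALT_READS = 2    # calls with fewer than 2 supporting reads are filtered out


@dataclass
class SnvCall:
    """Hypothetical minimal record for one SNV call."""
    chrom: str
    pos: int
    qual: float
    depth: int
    alt_reads: int


def passes_thresholds(call: SnvCall) -> bool:
    """Return True if the call survives all three threshold filters."""
    return (
        call.qual >= MIN_QUAL
        and call.depth >= MIN_DEPTH
        and call.alt_reads >= MIN_ALT_READS
    )


calls = [
    SnvCall("chr1", 1000, qual=45.0, depth=20, alt_reads=5),   # kept
    SnvCall("chr1", 2000, qual=29.9, depth=20, alt_reads=5),   # low quality
    SnvCall("chr2", 3000, qual=60.0, depth=5, alt_reads=3),    # low depth
    SnvCall("chr2", 4000, qual=60.0, depth=30, alt_reads=1),   # too few alt reads
]
kept = [c for c in calls if passes_thresholds(c)]
```

In practice these values would be read from the VCF records produced by each caller; the sketch only makes the direction of each comparison explicit.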
We focused on SNVs within and around exons of coding genes and on those overlapping splice sites. Then, variants reported in more than 1% of the population in the 1000 Genomes Project (1000gAprl_2012) or the Exome Sequencing Project (ESP6500) were discarded in order to filter out polymorphisms. Finally, synonymous variants were filtered out. MuTect focuses on somatic variants by filtering against the constitutional sample. An mpileup comparison between constitutional and somatic DNA allowed us to focus on tumor-specific SNVs with GATK and Samtools as well. Finally, every SNV called by our pipeline that was also supported in any constitutional sample was filtered out, to guard against possible coverage deficiency in the constitutional DNA.

We then analyzed copy number variants (CNVs) with HMMcopy-v0.1.1 (Ha et al., Genome Res, 2012) and Control-FREEC-v6.7 (Boeva et al., Bioinformatics 2011), with window sizes of 2000 bp and 1000 bp respectively, and with auto-correction of normal contamination of tumor samples for Control-FREEC.

Finally, we explored structural variants (SVs), including deletions, inversions, tandem duplications and translocations, using DELLY-v0.5.5 with standard parameters (Rausch et al., Oxford J, 2012). In tumors, at least 10 supporting reads were required to make a call, and 5 supporting reads for sample NB0175, which had a coverage of only 40X (see table 2). To predict SVs in constitutional samples for subsequent somatic filtering, only 2 supporting reads were required so as not to miss any. To identify somatic events, all the SVs in each normal sample were first flanked by 500 bp in both directions, and any SV called in a tumor sample that fell within the combined flanked regions of the respective normal sample was removed (see graph 1). Deletions affecting more than 5 genes or larger than 1 Mb, and inversions or tandem duplications covering more than 4 genes, were removed. We focused on exonic and splicing events for deletions, inversions and tandem duplications.
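The somatic SV filter described above (flank each normal-sample SV by 500 bp, combine the flanked regions, drop tumor SVs lying in them) can be sketched as follows. This is a hedged illustration rather than the authors' code. It assumes that "in the combined flanked regions" means full containment, it compares intervals on a single chromosome only (so translocations are not handled), and it ignores SV type. The tuple layout `(chrom, start, end)` and both function names are hypothetical.

```python
from collections import defaultdict

FLANK = 500  # bp added on each side of every normal-sample SV


def merged_flanked_regions(normal_svs, flank=FLANK):
    """Flank each normal SV and merge overlapping intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in normal_svs:
        by_chrom[chrom].append((max(0, start - flank), end + flank))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps the previous region: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def somatic_svs(tumor_svs, normal_svs, flank=FLANK):
    """Keep tumor SVs that are not contained in any flanked normal region."""
    regions = merged_flanked_regions(normal_svs, flank)
    kept = []
    for chrom, start, end in tumor_svs:
        in_normal = any(
            r_start <= start and end <= r_end
            for r_start, r_end in regions.get(chrom, [])
        )
        if not in_normal:
            kept.append((chrom, start, end))
    return kept


normal = [("chr1", 1000, 2000)]
tumor = [("chr1", 900, 2100), ("chr1", 5000, 6000), ("chr2", 1000, 2000)]
result = somatic_svs(tumor, normal)
```

Here the first tumor SV lies inside the flanked normal region (500 to 2500 on chr1) and is removed, while the other two are kept. If overlap rather than containment were intended, only the condition inside `any(...)` would change.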
For translocations, we kept all SVs that occurred in intronic, exonic, 5'UTR, upstream or splicing regions.

Bioinformatics detection of variations with Deep sequencing approach

Once PE reads had been merged and adaptors trimmed by SeqPrep with default parameters, the merged reads were aligned with BWA (Li H. and Durbin R. 2009 PMID 19451168), allowing up to 1 difference in the 22-base-long seeds and reporting only unique alignments. Only reads with a mapping quality of 20 or more were analysed further. Variant calling software was not used, since we aimed to detect variations at low frequencies, observed in less than 1% of reads; such variants require a custom approach. Using the DepthOfCoverage function of the Genome Analysis Toolkit (GATK) v2.13.2 (McKenna A, et al., 2010 Genome Research PMID: 20644199), we focused on high-quality coverage of bases A, C, G and T at the targeted variant position. For each base, only the depth of coverage from reads with a mapping quality higher than 20 and a base quality higher than 10 was taken into account, in order to focus on high-quality data. To determine the background level of variability in the studied regions, 10 control samples were included in the analysis. The same approach and filtering criteria were applied to them over the entire amplicons. To highlight variants, for each sample the frequency of each base at each amplicon position was then compared to that observed in the set of controls. Statistical analyses were performed with the R statistical software (http://www.R-project.org). Fisher’s exact two-sided tests with a Bonferroni correction were performed to compare the percentages of bases between the data sets, i.e. for a given base between a case and the controls.
Finally, a variation was retained as significant when both (i) a significant increase in the percentage of a variant base and (ii) a significant decrease in the percentage of its reference base were observed, according to our p-value criterion (p < 0.05).
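The two-part criterion above can be illustrated with a short sketch. The analyses for this dataset were performed in R; the Python below is a stand-in written only to make the logic concrete, using a standard-library implementation of the two-sided Fisher's exact test. The function names, the base-count dictionaries and the `n_tests` parameter (the number of comparisons used for the Bonferroni correction, which the description does not specify) are assumptions.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def log_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return (_log_comb(row1, x) + _log_comb(row2, col1 - x)
                - _log_comb(row1 + row2, col1))

    log_obs = log_prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= log_obs + 1e-7:  # tables at most as likely as the observed one
            total += math.exp(lp)
    return min(1.0, total)


def is_candidate(case, controls, ref_base, alt_base, n_tests, alpha=0.05):
    """Apply criteria (i) and (ii) with Bonferroni-corrected p-values.

    `case` and `controls` map each base to its high-quality read count at one
    position; `n_tests` is a hypothetical number of comparisons.
    """
    case_total = sum(case.values())
    ctrl_total = sum(controls.values())

    def test(base):
        a, c = case[base], controls[base]
        p = fisher_exact_two_sided(a, case_total - a, c, ctrl_total - c)
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        return p_adj, a / case_total - c / ctrl_total

    p_alt, diff_alt = test(alt_base)
    p_ref, diff_ref = test(ref_base)
    return (p_alt < alpha and diff_alt > 0) and (p_ref < alpha and diff_ref < 0)


controls = {"A": 99990, "C": 10, "G": 0, "T": 0}
case_variant = {"A": 9900, "C": 100, "G": 0, "T": 0}
case_background = {"A": 9999, "C": 1, "G": 0, "T": 0}
hit = is_candidate(case_variant, controls, "A", "C", n_tests=400)
miss = is_candidate(case_background, controls, "A", "C", n_tests=400)
```

In this toy example the first case carries the variant base C in 1% of reads against 0.01% in the pooled controls and is flagged, while the second case matches the control background and is not.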

29/08/2017
25 samples
DAC: EGAC00001000319
Technology: Illumina HiSeq 2500

Institut Curie Neuroblastoma DAC policy

DAC NAME AND ACCESSION

This document specifies the policy for granting access to the European Genome-phenome Archive (EGA) secure web-based tools for managing data access permissions to the data stored in the EGA database. Access is granted to those Data Access Committee members named in this document by Paul Flicek, head of Genes, Genomes and Variation.

Conditions of access:
The account holder will abide by all current and future policies of the EBI for computer use as well as computer use policies of the European Molecular Biology Laboratory (EMBL), of which the EBI is a part. All EGA-specific policies and standard operating procedures (SOPs) must also be followed. All applicable policies will be provided to the authorized users.

Method of access:
The EBI will create a personal account for the authorized person. The account username and password together with an RSA key provide access to the secure EGA infrastructure. A separate document with detailed guidelines on how to access and use these tools is provided to each account holder. The EGA help-desk provides training when needed.

Review:
All individuals with access to the EGA user authorization tools will be reviewed annually by Paul Flicek. Any staff changes affecting the authorized person must be reported immediately to Paul Flicek (flicek@ebi.ac.uk).

Authorized Data Access Committee user:
Gudrun Schleiermacher
Département d'Oncologie Pédiatrique et INSERM U830
Institut Curie
26 rue d'Ulm
75248 Paris Cedex 05
France
gudrun.schleiermacher@curie.fr

Olivier Delattre
Institut Curie – Centre de recherche – Unité U830 Inserm
26 rue d'Ulm – 75248 Paris Cedex 05
France
olivier.delattre@curie.fr

Supervisor of Authorized user:
Olivier Delattre
Director of the Unit 830 Inserm
olivier.delattre@curie.fr

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID	Study Title	Study Type
EGAS00001001184	Institut Curie Neuroblastoma Whole Genome Sequencing Diagnosis Relapse	Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID	File Type	Size
EGAF00000808669	bam	167.9 GB
EGAF00000808670	bam	313.2 GB
EGAF00000808671	bam	157.6 GB
EGAF00000808672	bam	365.0 GB
EGAF00000808673	bam	326.8 GB
EGAF00000808674	bam	318.1 GB
EGAF00000808675	bam	315.9 GB
EGAF00000808676	bam	321.1 GB
EGAF00000808677	bam	192.9 GB
EGAF00000808678	bam	322.7 GB
EGAF00000808679	bam	242.3 GB
EGAF00000808680	bam	382.2 GB
EGAF00000808681	bam	324.5 GB
EGAF00000808682	bam	211.4 GB
EGAF00000808683	bam	346.9 GB
EGAF00000808684	bam	402.8 GB
EGAF00000808685	bam	154.0 GB
EGAF00000808686	bam	333.0 GB
EGAF00000808687	bam	166.6 GB
EGAF00000808688	bam	368.4 GB
EGAF00000808689	bam	166.6 GB
EGAF00000808690	bam	387.6 GB
EGAF00000808691	bam	214.8 GB
EGAF00000808692	bam	163.6 GB
EGAF00000808693	bam	249.3 GB
25 Files (6.9 TB)