GoNL aligned sequence data in BAM format.

We mapped the data to the UCSC human reference genome build 37 using BWA 0.5.9-r16. We first aligned each end of a read pair separately using bwa aln, then used bwa sampe to combine the two ends into paired alignments and produce a BAM file. The BAM file was then sorted by genomic position and indexed using PicardTools-1.32 SortSam.

To prevent PCR artifacts from influencing the analysis of our data, we used Picard to mark duplicate reads, which were ignored in downstream analysis.

We ran the GATK IndelRealigner on our data around known indels (from 1KG Pilot). The IndelRealigner creates all possible read alignments using the indel source and computes the likelihood that the data contain the indel, based on the read pileup. Whenever the maximum-likelihood alignment contains an indel, the reads are realigned accordingly.

Each base is associated with a Phred-scaled base quality score. Calibration of these scores is crucial because they are used in some of the downstream analysis models. We used GATK to recalibrate the base qualities with respect to (i) the base cycle, (ii) the original quality score, and (iii) the dinucleotide context.

To minimize issues stemming from mapping problems around indels, we performed a second round of indel realignment using the GATK IndelRealigner, by family rather than by individual. For this second round, we considered two sources of possible indels: 1KG Phase 1 indels and indels aligned by BWA in the GoNL data.
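The Phred scaling that the recalibration step relies on can be illustrated with a minimal Python sketch. The conversion Q = -10 * log10(p) is the standard Phred definition. The helper empirical_quality is a hypothetical illustration of comparing a reported quality with an observed mismatch rate in one covariate bin (for example, one base cycle); its add-one smoothing is an assumption made here to avoid taking the logarithm of zero, and it is not the GATK implementation.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p to a Phred-scaled quality Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def empirical_quality(mismatches: int, observations: int) -> float:
    """Hypothetical helper: Phred quality implied by the mismatch rate in one bin.

    Add-one smoothing (an assumption of this sketch) keeps the rate above zero.
    """
    rate = (mismatches + 1) / (observations + 2)
    return error_prob_to_phred(rate)


if __name__ == "__main__":
    # A reported Q30 base claims roughly one error in a thousand calls.
    print(phred_to_error_prob(30))
    # Quality implied by 9 mismatches in 9998 observed bases of one bin.
    print(empirical_quality(9, 9998))
```

If the empirical quality of a bin differs from the quality the sequencer reported for it, the reported scores in that bin are miscalibrated, which is the kind of discrepancy recalibration is meant to correct.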

The Genome of the Netherlands Project Data Access Policy

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait, or cancer research projects reporting matched cancer and normal genomes from patients.

Study ID Study Title Study Type
Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF00000733256 bam 58.5 GB
EGAF00000733258 bam 61.3 GB
EGAF00000733259 bam 59.0 GB
EGAF00000733263 bam 51.2 GB
EGAF00000733264 bam 59.4 GB
EGAF00000733265 bam 59.5 GB
EGAF00000733266 bam 69.2 GB
EGAF00000733267 bam 64.6 GB
EGAF00000733268 bam 74.7 GB
EGAF00000733269 bam 55.9 GB
EGAF00000733270 bam 72.7 GB
EGAF00000733271 bam 67.3 GB
EGAF00000733272 bam 71.0 GB
EGAF00000733273 bam 68.6 GB
EGAF00000733274 bam 68.3 GB
EGAF00000733275 bam 71.6 GB
EGAF00000733276 bam 68.1 GB
EGAF00000733277 bam 69.5 GB
EGAF00000733279 bam 64.6 GB
EGAF00000733280 bam 59.9 GB
EGAF00000733281 bam 60.2 GB
EGAF00000733282 bam 62.5 GB
EGAF00000733283 bam 62.3 GB
EGAF00000733284 bam 65.1 GB
EGAF00000733285 bam 60.2 GB
EGAF00000733286 bam 54.7 GB
EGAF00000733287 bam 60.1 GB
EGAF00000733289 bam 58.2 GB
28 Files (1.8 TB)