Genome-Wide Pleiotropy Scan Across Multiple Cancers
A whole-exome sequencing (WES) study was conducted in 3,233 cases diagnosed with multiple primary cancers and 3,229 matched cancer-free controls (90% non-Hispanic white, 3% African-American, 3% East Asian, and 4% Latino) selected from individuals in the Kaiser Permanente Research Bank (KPRB) who were members of the Kaiser Permanente Northern California (KPNC) health plan. Cancer-free controls were matched to cases on age at specimen collection (within 2 years), sex, genotyping array (which effectively matched on self-reported race/ethnicity), genetic ancestry (closest distance on the first two principal components), and reagent kit. Cases and controls were drawn from two prospective KPRB cohorts: the Research Program on Genes, Environment and Health (RPGEH) and the ProHealth study.
Participants were sequenced by the Regeneron Genetics Center using the Illumina NovaSeq 6000 platform, and sample preparation and quality control were performed using a high-throughput, fully automated system [PMID: 33087929]. Reads were aligned to the GRCh38 reference genome, and variants were called using WeCall [PMID: 33087929]. Participants with sex discordance, 20x coverage at less than 80% of targeted sites, and/or contamination greater than 5% were excluded. After quality control, we retained 6,247 individuals (3,111 cases; 3,136 controls) for downstream analyses. Among participants selected for this WES study, 5,432 (2,299 cases; 3,133 controls) consented to deposition of data to the National Institutes of Health (NIH).
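The sample-level exclusions described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `SampleQC` record and its field names are hypothetical, and only the thresholds (80% of targeted sites at 20x, 5% contamination) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample QC summary; field names are illustrative only."""
    reported_sex: str
    inferred_sex: str        # sex inferred from sequencing data
    frac_sites_20x: float    # fraction of targeted sites covered at >= 20x
    contamination: float     # estimated contamination fraction


def passes_sample_qc(sample: SampleQC,
                     min_frac_20x: float = 0.80,
                     max_contamination: float = 0.05) -> bool:
    """Return True if the sample meets none of the three exclusion criteria."""
    if sample.reported_sex != sample.inferred_sex:
        return False  # sex discordance
    if sample.frac_sites_20x < min_frac_20x:
        return False  # 20x coverage at less than 80% of targeted sites
    if sample.contamination > max_contamination:
        return False  # contamination greater than 5%
    return True
```

A sample failing any single criterion is dropped, which corresponds to the "and/or" wording of the exclusion rule.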
Further quality control was applied to filter low-quality variants. Genotype calls with low depth of coverage (DP) were set to missing (DP < 7 for SNPs and DP < 10 for indels), after which sites with low allele balance (AB), defined as variants without at least one sample having AB ≥ 15% for SNPs or AB ≥ 20% for indels, were removed. Lastly, variants with missingness > 10% and Hardy-Weinberg equilibrium p-value < 10⁻¹⁵ were excluded. Further description of quality control and downstream single-variant and gene-based analyses is available in Cavazos et al., 2022 [medRxiv].
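The order of the variant-level filters above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the `Call` record is hypothetical, AB is assumed to be the fraction of reads supporting the alternate allele among carriers, the Hardy-Weinberg p-value is assumed to be computed elsewhere, and the missingness and Hardy-Weinberg thresholds are assumed to be applied as independent exclusions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    """Hypothetical genotype call for one sample at one variant."""
    depth: int                  # DP: total reads at the site
    alt_reads: int              # reads supporting the alternate allele
    alt_copies: Optional[int]   # 0, 1, or 2; None means missing


def mask_low_depth(calls: List[Call], is_indel: bool) -> List[Call]:
    """Set calls to missing when DP < 7 (SNPs) or DP < 10 (indels)."""
    min_dp = 10 if is_indel else 7
    return [Call(c.depth, c.alt_reads, None) if c.depth < min_dp else c
            for c in calls]


def passes_allele_balance(calls: List[Call], is_indel: bool) -> bool:
    """Keep a site if at least one carrier has AB >= 15% (SNPs) or 20% (indels).

    Assumption: AB = alt_reads / depth, evaluated on non-missing carriers.
    """
    min_ab = 0.20 if is_indel else 0.15
    return any(
        c.alt_copies is not None and c.alt_copies > 0
        and c.depth > 0 and c.alt_reads / c.depth >= min_ab
        for c in calls
    )


def missingness(calls: List[Call]) -> float:
    """Fraction of samples with a missing call."""
    return sum(c.alt_copies is None for c in calls) / len(calls)


def keep_variant(calls: List[Call], is_indel: bool, hwe_p: float,
                 max_missing: float = 0.10, min_hwe_p: float = 1e-15) -> bool:
    """Apply the filters in the order described: depth mask, AB, then
    missingness and Hardy-Weinberg (treated here as independent exclusions)."""
    masked = mask_low_depth(calls, is_indel)
    if not passes_allele_balance(masked, is_indel):
        return False
    if missingness(masked) > max_missing:
        return False
    if hwe_p < min_hwe_p:
        return False
    return True
```

Because the depth mask runs first, calls set to missing there count toward the missingness of the variant and cannot rescue a site at the allele-balance step.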
- Type: Case-Control
- Archiver: The database of Genotypes and Phenotypes (dbGaP)