Click on a Dataset ID in the table below to learn more and to find
out whom to contact about access to these data.
Dataset ID
Description
Technology
Samples
EGAD00001003338
This is a test dataset derived from public data of the 1000 Genomes Project. Its purpose is not to allow for any inference about cohort data or results, but to aid bioinformaticians in the technical development and testing of tools, as well as data consumers in learning how to access information.
This dataset consists of 2508 samples from the 1000 Genomes Project (https://www.nature.com/articles/nature15393). Data for each sample (e.g. NA18534) can be accessed through the IGSR portal (e.g. https://www.internationalgenome.org/data-portal/sample/NA18534) or in the corresponding folder on the 1000 Genomes FTP site (e.g. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHB/NA18534/exome_alignment/).
This dataset encompasses several types of data: Variant Call Format (VCF) files, or their binary counterpart BCF, both joint (e.g. ALL_chr22_20130502_2504Individuals.vcf.gz) and split (e.g. HG01775.chrY.vcf.gz); exome sequencing CRAM files (e.g. NA18534.GRCh38DH.exome.cram); and whole genome sequencing CRAM/BAM files (e.g. NA19239.cram). Additionally, multiple files were sliced to create shorter files that download quickly. These sliced files are named "{FILE-INFO}__{NUMBER-OF-READS}r__{CHR}.{START-COORDINATE}-{END-COORDINATE}.{FILETYPE}" (e.g. "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"). All of these files can be downloaded directly with the EGA download client, PyEGA3 (https://github.com/EGA-archive/ega-download-client).
AB SOLiD 4 System
unspecified
6
EGAD00001009826
This is a test dataset derived from public data of the 1000 Genomes Project. Its purpose is not to allow for any inference about cohort data or results, but to aid bioinformaticians in the technical development and testing of tools, as well as data consumers in learning how to access information.
This dataset consists of 3 pairs of lightweight (sliced) files: BAM + BAI, CRAM + CRAI, and VCF + TBI. These files can be downloaded directly with the EGA download client, PyEGA3 (https://github.com/EGA-archive/ega-download-client).
For any further questions, please contact the DAC (Helpdesk - email: helpdesk [at] ega-archive [dot] org).
unspecified
1
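The description of EGAD00001003338 gives a naming pattern for its sliced files, "{FILE-INFO}__{NUMBER-OF-READS}r__{CHR}.{START-COORDINATE}-{END-COORDINATE}.{FILETYPE}". The following is a minimal Python sketch of how such a name could be split into its parts when testing tools against these files. It is not part of PyEGA3 or any EGA tooling: the names `Region`, `SlicedFile` and `parse_sliced_filename` are hypothetical. It also assumes, based only on the example "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram", that a name may carry several regions joined by a double underscore.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """One sliced genomic interval (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class SlicedFile:
    """Parsed components of a sliced file name (hypothetical helper type)."""
    file_info: str
    n_reads: int
    regions: Tuple[Region, ...]
    filetype: str


# "{NUMBER-OF-READS}r", e.g. "90r"
_READS_RE = re.compile(r"(?P<n>\d+)r")
# "{CHR}.{START}-{END}", e.g. "3.10000-10500"
_REGION_RE = re.compile(r"(?P<chrom>[^.]+)\.(?P<start>\d+)-(?P<end>\d+)")
# The last region also carries the extension, e.g. "4.10000-10500.cram"
_LAST_RE = re.compile(r"(?P<region>[^.]+\.\d+-\d+)\.(?P<filetype>.+)")


def parse_sliced_filename(name: str) -> SlicedFile:
    """Split a sliced file name into file info, read count, regions, type.

    Assumption: fields are separated by "__" and one or more regions
    follow the read count, as in the single example given in the text.
    """
    parts = name.split("__")
    if len(parts) < 3:
        raise ValueError(f"not a sliced file name: {name!r}")

    file_info, reads_token, region_tokens = parts[0], parts[1], parts[2:]

    reads_match = _READS_RE.fullmatch(reads_token)
    if reads_match is None:
        raise ValueError(f"unexpected read-count field: {reads_token!r}")

    # Detach the file extension from the final region token.
    last_match = _LAST_RE.fullmatch(region_tokens[-1])
    if last_match is None:
        raise ValueError(f"unexpected final field: {region_tokens[-1]!r}")
    region_tokens[-1] = last_match.group("region")

    regions = []
    for token in region_tokens:
        region_match = _REGION_RE.fullmatch(token)
        if region_match is None:
            raise ValueError(f"unexpected region field: {token!r}")
        regions.append(
            Region(
                chrom=region_match.group("chrom"),
                start=int(region_match.group("start")),
                end=int(region_match.group("end")),
            )
        )

    return SlicedFile(
        file_info=file_info,
        n_reads=int(reads_match.group("n")),
        regions=tuple(regions),
        filetype=last_match.group("filetype"),
    )
```

Calling `parse_sliced_filename` on the example name above yields the file info "HG01500.GRCh38DH", a read count of 90, two regions (chromosomes 3 and 4, both 10000-10500) and the file type "cram". Unsliced names such as "NA19239.cram" raise `ValueError`, which lets a script tell the two kinds of file apart.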