Resolving the Full Spectrum of Human Genome Variation using Linked-Reads

This dataset contains Linked-Read Whole Exome Sequencing (lrWES) data from individuals with known disease-causing variants. The dataset comprises 30 samples from 10 donors; multiple samples from the same donor reflect experimental differences that assay the effect of input DNA length on coverage and phasing. Raw data (i.e., BAM files) and variant analysis (i.e., VCF files) for each sample are included in this dataset.

This submission contains sequencing data from individuals with known disease-causing variants, as described in the publication titled "Resolving the Full Spectrum of Human Genome Variation using Linked-Reads". The authors of that publication serve as the data access committee.

DATA ACCESS AGREEMENT

These terms and conditions govern access to the managed access datasets (details of which are set out in Appendix I) to which the User Institution has requested access. The User Institution agrees to be bound by these terms and conditions.

Definitions

Authorized Personnel: The individuals at the User Institution to whom 10x Genomics grants access to the Data. This includes the User, the individuals listed in Appendix II and any other individuals for whom the User Institution subsequently requests access to the Data. Details of the initial Authorized Personnel are set out in Appendix II.

Data: The managed access datasets to which the User Institution has requested access.

Data Producers: 10x Genomics and the collaborators listed in Appendix I responsible for the development, organization, and oversight of these Data.

External Collaborator: A collaborator of the User, working for an institution other than the User Institution.

Project: The project for which the User Institution has requested access to these Data. A description of the Project is set out in Appendix II.

Publications: Includes, without limitation, articles published in print journals, electronic journals, reviews, books, posters and other written and verbal presentations of research.

Research Participant: An individual whose data form part of these Data.

Research Purposes: Shall mean research that is seeking to advance the understanding of genetics and genomics, including the treatment of disorders, and work on statistical methods that may be applied to such research.

User: The principal investigator for the Project.

User Institution(s): The Institution that has requested access to the Data.

10x Genomics: 10X Genomics, Inc. (10x Genomics) is a U.S. company incorporated in the State of Delaware, building tools for scientific discovery that reveal and address the complexities of biology and disease. Through a combination of novel microfluidics, chemistry and bioinformatics, 10x Genomics’ Chromium System enables researchers to more fully understand the fundamentals of biology at high resolution and scale.

1. The User Institution agrees to only use these Data for the purpose of the Project (described in Appendix II) and only for Research Purposes. The User Institution further agrees that it will only use these Data for Research Purposes which are within the limitations (if any) set out in Appendix I.

2. The User Institution agrees to preserve, at all times, the confidentiality of these Data. In particular, it undertakes not to use, or attempt to use these Data to compromise or otherwise infringe the confidentiality of information on Research Participants. Without prejudice to the generality of the foregoing, the User Institution agrees to use at least the measures set out in Appendix I to protect these Data.

3. The User Institution agrees to protect the confidentiality of Research Participants in any research papers or publications that they prepare by taking all reasonable care to limit the possibility of identification.

4. The User Institution agrees not to link or combine these Data to other information or archived data available in a way that could re-identify the Research Participants, even if access to that data has been formally granted to the User Institution or is freely available without restriction.

5. The User Institution agrees only to transfer or disclose these Data, in whole or part, or any material derived from these Data, to the Authorized Personnel. Should the User Institution wish to share these Data with an External Collaborator, the External Collaborator must complete a separate application for access to these Data.

6. The User Institution agrees that the Data Producers, and all other parties involved in the creation, funding or protection of these Data:
a) make no warranty or representation, express or implied as to the accuracy, quality or comprehensiveness of these Data;
b) exclude to the fullest extent permitted by law all liability for actions, claims, proceedings, demands, losses (including but not limited to loss of profit), costs, awards, damages and payments made by the Recipient that may arise (whether directly or indirectly) in any way whatsoever from the Recipient’s use of these Data or from the unavailability of, or break in access to, these Data for whatever reason; and
c) bear no responsibility for the further analysis or interpretation of these Data.

7. The User Institution agrees to follow the Fort Lauderdale Guidelines (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003207.pdf) and the Toronto Statement (http://www.nature.com/nature/journal/v461/n7261/full/461168a.html). This includes but is not limited to recognizing the contribution of the Data Producers and including a proper acknowledgement in all reports or publications resulting from the use of these Data.

8. The User Institution agrees not to make intellectual property claims on these Data and not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of these Data, or conclusion drawn directly from these Data.

9. The User Institution can elect to perform further research that would add intellectual and resource capital to these data and decide to obtain intellectual property rights on these downstream discoveries. In this case, the User Institution agrees to implement licensing policies that will not obstruct further research and to follow the U.S. National Institutes of Health Best Practices for the Licensing of Genomic Inventions (2005) (https://www.icgc.org/files/daco/NIH_BestPracticesLicensingGenomicInventions_2005_en.pdf) in conformity with the Organization for Economic Co-operation and Development Guidelines for the Licensing of the Genetic Inventions (2006) (http://www.oecd.org/science/biotech/36198812.pdf).

10. The User Institution agrees to destroy/discard the Data held, once it is no longer used for the Project, unless obliged to retain the data for archival purposes in conformity with audit or legal requirements.

11. The User Institution will notify 10x Genomics within 30 days of any changes or departures of Authorized Personnel.

12. The User Institution will notify 10x Genomics prior to any significant changes to the protocol for the Project.

13. The User Institution will notify 10x Genomics as soon as it becomes aware of a breach of the terms or conditions of this agreement.

14. 10x Genomics may terminate this agreement by written notice to the User Institution. If this agreement terminates for any reason, the User Institution will be required to destroy any Data held, including copies and backup copies. This clause does not prevent the User Institution from retaining these data for archival purpose in conformity with audit or legal requirements.

15. The User Institution accepts that it may be necessary for the Data Producers to alter the terms of this agreement from time to time. As an example, this may include specific provisions relating to the Data required by Data Producers other than 10x Genomics. In the event that changes are required, the Data Producers or their appointed agent will contact the User Institution to inform it of the changes and the User Institution may elect to accept the changes or terminate the agreement.

16. If requested, the User Institution will allow data security and management documentation to be inspected to verify that it is complying with the terms of this agreement.

17. The User Institution agrees to distribute a copy of these terms to the Authorized Personnel. The User Institution will procure that the Authorized Personnel comply with the terms of this agreement.

18. This agreement (and any dispute, controversy, proceedings or claim of whatever nature arising out of this agreement or its formation) shall be construed, interpreted and governed by the laws of England and Wales and shall be subject to the exclusive jurisdiction of the English courts.

Agreed for User Institution
Signature:
Name:
Title:
Date:

Principal Investigator
I confirm that I have read and understood this Agreement.
Signature:
Name:
Title:
Date:

Agreed for 10x Genomics
Signature:
Name:
Title:
Date:

APPENDIX I – DATASET DETAILS
APPENDIX II – PROJECT DETAILS
APPENDIX III – PUBLICATION POLICY

APPENDIX I – DATASET DETAILS
(to be completed by the data producer before passing to applicant)

Dataset reference (EGA Study ID and Dataset Details)
EGA study ID: EGAS00001003121
Dataset details: Detailed analysis in the Materials and methods section of: https://doi.org/10.1101/230946

Name of project that created the dataset
Resolving the Full Spectrum of Human Genome Variation using Linked-Reads

Names of other data producers/collaborators
Patrick Marks (a), Sarah Garcia (a), Alvaro Martinez Barrio (a), Kamila Belhocine (a), Jorge Bernate (a), Rajiv Bharadwaj (a), Keith Bjornson (a), Claudia Catalanotti (a), Josh Delaney (a), Adrian Fehr (a), Ian Fiddes (a), Brendan Galvin (a), Haynes Heaton (a, e, f), Jill Herschleb (a), Christopher Hindson (a), Esty Holt (b), Cassandra B. Jabara (a, g), Susanna Jett (a), Nikka Keivanfar (a), Sofia Kyriazopoulou-Panagiotopoulou (a, h), Monkol Lek (c, d), Bill Lin (a), Adam Lowe (a), Shazia Mahamdallie (b), Shamoni Maheshwari (a), Tony Makarewicz (a), Jamie Marshall (d), Francesca Meschi (a), Chris O’keefe (a), Heather Ordonez (a), Pranav Patel (a), Andrew Price (a), Ariel Royall (a), Elise Ruark (b), Sheila Seal (b), Michael Schnall-Levin (a), Preyas Shah (a), David Stafford (a), Stephen Williams (a), Indira Wu (a), Andrew Wei Xu (a), Nazneen Rahman (b), Daniel MacArthur (c, d), Deanna M. Church (a)

a: 10x Genomics, 7068 Koll Center Parkway, Suite 401, Pleasanton, CA 94566
b: The Institute of Cancer Research, Division of Genetics & Epidemiology, 15 Cotswold Road, London, SM2 5NG, UK
c: Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA
d: Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
e: Current affiliation, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
f: Current affiliation, University of Cambridge, Cambridge, UK
g: Current affiliation, Purigen Biosystems, Inc., 5700 Stoneridge Drive, Suite 100, Pleasanton, CA 94588
h: Current affiliation, Illumina, Inc., 499 Illinois Street, Suite 201, San Francisco, CA 94158

Specific limitations on areas of research
(none)

Minimum protection measures required
File access: Data can be held in unencrypted files on an institutional compute system behind a secure firewall, with Unix user group read/write access for one or more appropriate groups but not Unix world read/write access. Laptops holding these data should have password-protected logins and screen locks (set to lock after 5 min of inactivity). If held on USB keys or other portable hard drives, the data must be encrypted.

APPENDIX II – PROJECT DETAILS
(to be completed by the Requestor)

Details of dataset requested, i.e., EGA Study and Dataset Accession Number

Brief abstract of the Project in which the Data will be used (500 words max)

All individuals whom the User Institution wishes to be named as registered users
Name of Registered User | Email | Job Title | Supervisor*

All individuals that should have an account created at the EGA
Name of Registered User | Email | Job Title

APPENDIX III – PUBLICATION POLICY

XXXXX intend to publish the results of their analysis of this dataset and do not consider its deposition into public databases to be the equivalent of such publications. XXXXX anticipate that the dataset could be useful to other qualified researchers for a variety of purposes. However, some areas of work are subject to a publication moratorium. In any publications based on these data, please describe how the data can be accessed, including the name of the hosting database (e.g., The European Genome-phenome Archive at the European Bioinformatics Institute) and its accession numbers (e.g., EGAS00000000029), and acknowledge its use in a form agreed by the User Institution with XXXXX.
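
The file-access requirement in Appendix I (Unix group read/write allowed, world read/write not allowed) can be illustrated with a short sketch. This is only one possible way to apply and check such permissions, not a procedure prescribed by the agreement; the function names `restrict_to_group` and `is_world_accessible` are hypothetical, and the sketch assumes a POSIX file system and that the file's group has already been set to an appropriate one.

```python
import os
import stat


def restrict_to_group(path: str) -> None:
    """Set owner and group read/write and clear every 'other' (world) bit.

    Assumption: the file's group already corresponds to an appropriate
    user group; this function does not change ownership.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)


def is_world_accessible(path: str) -> bool:
    """Return True if any read, write or execute bit is set for 'other'."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)
```

A site would typically call `restrict_to_group` on each downloaded file and then use `is_world_accessible` as an audit check. Directory permissions, firewall configuration and encryption of portable media are outside the scope of this sketch.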

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF00002056467 bam 7.5 GB
EGAF00002056468 bam 10.7 GB
EGAF00002056469 bam 10.2 GB
EGAF00002056470 bam 8.9 GB
EGAF00002056471 bam 9.1 GB
EGAF00002056472 bam 8.7 GB
EGAF00002056473 bam 10.5 GB
EGAF00002056474 bam 8.8 GB
EGAF00002056475 bam 8.9 GB
EGAF00002056476 bam 7.6 GB
EGAF00002056477 bam 8.6 GB
EGAF00002056478 bam 8.0 GB
EGAF00002056479 bam 9.0 GB
EGAF00002056480 bam 7.3 GB
EGAF00002056481 bam 10.5 GB
EGAF00002056482 bam 8.9 GB
EGAF00002056483 bam 8.8 GB
EGAF00002056484 bam 11.0 GB
EGAF00002056485 bam 8.7 GB
EGAF00002056486 bam 10.3 GB
EGAF00002056487 bam 8.6 GB
EGAF00002056488 bam 8.8 GB
EGAF00002056489 bam 8.5 GB
EGAF00002056490 bam 8.7 GB
EGAF00002056491 bam 8.7 GB
EGAF00002056492 bam 7.4 GB
EGAF00002056493 bam 9.1 GB
EGAF00002056494 bam 8.4 GB
EGAF00002056495 bam 9.0 GB
EGAF00002056496 bam 8.3 GB
EGAF00002056525 vcf.gz 7.1 kB
EGAF00002056526 tbi 4.7 kB
EGAF00002056527 vcf.gz 12.9 kB
EGAF00002056528 tbi 6.7 kB
EGAF00002056529 vcf.gz 66.4 MB
EGAF00002056530 tbi 1.2 MB
EGAF00002056531 vcf.gz 12.8 kB
EGAF00002056532 tbi 6.7 kB
EGAF00002056533 vcf.gz 94.3 MB
EGAF00002056534 tbi 1.3 MB
EGAF00002056535 vcf.gz 6.8 kB
EGAF00002056536 tbi 4.6 kB
EGAF00002056537 vcf.gz 80.4 MB
EGAF00002056538 tbi 1.4 MB
EGAF00002056539 vcf.gz 13.2 kB
EGAF00002056540 tbi 6.8 kB
EGAF00002056541 vcf.gz 97.3 kB
EGAF00002056542 tbi 30.0 kB
EGAF00002056543 vcf.gz 6.7 kB
EGAF00002056544 tbi 4.5 kB
EGAF00002056545 vcf.gz 86.1 MB
EGAF00002056546 tbi 1.4 MB
EGAF00002056547 vcf.gz 7.4 kB
EGAF00002056548 tbi 4.6 kB
EGAF00002056549 vcf.gz 10.6 kB
EGAF00002056550 tbi 5.7 kB
EGAF00002056551 vcf.gz 73.4 MB
EGAF00002056552 tbi 1.2 MB
EGAF00002056553 vcf.gz 6.7 kB
EGAF00002056554 tbi 4.5 kB
EGAF00002056555 vcf.gz 28.7 kB
EGAF00002056556 tbi 9.8 kB
EGAF00002056557 vcf.gz 76.1 kB
EGAF00002056558 tbi 24.4 kB
EGAF00002056559 vcf.gz 73.4 MB
EGAF00002056560 tbi 1.2 MB
EGAF00002056561 vcf.gz 97.6 MB
EGAF00002056562 tbi 1.3 MB
EGAF00002056563 vcf.gz 72.3 MB
EGAF00002056564 tbi 1.2 MB
EGAF00002056565 vcf.gz 6.8 kB
EGAF00002056566 tbi 4.5 kB
EGAF00002056567 vcf.gz 93.0 MB
EGAF00002056568 tbi 1.2 MB
EGAF00002056569 vcf.gz 70.1 MB
EGAF00002056570 tbi 1.1 MB
EGAF00002056571 vcf.gz 76.2 MB
EGAF00002056572 tbi 1.3 MB
EGAF00002056573 vcf.gz 93.4 MB
EGAF00002056574 tbi 1.3 MB
EGAF00002056575 vcf.gz 17.7 kB
EGAF00002056576 tbi 7.6 kB
EGAF00002056577 vcf.gz 68.7 MB
EGAF00002056578 tbi 1.3 MB
EGAF00002056579 vcf.gz 6.0 kB
EGAF00002056580 tbi 4.1 kB
EGAF00002056581 vcf.gz 29.4 kB
EGAF00002056582 tbi 11.1 kB
EGAF00002056583 vcf.gz 74.1 MB
EGAF00002056584 tbi 1.2 MB
EGAF00002056585 vcf.gz 65.2 MB
EGAF00002056586 tbi 1.3 MB
EGAF00002056587 vcf.gz 76.2 MB
EGAF00002056588 tbi 1.3 MB
EGAF00002056589 vcf.gz 315.4 kB
EGAF00002056590 tbi 79.7 kB
EGAF00002056591 vcf.gz 5.4 kB
EGAF00002056592 tbi 3.6 kB
EGAF00002056593 vcf.gz 23.8 kB
EGAF00002056594 tbi 8.6 kB
EGAF00002056595 vcf.gz 6.0 kB
EGAF00002056596 tbi 3.7 kB
EGAF00002056597 vcf.gz 6.8 kB
EGAF00002056598 tbi 3.9 kB
EGAF00002056599 vcf.gz 66.5 MB
EGAF00002056600 tbi 1.3 MB
EGAF00002056601 vcf.gz 13.9 kB
EGAF00002056602 tbi 6.5 kB
EGAF00002056603 vcf.gz 6.9 kB
EGAF00002056604 tbi 4.5 kB
EGAF00002056605 vcf.gz 65.6 MB
EGAF00002056606 tbi 1.2 MB
EGAF00002056607 vcf.gz 8.3 kB
EGAF00002056608 tbi 4.8 kB
EGAF00002056609 vcf.gz 6.1 kB
EGAF00002056610 tbi 4.0 kB
EGAF00002056611 vcf.gz 66.9 MB
EGAF00002056612 tbi 1.2 MB
EGAF00002056613 vcf.gz 84.1 MB
EGAF00002056614 tbi 1.5 MB
EGAF00002056615 vcf.gz 81.0 MB
EGAF00002056616 tbi 1.4 MB
EGAF00002056617 vcf.gz 15.9 kB
EGAF00002056618 tbi 7.5 kB
EGAF00002056619 vcf.gz 78.1 MB
EGAF00002056620 tbi 1.3 MB
EGAF00002056621 vcf.gz 88.7 MB
EGAF00002056622 tbi 1.5 MB
EGAF00002056623 vcf.gz 77.8 MB
EGAF00002056624 tbi 1.4 MB
EGAF00002056625 vcf.gz 73.2 MB
EGAF00002056626 tbi 1.3 MB
EGAF00002056627 vcf.gz 61.7 MB
EGAF00002056628 tbi 1.1 MB
EGAF00002056629 vcf.gz 71.7 MB
EGAF00002056630 tbi 1.3 MB
EGAF00002056631 vcf.gz 9.8 kB
EGAF00002056632 tbi 5.4 kB
EGAF00002056633 vcf.gz 77.2 MB
EGAF00002056634 tbi 1.3 MB
EGAF00002056635 vcf.gz 82.4 MB
EGAF00002056636 tbi 1.4 MB
EGAF00002056637 vcf.gz 81.3 MB
EGAF00002056638 tbi 1.4 MB
EGAF00002056639 vcf.gz 78.8 MB
EGAF00002056640 tbi 1.3 MB
EGAF00002056641 vcf.gz 5.6 kB
EGAF00002056642 tbi 3.4 kB
EGAF00002056643 vcf.gz 75.8 MB
EGAF00002056644 tbi 1.4 MB
EGAF00002056645 vcf.gz 82.2 MB
EGAF00002056646 tbi 1.4 MB
EGAF00002056647 vcf.gz 17.6 kB
EGAF00002056648 tbi 7.7 kB
EGAF00002056649 vcf.gz 93.3 MB
EGAF00002056650 tbi 1.3 MB
EGAF00002056651 vcf.gz 17.2 kB
EGAF00002056652 tbi 7.7 kB
EGAF00002056653 vcf.gz 87.9 MB
EGAF00002056654 tbi 1.4 MB
EGAF00002056655 vcf.gz 97.9 MB
EGAF00002056656 tbi 1.4 MB
162 Files (270.4 GB)
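
The per-type counts behind the total above can be checked with a small sketch that tallies rows of the listing by file type. It assumes each row has the form `ID type size unit` as shown in the table and that kB, MB and GB are decimal units (an assumption; the page does not say). The function name `summarize_listing` is hypothetical. Because the listed sizes are rounded, a sum computed this way only approximates the stated total.

```python
from collections import defaultdict

# Assumption: decimal (SI) units; the listing does not state the convention.
UNIT_BYTES = {"kB": 1e3, "MB": 1e6, "GB": 1e9}


def summarize_listing(rows):
    """Tally rows such as 'EGAF00002056467 bam 7.5 GB' by file type.

    Returns a dict mapping file type to (file count, approximate bytes).
    Rows that do not match the four-column 'ID type size unit' shape,
    such as the header or the total line, are skipped.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for line in rows:
        parts = line.split()
        if len(parts) != 4 or not parts[0].startswith("EGAF"):
            continue
        _file_id, file_type, value, unit = parts
        if unit not in UNIT_BYTES:
            continue
        counts[file_type] += 1
        totals[file_type] += float(value) * UNIT_BYTES[unit]
    return {t: (counts[t], totals[t]) for t in counts}
```

Passing the rows of the table to `summarize_listing` gives one entry each for `bam`, `vcf.gz` and `tbi`, which makes it easy to confirm that every VCF file has a matching tabix index.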