Hartwig Medical Foundation Data Access Request Guide

This page provides practical information on how to access and work with the data made available to you as part of a Data Access Request (DR) from Hartwig Medical Foundation.

Note: more details on the methods used to generate both the genomic and clinical data can be found on a separate Methods page.

General Notes

Please use the unique ID given to your request (e.g. "DR-XXX") in any communication with us about your data request.

Sample selection

By default, in addition to the criteria specific to your data request, samples for which any of the following applies are excluded:

Primary tumor location and type via DOID ontology

The primary tumor location and/or type of each sample in the database is mapped to the DOID ontology, as specifically as the available data allow (for information on the DOID ontology, see: https://www.ebi.ac.uk/ols/ontologies/doid). Please find the tree of the DOIDs in the Hartwig Medical Foundation database here.

Format of the data made available

Clinical Data (TSV format)

Clinical data will be made available in a metadata.tar via GCP.
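Once the metadata.tar has been downloaded, its TSV files can be read with standard tooling. The following is a minimal sketch in Python using only the standard library. It assumes the archive contains tab-separated files with a header row and a `.tsv` extension; the function name `read_tsv_tables` is hypothetical and no particular file or column names are assumed.

```python
import csv
import io
import os
import tarfile


def read_tsv_tables(tar_source):
    """Return {member_name: [row_dict, ...]} for every .tsv file in a tar archive.

    tar_source may be a filesystem path or a seekable binary file object.
    ASSUMPTION: each .tsv member is UTF-8 text with a header row.
    """
    if isinstance(tar_source, (str, os.PathLike)):
        archive = tarfile.open(name=tar_source, mode="r:*")
    else:
        archive = tarfile.open(fileobj=tar_source, mode="r:*")

    tables = {}
    with archive:
        for member in archive.getmembers():
            # Skip directories and anything that is not a TSV file.
            if not (member.isfile() and member.name.endswith(".tsv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # DictReader maps each row to the column names in the header row.
            tables[member.name] = list(csv.DictReader(text, delimiter="\t"))
    return tables
```

Calling `read_tsv_tables("metadata.tar")` would return one list of row dictionaries per TSV file, keyed by the file's path inside the archive.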

Some notes about the clinical data:

Please find more details on the methods used to generate both the genomic and clinical data on a separate Methods page.

Somatic Data (VCF/TXT formats)

Somatic data will be made available in a somatics.tar via GCP.

Per sample the following files are present:

In the purple folder --

For an explanation of the contents of the purple.* files, see PURPLE.

In the linx folder --

For an explanation of the contents of the linx.* files, see LINX.
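To get an overview of what a somatics.tar contains before extracting it, the archive members can be grouped per sample and per folder. The sketch below is illustrative only. It assumes that paths inside the archive follow a `<sample>/<folder>/<file>` layout (for example, a purple or linx folder under each sample); that layout and the function names are assumptions, so check them against your own archive.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_members_by_sample(member_names):
    """Group archive paths as {sample: {folder: [file_name, ...]}}.

    ASSUMPTION: paths look like "<sample>/<folder>/<file>". Paths with any
    other depth are collected under the key None, in the "other" list.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name in member_names:
        parts = PurePosixPath(name).parts
        if len(parts) == 3:
            sample, folder, file_name = parts
            grouped[sample][folder].append(file_name)
        else:
            grouped[None]["other"].append(name)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(folders) for sample, folders in grouped.items()}


def list_somatic_files(tar_path):
    """List the regular files in a tar archive, grouped by sample and folder."""
    with tarfile.open(tar_path, mode="r:*") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    return group_members_by_sample(names)
```

For instance, `list_somatic_files("somatics.tar")["<sample>"]["purple"]` would give the file names found in that sample's purple folder, provided the assumed layout holds.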

Germline Data (VCF/TXT formats)

Germline data will be made available in a germline.tar file via GCP.

We share the SNVs and small indels called from the reference sample using GATK HaplotypeCaller.
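As a quick sanity check on a germline VCF, the SNV and indel calls can be counted without any external dependencies. The following is a minimal sketch under the assumption that the file is a standard VCF (tab-separated, with REF in column 4 and ALT in column 5), optionally gzip- or bgzip-compressed. The helper names are hypothetical, and a dedicated VCF library is a better choice for real analyses.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def classify_variant(ref, alt):
    """Return "SNV" for single-base substitutions, "INDEL" when allele
    lengths differ, and "OTHER" otherwise (e.g. multi-base substitutions)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"


def count_variant_types(lines):
    """Count ALT alleles by type over an iterable of VCF text lines."""
    counts = {"SNV": 0, "INDEL": 0, "OTHER": 0}
    for line in lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            # Skip missing, spanning-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            counts[classify_variant(ref, alt)] += 1
    return counts
```

A typical call would be `with open_text(path) as handle: counts = count_variant_types(handle)`, where `path` points to one of the VCF files extracted from the germline.tar.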

Aligned readout data (CRAM format)

Aligned readout data will be made available per sample via GCP.

Some notes to keep in mind:

Example loading CRAM file in IGV:

CRAM files can be loaded directly into IGV using their Google Cloud Storage URL. Please note that IGV requires your permission to access both Google Cloud Storage and Google Drive to do this; it is currently not possible to exclude Google Drive from these permissions. To load a CRAM file directly from Google Cloud Storage:

RNA-seq data (FASTQ format)

RNA-seq data will be made available per sample via GCP.

Some notes to keep in mind:

Example data (COLO829v003T)

COLO829v003T is a melanoma cell line that can be used for testing. The COLO829v003T somatic tar file (for different pipeline versions) can be downloaded from our resources page. COLO829v003T is also available on the Google Cloud Platform (somatic and germline tar files, and the CRAM files). To be added to the ACL of these files, please send an email to ict@hartwigmedicalfoundation.nl that includes the GCP account the data should be made available to. Please note that this should be a GCP account set up with an institutional email address (see Getting Started with Google Cloud Platform).

More information