
Frequently Asked Questions

This page contains the frequently asked questions regarding Hartwig Medical Foundation Data Access Requests. Please note that this page is not exhaustive and that you should consult your License Agreement regarding the exact agreements that were made between Hartwig Medical Foundation and your institute.

General Questions

You can find all details regarding the Data Access Procedure at our website.

Collaboration is possible on the following grounds:

An overview of the methods for data collection is available, as well as a guide that describes the data that we make available.

All our tools and code are publicly available on our GitHub.

Google Cloud Platform Questions

We are not able to set up the GCP account for you, but we do have a preferred Google Partner that can help you out. To get the onboarding process started with our preferred Google Partner, please visit this webpage. General instructions on getting a GCP account can be found in our Getting a Google account instructions. If the instructions do not work for you, please contact us.

As you may know, the database of Hartwig Medical Foundation contains very sensitive information: health and genetic data of patients. Hartwig Medical Foundation must therefore ensure that the data it makes available for research is used only for the purposes for which it was made available, and only by the persons authorized to access it. It also wishes to safeguard, as far as possible, that the data is duly processed and protected once made available for research. This is why Hartwig Medical Foundation makes the data available to legal entities (such as the institution you work for), why the data access request is submitted on behalf of your institution, and why the license agreement covering the use of the data is entered into by your institution. It is your institution that is responsible for compliance with the terms of the license agreement. By requiring you to use your institutional email address, we can ensure that you do indeed work for the institution that was granted access to the data, and that you lose access once you no longer work for the institution (and are no longer authorized to access the data). Furthermore, your institution’s email account is often better secured than a private email account. In brief, we require you to use your institutional email address to access the data from Hartwig Medical Foundation in order to protect the data.

Advantages for you as researcher:

We sometimes receive feedback from external parties asking if we’re not afraid of data misuse by handing it over to Google. We don’t hand over any data to Google! We have a contract with Google to use their Google Cloud Platform (GCP) service. The data that we upload to GCP is encrypted using our own key-pair and as a result is not accessible by Google.

We have received feedback from external parties asking whether it is ethical to ‘make Google rich’ by using their platform to make data available to the public. GCP is in fact one of the cheapest cloud vendors on the market. There are also private cloud providers, but they are more expensive than our current solution. Either way, any cloud provider earns revenue from providing its compute and storage services. We made an extensive analysis of all relevant providers. In our opinion, we chose a good, solid and relatively cheap solution by moving our business to GCP.

Hartwig Medical Foundation decided to move to Google Cloud Platform as its infrastructure provider for multiple reasons:

Of course, we also considered the downside of moving to GCP (or any other public/non-public cloud provider):

We also considered hybrid alternatives, for example by moving some of our data to EGA. At this point in time, however, using EGA would mean duplicating our database, as we would still need the data readily available to keep the data that we provide for data access requests up to date with the latest versions of our tools.

No, we can only release the data to GCP accounts that are tied to personal institutional email addresses. When releasing data to a service account, an admin account may be added to the project, but it cannot have the rights to access the service account that the data is linked to. Please see the question right above this one for the specific explanation of why we can't allow this.

A service account allows you to process the data that you can access through the ACL more efficiently, as you're able to automate the work. With a service account, you can spin up multiple VMs simultaneously in one go, whereas with your personal GCP account, you'd have to do this for each sample separately.
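As an illustration of what "in one go" can look like, the minimal Python sketch below builds one `gcloud compute instances create` command per sample and attaches a service account to each VM. It only builds the command strings and does not run them. The sample IDs, service account email, zone and machine type are hypothetical placeholders, not values prescribed by Hartwig Medical Foundation; adapt them to your own project.

```python
import re
import shlex


def instance_name(sample_id: str) -> str:
    """Turn a sample ID into a name that is valid for a Compute Engine VM."""
    # Instance names may only contain lowercase letters, digits and hyphens.
    slug = re.sub(r"[^a-z0-9-]+", "-", sample_id.lower()).strip("-")
    return f"vm-{slug}"


def build_create_commands(sample_ids, service_account, zone,
                          machine_type="e2-standard-4"):
    """Return one 'gcloud compute instances create' command per sample.

    Every value passed in is an assumption for illustration only.
    """
    commands = []
    for sample_id in sample_ids:
        args = [
            "gcloud", "compute", "instances", "create",
            instance_name(sample_id),
            f"--zone={zone}",
            f"--machine-type={machine_type}",
            # The VM runs as the service account that was granted data access.
            f"--service-account={service_account}",
            "--scopes=cloud-platform",
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical sample IDs and service account, for illustration only.
    for command in build_create_commands(
        ["SAMPLE_001", "SAMPLE_002"],
        "analysis-sa@example-project.iam.gserviceaccount.com",
        "europe-west4-a",
    ):
        print(command)
```

Review the printed commands before running them in a shell, as every VM that is created incurs costs in your own GCP project.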

We need to audit your project IAM settings when you want to use a service account for data access, as we want to make sure only the registered 'Download contacts' have access to the data (through restricted access to the service account), as agreed upon in the License Agreement. The audit is performed like this:

Please follow the instructions in our Getting a Google account documentation. In short, go to the Google Account creation page, click 'Use my current email address instead' and follow the prompts. Please note that we need multi-factor authentication enabled for all accounts we share our data with.

Billing Questions

Hartwig won't bill you for accessing our data, but accessing the data through Google Cloud Platform does come with costs. Basically, there are two ways to approach the data:

What option to choose depends on your use case and specific requirements.

A resource that might help you in estimating costs is the Google Pricing calculator. Another factor that you need to include when estimating costs is the cost of egress.
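To show how egress fits into a total estimate, here is a minimal Python sketch that adds up compute, storage and egress costs. The split into these three components is an assumption made for illustration, and the rates in the usage example are placeholders, not actual GCP prices; look up current prices in the Google Pricing calculator before relying on any number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Unit prices in your billing currency (placeholders, not GCP prices)."""
    vm_per_hour: float
    disk_per_gb_month: float
    egress_per_gb: float


def estimate_cost(rates, vm_hours, disk_gb, months, egress_gb):
    """Return a rough cost breakdown for one analysis run."""
    compute = rates.vm_per_hour * vm_hours
    storage = rates.disk_per_gb_month * disk_gb * months
    # Egress is charged for data that leaves the cloud, e.g. downloads.
    egress = rates.egress_per_gb * egress_gb
    return {
        "compute": compute,
        "storage": storage,
        "egress": egress,
        "total": compute + storage + egress,
    }


if __name__ == "__main__":
    # Placeholder rates chosen only to make the arithmetic easy to follow.
    placeholder_rates = Rates(vm_per_hour=0.5,
                              disk_per_gb_month=0.25,
                              egress_per_gb=0.125)
    print(estimate_cost(placeholder_rates, vm_hours=10, disk_gb=100,
                        months=2, egress_gb=8))
```

Keeping egress as a separate term makes it easy to compare a workflow that downloads data with one that processes the data inside GCP, where `egress_gb` would be close to zero.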

Using the Google Cloud Platform could also work to your benefit. Compare the following scenarios that could occur when your data access request is approved for 100 samples:

There is no hard cap to cut off costs at a certain point, but there are several options to monitor your budget within GCP: