Why is a cloud infrastructure for genomics needed?
In Germany, no single university or research center currently has the infrastructure needed to analyze the large datasets now becoming available in the life sciences, or to store and access these data securely. Furthermore, the lack of standardization in computational analysis workflows renders data processed at different institutions essentially non-comparable. We therefore propose a cloud computing model based on pooled resource utilization, through responsible sharing of IT infrastructure, and on pre-defined services that make it easier for non-experts to take part. This Genome-Cloud will serve as a cloud for high-throughput data in the life sciences in Germany. Its development would follow recommendations by the Leopoldina, which caution that Germany can only remain competitive by strategically setting up a national “omics” and IT infrastructure that links universities with non-university institutions and bundles expertise in interdisciplinary research.
What advantages does the Genome-Cloud have for researchers?
The Genome-Cloud will have several key advantages:
- The Genome-Cloud will make genomic analysis with state-of-the-art tools widely accessible (to experts as well as non-experts), providing bioinformatics processing capabilities to numerous users in Germany.
- Pre-configured pipelines and state-of-the-art computing infrastructure will become available to the German community through the Genome-Cloud, facilitating advanced genomic analyses (Software-as-a-service, SaaS, and Infrastructure-as-a-service, IaaS, models).
- The use of standardized analysis pipelines enabled by the Genome-Cloud will additionally facilitate integrative analyses and meta-analyses, by improving the comparability of datasets generated at different institutions.
- Through standardized data access control and centralized data storage, the Genome-Cloud will further improve data protection and, once widely adopted, will reduce the need to duplicate commonly used datasets.
- Resource sharing and the avoidance of duplicated infrastructure will eventually reduce overall infrastructure and operational costs.
What data types will be stored, processed and analyzed?
All types of datasets used in the life sciences can in principle be stored, although we will begin with DNA sequencing, transcriptome, and methylome data (augmented, whenever possible, with clinical data). The Genome-Cloud will accept data from basic research and disease studies, and will not be limited to particular diseases.
Are the data safe, and how will they be protected?
The Genome-Cloud’s data security plan, and especially the protection of data in the cloud, will comply with stringent German data protection regulations and standards. Standardized data access control and centralized data storage will improve data protection. Furthermore, once in use, the Genome-Cloud will decrease the need for dataset duplication, thereby reducing this specific data security risk.
Will the Genome-Cloud focus only on a particular type of research, such as cancer research?
Although the Genome-Cloud will initially be set up with cancer datasets, for which the need is currently particularly high given the amount of data already available, we will ensure its utility across all areas of the life sciences.