Snowflake Data Warehouse for Large-Scale and Diverse Biological Data Management and Analysis



Review

1 CLINIC FOR Group, Nagisa Terrace 4F, 3-1-32 Shibaura, Minato-ku, Tokyo 108-0023, Japan

2 Kao Corporation, Bunka, Sumida-ku, Tokyo 131-8501, Japan

3 RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan

* Authors to whom correspondence should be addressed.

Genes 2025, 16(1), 34; https://doi.org/10.3390/genes16010034

Submission received: 26 November 2024 / Revised: 25 December 2024 / Accepted: 27 December 2024 / Published: 28 December 2024

Abstract

As next-generation sequencing technologies advance and become widely adopted, genomic, transcriptomic, and metagenomic data are being generated ever faster, and managing and analyzing these large-scale, diverse data have become critical challenges in life science and biotechnology. In this paper, we discuss in detail how cloud data warehouses can address these challenges. Specifically, we propose a data management and analysis framework built on Snowflake, a software-as-a-service (SaaS) data platform, and demonstrate its convenience and effectiveness through concrete examples such as disease variant analysis and in silico drug discovery. With Snowflake, researchers can efficiently manage and analyze a wide array of biological data, enabling the discovery of new biological insights through integrated analysis. Through these methodologies and application examples, we aim to accelerate research progress in bioinformatics.
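The "integrated analysis" mentioned above typically comes down to SQL joins across tables that hold different kinds of data. The sketch below illustrates that idea for disease variant analysis by joining per-sample variant calls with a variant annotation table. It is not the authors' pipeline: Python's built-in sqlite3 stands in for a cloud warehouse such as Snowflake so the example runs locally. The table names (`variants`, `annotations`), the column names, and the toy records (including the placeholder genes `GENE_A` and `GENE_B`) are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in warehouse with two hypothetical tables."""
    # Assumption: sqlite3 replaces Snowflake here; schema and rows are made up.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variants (
            sample_id TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT
        );
        CREATE TABLE annotations (
            chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
            gene TEXT, significance TEXT
        );
    """)
    # Toy per-sample variant calls (synthetic, for illustration only).
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?)",
        [
            ("S1", "chr1", 100, "A", "G"),
            ("S1", "chr2", 200, "C", "T"),
            ("S2", "chr1", 100, "A", "G"),
            ("S2", "chr3", 300, "G", "A"),
        ],
    )
    # Toy annotation records keyed by the same (chrom, pos, ref, alt) tuple.
    conn.executemany(
        "INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("chr1", 100, "A", "G", "GENE_A", "pathogenic"),
            ("chr2", 200, "C", "T", "GENE_B", "benign"),
        ],
    )
    return conn


def find_annotated_variants(conn, significance):
    """Return sample variants whose annotation matches the given significance."""
    # Inner join on the full allele key, then filter on the annotation label.
    query = """
        SELECT v.sample_id, v.chrom, v.pos, v.ref, v.alt, a.gene, a.significance
        FROM variants AS v
        JOIN annotations AS a
          ON v.chrom = a.chrom AND v.pos = a.pos
         AND v.ref = a.ref AND v.alt = a.alt
        WHERE a.significance = ?
        ORDER BY v.sample_id, v.chrom, v.pos
    """
    return conn.execute(query, (significance,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in find_annotated_variants(db, "pathogenic"):
        print(row)
```

Running the script prints the two samples that carry the variant annotated as `pathogenic` in the toy data. In a real warehouse the join itself would be essentially the same standard SQL. Only the connection setup and the driver's parameter-binding style would differ.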




Tatsuya Koreeda www.mdpi.com