GenomicsDB is a highly performant, scalable data store written in C++ for importing, querying and transforming genomic variant data.
Supported platforms, filesystems and features:
- Linux and macOS.
- POSIX, HDFS, EMRFS (S3), GCS and Azure Blob.
- JVM/Spark wrappers that, among other functions, stream VariantContext buffers to and from the C++ layer. GenomicsDB jars that bundle the native libraries, with zlib as their only dependency, are regularly published on Maven Central.
- Native tools for high-performance, incremental ingestion of variants from VCF, BCF and CSV files into GenomicsDB.
- MPI and Spark support for parallel querying of GenomicsDB.
GenomicsDB is packaged into GATK4, and its quality benefits from that large user base.
The GenomicsDB user documentation is hosted as a GitHub wiki: https://github.com/GenomicsDB/GenomicsDB/wiki
GenomicsDB is open source and all participation is welcome. Please read the contribution guidelines before contributing.