C++ Query Interface

class GenomicsDB

Experimental query interface to GenomicsDB for arrays partitioned by columns. Concurrency support is provided via query JSON files for now. See:

https://github.com/GenomicsDB/GenomicsDB/wiki/Querying-GenomicsDB#json-configuration-file-for-a-query
https://github.com/GenomicsDB/GenomicsDB/wiki/MPI-with-GenomicsDB

Public Functions

GENOMICSDB_EXPORT GenomicsDB(const std::string &workspace, const std::string &callset_mapping_file, const std::string &vid_mapping_file, const std::vector<std::string> attributes = ALL_ATTRIBUTES, const uint64_t segment_size = DEFAULT_SEGMENT_SIZE)

Constructor for the GenomicsDB Query API.

workspace
callset_mapping_file
vid_mapping_file
attributes, optional
segment_size, optional

Throws GenomicsDBException

GENOMICSDB_EXPORT GenomicsDB(const std::string &query_configuration, const query_config_type_t query_configuration_type = JSON_FILE, const std::string &loader_configuration_json_file = std::string(), const int concurrency_rank = 0)

Constructor for the GenomicsDB Query API with configuration JSON files.

query_configuration - the query configuration, given as a JSON file, a JSON string, or a protobuf binary string
query_configuration_type - type of query configuration: JSON_FILE, JSON_STRING, or PROTOBUF_BINARY_STRING
loader_configuration_json_file, optional - the loader configuration in a JSON file. If a configuration key exists in both the query and the loader configuration, the query configuration takes precedence
concurrency_rank, optional - if greater than 0, the constraints (workspace, array, column and row ranges) are determined by using the rank as an index into their corresponding vectors

Throws GenomicsDBException
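The two rules described for this constructor (the query configuration wins on conflicting keys, and a rank greater than 0 indexes into per-rank vectors) can be sketched with standard containers. This is only an illustration, not the library's implementation: `Config`, `merge_configs` and `select_for_rank` are hypothetical names, and returning a null pointer for rank 0 or an out-of-range rank is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed configuration: flat key/value pairs.
using Config = std::map<std::string, std::string>;

// Merge both configurations; a key present in both keeps the query value.
Config merge_configs(const Config& query, const Config& loader) {
  Config merged = query;                        // query values go in first
  merged.insert(loader.begin(), loader.end());  // insert() never overwrites existing keys
  return merged;
}

// Pick the entry for a rank from one per-rank vector (e.g. arrays or workspaces).
// Returns nullptr when rank is 0 or out of bounds; treating those cases as
// "no rank-based selection" is an assumption of this sketch.
template <typename T>
const T* select_for_rank(const std::vector<T>& per_rank, int rank) {
  if (rank <= 0 || static_cast<std::size_t>(rank) >= per_rank.size()) {
    return nullptr;
  }
  return &per_rank[static_cast<std::size_t>(rank)];
}
```

With arrays `{"a0", "a1", "a2"}`, `select_for_rank(arrays, 1)` points at `"a1"`, so each process in an MPI-style run can derive its own slice of work from a single shared configuration.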

GENOMICSDB_EXPORT ~GenomicsDB()

Destructor

GENOMICSDB_EXPORT GenomicsDBVariants query_variants (const std::string &array, genomicsdb_ranges_t column_ranges=SCAN_FULL, genomicsdb_ranges_t row_ranges={})

Query the GenomicsDB array for variants constrained by column and row ranges. Variants are similar to GAVariant in the GA4GH API.

array
column_ranges, optional
row_ranges, optional
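To make "constrained by column and row ranges" concrete, here is a minimal sketch of how a list of ranges could select cells. It does not use the library: `ranges_t` is a stand-in for `genomicsdb_ranges_t` (whose actual definition may differ), and both the inclusive bounds and the treatment of an empty list as "no constraint" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for genomicsdb_ranges_t; the real typedef may differ.
using ranges_t = std::vector<std::pair<std::int64_t, std::int64_t>>;

// True when `value` lies in at least one range. Inclusive bounds and
// "empty list means unconstrained" are assumptions of this sketch.
bool in_ranges(const ranges_t& ranges, std::int64_t value) {
  if (ranges.empty()) return true;
  for (const auto& r : ranges) {
    if (value >= r.first && value <= r.second) return true;
  }
  return false;
}

// A cell is selected only when both its column and its row pass.
bool cell_selected(const ranges_t& column_ranges, const ranges_t& row_ranges,
                   std::int64_t column, std::int64_t row) {
  return in_ranges(column_ranges, column) && in_ranges(row_ranges, row);
}
```

Under these assumptions, column ranges `{{100, 200}, {500, 600}}` with no row ranges select column 150 for any row and reject column 300.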

GENOMICSDB_EXPORT GenomicsDBVariants query_variants ()

Query for variants using the configuration set at construction. Useful with parallelism paradigms (MPI, Intel TBB). Variants are similar to GAVariant in the GA4GH API.

GENOMICSDB_EXPORT GenomicsDBVariantCalls query_variant_calls (const std::string &array, genomicsdb_ranges_t column_ranges=SCAN_FULL, genomicsdb_ranges_t row_ranges={})

Query the array for variant calls constrained by the column and row ranges. Variant Calls are similar to GACall in the GA4GH API.

array
column_ranges, optional
row_ranges, optional

GENOMICSDB_EXPORT GenomicsDBVariantCalls query_variant_calls (GenomicsDBVariantCallProcessor &processor, const std::string &array, genomicsdb_ranges_t column_ranges=SCAN_FULL, genomicsdb_ranges_t row_ranges={})

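Query the array for variant calls constrained by the column and row ranges, passing the calls to the given processor. Variant Calls are similar to GACall in the GA4GH API.

processor - custom processor to process variant calls
array
column_ranges, optional
row_ranges, optional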
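The processor-taking overload follows a callback pattern: the caller supplies an object and the query hands it each variant call. The sketch below shows that pattern with stand-in types only. `Call`, `CallProcessor`, `CountPerRow` and `run_query` are hypothetical and are not the GenomicsDBVariantCallProcessor interface, whose actual methods are not described in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified variant call: one row (sample) at one column (position).
struct Call {
  std::int64_t row;
  std::int64_t column;
};

// Stand-in processor interface: the query invokes process() once per call.
class CallProcessor {
 public:
  virtual ~CallProcessor() = default;
  virtual void process(const Call& call) = 0;
};

// Example processor that counts calls per row instead of storing them.
class CountPerRow : public CallProcessor {
 public:
  explicit CountPerRow(std::size_t num_rows) : counts_(num_rows, 0) {}
  void process(const Call& call) override {
    if (call.row >= 0 && static_cast<std::size_t>(call.row) < counts_.size()) {
      ++counts_[static_cast<std::size_t>(call.row)];
    }
  }
  const std::vector<int>& counts() const { return counts_; }

 private:
  std::vector<int> counts_;
};

// Stand-in query: streams every stored call whose column lies in
// [begin, end] (inclusive, by assumption) to the processor.
void run_query(const std::vector<Call>& stored, std::int64_t begin,
               std::int64_t end, CallProcessor& processor) {
  for (const Call& c : stored) {
    if (c.column >= begin && c.column <= end) processor.process(c);
  }
}
```

A processor of this kind lets the caller aggregate or filter calls as they arrive, which can avoid holding the full result set in memory.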

GENOMICSDB_EXPORT GenomicsDBVariantCalls query_variant_calls ()

Query for variant calls using the configuration set at construction. Useful with parallelism paradigms (MPI, Intel TBB). Variant Calls are similar to GACall in the GA4GH API.

GENOMICSDB_EXPORT GenomicsDBVariantCalls query_variant_calls (GenomicsDBVariantCallProcessor &processor, const std::string &query_configuration, const query_config_type_t query_configuration_type)

Query with a configuration describing the subset for variant calls. Useful with paradigms like MPI and Intel TBB, and when a cached GenomicsDB instance serves multiple concurrent query_variant_calls invocations with different subset configurations. Variant Calls are similar to GACall in the GA4GH API.

processor - custom processor to process variant calls
query_configuration - protobuf export configuration as a binary string, optional. If not specified, the configuration specified during class construction is used for the query
query_configuration_type - type of configuration. Currently only PROTOBUF_BINARY_STRING is supported, and an exception is thrown for other types

GENOMICSDB_EXPORT void generate_vcf (const std::string &array, genomicsdb_ranges_t column_ranges, genomicsdb_ranges_t row_ranges, const std::string &reference_genome, const std::string &vcf_header="vcf_header.vcf", const std::string &output="", const std::string &output_format="", bool overwrite=false)

Generate multi-sample VCF files from GenomicsDB in the Broad GVCF format for the given array, constrained by column/row ranges.

GENOMICSDB_EXPORT void generate_vcf (const std::string &output="", const std::string &output_format="", bool overwrite=false)

Generate multi-sample VCF files from GenomicsDB in the Broad GVCF format using the configuration set at construction. This method is useful with parallelism paradigms (MPI, Intel TBB).

GENOMICSDB_EXPORT void generate_plink (const std::string &array, genomicsdb_ranges_t column_ranges, genomicsdb_ranges_t row_ranges, unsigned char format=7, int compression=1, bool one_pass=false, bool verbose=false, double progress_interval=-1, const std::string &output_prefix="output", const std::string &fam_list="")

Generate PLINK files from GenomicsDB for the given array, constrained by column/row ranges, in the given format. The PLINK .ped and .map output files are named <output_prefix>.ped and <output_prefix>.map respectively.
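The naming rule for the text outputs is simple enough to express directly. The helper below is a hypothetical convenience for callers who want to locate the files after generation. It is not part of the GenomicsDB API and only restates the `<output_prefix>.ped` / `<output_prefix>.map` convention described above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns the {.ped, .map} file names derived from an output prefix.
std::pair<std::string, std::string> plink_text_outputs(const std::string& output_prefix) {
  return {output_prefix + ".ped", output_prefix + ".map"};
}
```

With the default prefix `"output"`, this yields `output.ped` and `output.map`.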

GENOMICSDB_EXPORT void generate_plink (unsigned char format=7, int compression=1, bool one_pass=false, bool verbose=false, double progress_interval=-1, const std::string &output_prefix="output", const std::string &fam_list="")

Generate PLINK files from GenomicsDB in the given format, using the configuration set at construction. The PLINK .ped and .map output files are named <output_prefix>.ped and <output_prefix>.map respectively. This method is useful with parallelism paradigms (MPI, Intel TBB).

GENOMICSDB_EXPORT interval_t get_interval (const genomicsdb_variant_t *variant)

Utility template functions to extract information from Variant and VariantCall classes