GenomicsDB Protobuf Documentation

genomicsdb_callsets_mapping.proto

CallsetMappingPB

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| callsets | SampleIDToTileDBIDMap | repeated | |

SampleIDToTileDBIDMap

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| sample_name | string | required | |
| row_idx | int64 | required | |
| idx_in_file | int64 | required | |
| stream_name | string | optional | |
| filename | string | optional | |
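
To illustrate how CallsetMappingPB nests SampleIDToTileDBIDMap entries, the sketch below builds the same shape from plain Python dictionaries and checks the fields labelled required. It does not use the generated protobuf classes. The helper name `make_callset_mapping`, the sample names, and the file names are hypothetical, and the check that `row_idx` values are unique is an assumption that this page does not state.

```python
import json

# Fields labelled "required" for SampleIDToTileDBIDMap.
REQUIRED = ("sample_name", "row_idx", "idx_in_file")

def make_callset_mapping(entries):
    """Return a dict shaped like CallsetMappingPB from a list of entry dicts."""
    seen_rows = set()
    callsets = []
    for entry in entries:
        missing = [name for name in REQUIRED if name not in entry]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        # Assumption: each sample occupies its own row.
        if entry["row_idx"] in seen_rows:
            raise ValueError(f"duplicate row_idx: {entry['row_idx']}")
        seen_rows.add(entry["row_idx"])
        callsets.append(dict(entry))
    return {"callsets": callsets}

mapping = make_callset_mapping([
    {"sample_name": "sampleA", "row_idx": 0, "idx_in_file": 0, "filename": "a.vcf.gz"},
    {"sample_name": "sampleB", "row_idx": 1, "idx_in_file": 0, "filename": "b.vcf.gz"},
])
print(json.dumps(mapping, indent=2))
```

Optional fields such as `stream_name` are simply left out of an entry when they are not needed.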

genomicsdb_coordinates.proto

ContigInterval

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| contig | string | required | |
| begin | int64 | optional | |
| end | int64 | optional | |

ContigPosition

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| contig | string | required | |
| position | int64 | required | |

GenomicsDBColumn

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| tiledb_column | int64 | optional | |
| contig_position | ContigPosition | optional | |

GenomicsDBColumnInterval

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| tiledb_column_interval | TileDBColumnInterval | optional | |
| contig_interval | ContigInterval | optional | |

GenomicsDBColumnOrInterval

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| column | GenomicsDBColumn | optional | |
| column_interval | GenomicsDBColumnInterval | optional | |

TileDBColumnInterval

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| begin | int64 | required | |
| end | int64 | required | |
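
The coordinate messages wrap one another: a GenomicsDBColumnOrInterval holds either a column or a column interval, and each of those holds either a TileDB coordinate or a contig coordinate. The sketch below shows that nesting with plain Python dictionaries. Both fields of each wrapper are labelled optional above; the sketch assumes that exactly one of them is meant to be set, which this page does not state. The helper names `contig_interval` and `exactly_one` are hypothetical, as is the check that `begin` does not exceed `end`.

```python
def contig_interval(contig, begin=None, end=None):
    """Dict shaped like ContigInterval; begin and end are optional."""
    if begin is not None and end is not None and begin > end:
        raise ValueError("begin must not exceed end")
    msg = {"contig": contig}
    if begin is not None:
        msg["begin"] = begin
    if end is not None:
        msg["end"] = end
    return msg

def exactly_one(**fields):
    """Return the single non-None field as a one-entry dict."""
    chosen = {name: value for name, value in fields.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError(f"set exactly one of {sorted(fields)}")
    return chosen

# GenomicsDBColumnInterval wrapping a ContigInterval.
interval = exactly_one(
    tiledb_column_interval=None,
    contig_interval=contig_interval("chr1", 100, 200),
)

# GenomicsDBColumnOrInterval wrapping that interval.
query_item = exactly_one(column=None, column_interval=interval)
print(query_item)
```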

genomicsdb_export_config.proto

AnnotationSource

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| filename | string | required | |
| data_source | string | required | |
| attributes | string | repeated | |
| is_vcf | bool | optional | Default: true |
| file_chromosomes | string | repeated | |

ExportConfiguration

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| workspace | string | required | |
| reference_genome | string | optional | |
| array_name | string | optional | |
| generate_array_name_from_partition_bounds | bool | optional | Default: true |
| query_column_ranges | GenomicsDBColumnOrIntervalList | repeated | Define only one of query_column_ranges and query_contig_intervals; query_contig_intervals is recommended. |
| query_contig_intervals | ContigInterval | repeated | |
| query_row_ranges | RowRangeList | repeated | Define only one of query_row_ranges and query_sample_names. |
| query_sample_names | string | repeated | |
| attributes | string | repeated | |
| query_filter | string | optional | Last of the fields shared with QueryConfiguration. |
| vcf_header_filename | string | optional | |
| vcf_output_filename | string | optional | |
| vcf_output_format | string | optional | |
| vid_mapping_file | string | optional | |
| vid_mapping | VidMappingPB | optional | |
| callset_mapping_file | string | optional | |
| callset_mapping | CallsetMappingPB | optional | |
| max_diploid_alt_alleles_that_can_be_genotyped | uint32 | optional | First of the other configuration fields. |
| max_genotype_count | uint32 | optional | |
| index_output_VCF | bool | optional | |
| produce_GT_field | bool | optional | |
| produce_FILTER_field | bool | optional | |
| sites_only_query | bool | optional | |
| produce_GT_with_min_PL_value_for_spanning_deletions | bool | optional | |
| scan_full | bool | optional | |
| segment_size | uint32 | optional | Default: 10485760 |
| combined_vcf_records_buffer_size_limit | uint32 | optional | |
| enable_shared_posixfs_optimizations | bool | optional | Default: false |
| bypass_intersecting_intervals_phase | bool | optional | Default: false |
| spark_config | SparkConfig | optional | |
| annotation_source | AnnotationSource | repeated | |
| annotation_buffer_size | uint32 | optional | Default: 10240 |
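
The two "only one of" constraints on ExportConfiguration are easy to overlook, so here is a minimal sketch of a check for them using a plain Python dictionary in place of the generated message class. The function name `check_export_config`, the workspace path, the array name, and the attribute names are all hypothetical. The sketch only rejects a configuration that sets both fields of a pair; whether leaving both unset is valid is not stated on this page.

```python
import json

# Field pairs of which only one should be defined.
EXCLUSIVE_PAIRS = [
    ("query_column_ranges", "query_contig_intervals"),
    ("query_row_ranges", "query_sample_names"),
]

def check_export_config(cfg):
    """Raise ValueError if cfg breaks the constraints listed for ExportConfiguration."""
    if not cfg.get("workspace"):
        raise ValueError("workspace is required")
    for first, second in EXCLUSIVE_PAIRS:
        if cfg.get(first) and cfg.get(second):
            raise ValueError(f"define only one of {first} and {second}")
    return cfg

export_config = check_export_config({
    "workspace": "/path/to/workspace",        # hypothetical path
    "array_name": "example_array",            # hypothetical name
    "query_contig_intervals": [
        {"contig": "chr1", "begin": 100, "end": 200},
    ],
    "query_sample_names": ["sampleA", "sampleB"],
    "attributes": ["REF", "ALT", "GT"],        # illustrative attribute names
    "segment_size": 10 * 1024 * 1024,         # same value as the default, 10485760
})
print(json.dumps(export_config, indent=2))
```

The example follows the recommendation to use `query_contig_intervals` rather than `query_column_ranges`.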

GenomicsDBColumnOrIntervalList

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| column_or_interval_list | GenomicsDBColumnOrInterval | repeated | |

QueryConfiguration

Simple query configuration for GenomicsDB::query_variant_calls, for use when the class has been initialized with ExportConfiguration.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| array_name | string | optional | |
| generate_array_name_from_partition_bounds | bool | optional | Default: true |
| query_column_ranges | GenomicsDBColumnOrIntervalList | repeated | Define only one of query_column_ranges and query_contig_intervals; query_contig_intervals is recommended. |
| query_contig_intervals | ContigInterval | repeated | |
| query_row_ranges | RowRangeList | repeated | Define only one of query_row_ranges and query_sample_names. |
| query_sample_names | string | repeated | |
| attributes | string | repeated | |
| query_filter | string | optional | |

RowRange

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| low | int64 | required | |
| high | int64 | required | |

RowRangeList

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| range_list | RowRange | repeated | |

SparkConfig

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| query_block_size | int64 | optional | |
| query_block_size_margin | int64 | optional | |

genomicsdb_import_config.proto

ImportConfiguration

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| size_per_column_partition | int64 | required | Default: 16384 |
| row_based_partitioning | bool | optional | Default: false |
| produce_combined_vcf | bool | optional | Default: false |
| produce_tiledb_array | bool | optional | Default: true |
| column_partitions | Partition | repeated | |
| vid_mapping_file | string | optional | |
| vid_mapping | VidMappingPB | optional | |
| callset_mapping_file | string | optional | |
| callset_mapping | CallsetMappingPB | optional | |
| treat_deletions_as_intervals | bool | optional | Default: true |
| num_parallel_vcf_files | int32 | optional | Default: 1 |
| delete_and_create_tiledb_array | bool | optional | Default: false |
| do_ping_pong_buffering | bool | optional | Default: true |
| offload_vcf_output_processing | bool | optional | Default: true |
| discard_vcf_index | bool | optional | Default: true |
| segment_size | int64 | optional | Default: 10485760 |
| compress_tiledb_array | bool | optional | Default: true |
| num_cells_per_tile | int64 | optional | Default: 1000 |
| fail_if_updating | bool | optional | Default: false |
| tiledb_compression_type | int32 | optional | Default: 1 |
| tiledb_compression_level | int32 | optional | Default: -1 |
| consolidate_tiledb_array_after_load | bool | optional | Default: false |
| disable_synced_writes | bool | optional | Default: true |
| ignore_cells_not_in_partition | bool | optional | |
| lb_callset_row_idx | int64 | optional | Default: 0 |
| ub_callset_row_idx | int64 | optional | |
| enable_shared_posixfs_optimizations | bool | optional | Default: false |
| disable_delta_encode_for_offsets | bool | optional | Default: false |
| disable_delta_encode_for_coords | bool | optional | Default: false |
| enable_bit_shuffle_gt | bool | optional | Default: false |
| enable_lz4_compression_gt | bool | optional | Default: false |
| reference_genome | string | optional | |
| vcf_header_filename | string | optional | |

Partition

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| begin | GenomicsDBColumn | required | |
| workspace | string | optional | |
| array_name | string | optional | |
| generate_array_name_from_partition_bounds | bool | optional | |
| vcf_output_filename | string | optional | |
| vcf_header_filename | string | optional | |
| end | GenomicsDBColumn | optional | |

genomicsdb_vid_mapping.proto

Chromosome

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | required | |
| length | int64 | required | |
| tiledb_column_offset | int64 | required | |
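
The `tiledb_column_offset` field suggests that each contig occupies a contiguous block of TileDB columns. Under that reading, a contig position can be converted to a flat column index and back, as sketched below. This rests on two assumptions that this page does not confirm: that a contig's first base maps to column `tiledb_column_offset`, and that contig positions are 1-based. The function names and the toy contig lengths are hypothetical.

```python
from bisect import bisect_right

# Toy contigs shaped like Chromosome messages (lengths are illustrative only).
contigs = [
    {"name": "1", "length": 1000, "tiledb_column_offset": 0},
    {"name": "2", "length": 500, "tiledb_column_offset": 1000},
]

def to_column(contigs, name, position):
    """Map a 1-based contig position to a flat column index."""
    for contig in contigs:
        if contig["name"] == name:
            if not 1 <= position <= contig["length"]:
                raise ValueError("position outside contig")
            return contig["tiledb_column_offset"] + position - 1
    raise KeyError(name)

def to_contig_position(contigs, column):
    """Map a flat column index back to (contig name, 1-based position)."""
    ordered = sorted(contigs, key=lambda c: c["tiledb_column_offset"])
    offsets = [c["tiledb_column_offset"] for c in ordered]
    index = bisect_right(offsets, column) - 1
    if index < 0:
        raise ValueError("column precedes the first contig")
    contig = ordered[index]
    position = column - contig["tiledb_column_offset"] + 1
    if position > contig["length"]:
        raise ValueError("column falls in a gap between contigs")
    return contig["name"], position

column = to_column(contigs, "2", 1)
print(column, to_contig_position(contigs, column))
```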

FieldLengthDescriptorComponentPB

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| variable_length_descriptor | string | optional | |
| fixed_length | int32 | optional | |

GenomicsDBFieldInfo

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| name | string | required | |
| type | string | repeated | |
| vcf_field_class | string | repeated | |
| vcf_type | string | optional | |
| length | FieldLengthDescriptorComponentPB | repeated | |
| vcf_delimiter | string | repeated | |
| VCF_field_combine_operation | string | optional | |
| vcf_name | string | optional | Useful when the VCF header defines multiple fields with the same name (FILTER, FORMAT, INFO) but different types or lengths. |
| disable_remap_missing_with_non_ref | bool | optional | Default: false |

VidMappingPB

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| fields | GenomicsDBFieldInfo | repeated | |
| contigs | Chromosome | repeated | |

Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go |
| --- | --- | --- | --- | --- | --- |
| double | | double | double | float | float64 |
| float | | float | float | float | float32 |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 |
| bool | | bool | boolean | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte |
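
The notes on variable-length encoding can be checked with a few lines of Python. The sketch below is a hand-written illustration of protobuf's base-128 varint and ZigZag encodings, not the protobuf library itself, and the function names are hypothetical. It compares the encoded size of a negative int32 with the same value as a sint32, and shows where a uint32 varint starts to need more than the four bytes of a fixed32.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int32(value):
    """int32 values are sign-extended to 64 bits before varint encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)

def zigzag32(value):
    """Map signed 32-bit values to unsigned so small magnitudes stay small."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF

def encode_sint32(value):
    return encode_varint(zigzag32(value))

print(len(encode_int32(-1)), len(encode_sint32(-1)))
print(len(encode_varint(2**28 - 1)), len(encode_varint(2**28)))
```

A negative int32 takes ten bytes because the sign extension sets all 64 bits, whereas ZigZag maps -1 to 1, which fits in a single byte.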