13 Choices and reasons behind those choices for the VARWRRUMM track
13.1 Haploid assembly for mapping
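This variant calling track requires a haploid assembly (or one as close to haploid as possible). If both haplotypes of a region are present in the reference, reads are split between the two copies and receive low mapping quality, which makes heterozygous variants hard to call. We use the Freebayes variant caller for variant calling. We recommend reading the Freebayes documentation carefully to understand what has (and has not) been done here.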
13.2 Data preparation prior to variant calling
Reads are mapped using BWA-MEM. All alternative alignments are reported as secondary alignments in the BAM file, and shorter split hits are marked as secondary (see the BWA-MEM reference manual). Read groups are added, with ID and SM both set to the sample ID.
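A minimal sketch of how the mapping command could be assembled, assuming the behaviour described above corresponds to the bwa mem flags `-a` (output all alignments found, extra ones flagged as secondary) and `-M` (mark shorter split hits as secondary), with the read group passed through `-R`. The function name, thread count and file names are hypothetical; the track's actual command line is not shown here.

```python
import shlex


def build_bwa_mem_cmd(reference, reads1, sample_id, reads2=None, threads=4):
    """Build a bwa mem argument list (hypothetical helper, not the track's script).

    Assumed flags:
      -a  output all alignments found (single-end or unpaired reads);
          the extra ones are flagged as secondary
      -M  mark shorter split hits as secondary
      -R  read-group header line, with ID and SM both set to the sample id
      -t  number of threads
    """
    # bwa itself converts the two characters "\t" into a TAB in the SAM header,
    # so a literal backslash followed by "t" is passed, not a real tab.
    read_group = f"@RG\\tID:{sample_id}\\tSM:{sample_id}"
    cmd = ["bwa", "mem", "-a", "-M", "-t", str(threads), "-R", read_group,
           reference, reads1]
    if reads2 is not None:  # paired-end data: second FASTQ file
        cmd.append(reads2)
    return cmd


cmd = build_bwa_mem_cmd("assembly.fa", "s1_R1.fq.gz", "s1", reads2="s1_R2.fq.gz")
print(shlex.join(cmd))
```

bwa mem writes SAM to standard output, so in practice its output is piped into samtools to obtain the BAM file that is then indexed.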
The mapped file is then indexed using samtools.
Duplicate reads are then marked (the default) using sambamba, as recommended in the Freebayes documentation.
Alternatively, duplicates can be ignored (e.g. if no PCR was used during library preparation) or removed entirely.
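The three ways of handling duplicates (mark, remove, ignore) can be sketched as a small helper returning the corresponding sambamba markdup invocation; `-r` is sambamba's option for removing duplicates instead of only flagging them. The `mode` parameter, the function name and the file names are hypothetical.

```python
def build_duplicate_step(in_bam, out_bam, mode="mark", threads=4):
    """Return the sambamba command for one duplicate policy, or None.

    mode (hypothetical parameter):
      "mark"   flag duplicates but keep them in the BAM (the default here)
      "remove" drop duplicates from the BAM (sambamba markdup -r)
      "ignore" skip the step, e.g. for PCR-free libraries
    """
    if mode == "ignore":
        return None
    if mode not in ("mark", "remove"):
        raise ValueError(f"unknown duplicate mode: {mode!r}")
    cmd = ["sambamba", "markdup", "-t", str(threads)]
    if mode == "remove":
        cmd.append("-r")
    return cmd + [in_bam, out_bam]


for mode in ("mark", "remove", "ignore"):
    print(mode, build_duplicate_step("s1.sorted.bam", "s1.dedup.bam", mode))
```

Marking rather than removing keeps the reads in the file. To our understanding, Freebayes skips duplicate-flagged alignments by default unless `--use-duplicate-reads` is given, so marking is enough for them to be left out of the calling.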
For information purposes, we then compute the coverage depth using samtools.
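As an illustration, a mean depth can be summarised from the tab-separated output of `samtools depth` (contig, position, depth for a single BAM). The helper below is a hypothetical sketch, not part of the track. Without `-a`, samtools depth omits zero-depth positions, so the mean is then taken over covered positions only.

```python
import io


def mean_depth(depth_lines):
    """Mean depth over the positions listed in `samtools depth` output.

    Expects lines of the form: contig <TAB> position <TAB> depth
    (one BAM file). Blank lines and '#' header lines are skipped.
    """
    total = 0
    n_positions = 0
    for line in depth_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += int(fields[2])  # third column is the depth
        n_positions += 1
    return total / n_positions if n_positions else 0.0


example = io.StringIO("ctg1\t1\t10\nctg1\t2\t12\nctg1\t3\t14\n")
print(mean_depth(example))  # 12.0
```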
13.3 Variant calling with Freebayes
13.3.1 Raw variant calling
Freebayes was developed for variant calling on diploid genomes from Illumina short reads, although it can be used with other ploidy levels. It calls variants probabilistically: it computes the probability that a variant exists at each locus.
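A minimal sketch of a Freebayes invocation, assuming only standard options: `-f` gives the reference (here the haploid assembly), `-p` sets the sample ploidy (2 by default) and `--genotype-qualities` adds per-sample GQ values. The helper and file names are hypothetical, and any additional options the track may use are not shown.

```python
def build_freebayes_cmd(reference, bams, ploidy=2, genotype_qualities=False):
    """Build a freebayes argument list (hypothetical helper).

    reference: FASTA of the (haploid) assembly used for mapping
    bams: list of indexed BAM files with duplicates marked
    ploidy: ploidy of the sequenced sample, not of the assembly
            (-p; freebayes defaults to 2)
    """
    cmd = ["freebayes", "-f", reference, "-p", str(ploidy)]
    if genotype_qualities:
        cmd.append("--genotype-qualities")  # report GQ per sample
    cmd.extend(bams)
    return cmd


print(build_freebayes_cmd("assembly.fa", ["s1.dedup.bam"], genotype_qualities=True))
```

Freebayes writes the VCF to standard output, which is usually redirected to a file.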
13.3.2 Filtering calls: quality assurance
Calls can be filtered on QUAL and/or on depth (DP) or allele observation counts (AO, RO):
- QUAL: phred-scaled probability that there is a polymorphism at the locus described by the record, \(1 - P(\text{locus is homozygous} \mid \text{data})\). Per-sample genotype qualities (GQ) are reported as well when `--genotype-qualities` is supplied.
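- Filtering is done with vcffilter from vcflib.
- Calls are kept when QUAL > 20, i.e. when the probability of not being polymorphic is below 0.01 (phred 20), or equivalently when the probability of polymorphism is above 0.99.
- The filtered output should then be examined manually.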
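The QUAL/DP filter can be sketched in plain Python to make the threshold explicit. It mimics what a vcffilter expression such as `-f "QUAL > 20"` does, but it is not vcflib's implementation; the optional minimum DP and all function names are hypothetical.

```python
import math


def phred_to_prob(qual):
    """Phred score -> probability of error, here P(locus is not polymorphic)."""
    return 10 ** (-qual / 10)


def prob_to_phred(prob):
    """Probability of error -> phred score."""
    return -10 * math.log10(prob)


def info_value(info, key):
    """Return the value of KEY=VALUE in a VCF INFO column, or None."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def filter_calls(vcf_lines, min_qual=20.0, min_depth=None):
    """Yield header lines and records with QUAL > min_qual (and DP >= min_depth)."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # 6th column: QUAL ('.' if missing)
        if qual == "." or float(qual) <= min_qual:
            continue
        if min_depth is not None:
            dp = info_value(fields[7], "DP")  # 8th column: INFO
            if dp is None or int(dp) < min_depth:
                continue
        yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "ctg1\t100\t.\tA\tG\t45.2\t.\tDP=30\n",
    "ctg1\t250\t.\tC\tT\t8.1\t.\tDP=4\n",
]
print(len(list(filter_calls(vcf))))  # 3: two header lines + the QUAL 45.2 call
```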
Useful links:
13.4 Normalization of variant representation
Freebayes writes its output in VCF 4.2 format, which provides a
> “probabilistic description of allelic variants within a population of samples, but it is equally suited to describing the probability of variation in a single sample.”
Quoted from
- Relationship between the phred-scaled QUAL and the probability of not being polymorphic (as a formula)
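The relationship asked for in this note follows from applying the standard phred scaling to the QUAL definition used for filtering, \(1 - P(\text{locus is homozygous} \mid \text{data})\):

\[
\begin{aligned}
\mathrm{QUAL} &= -10\,\log_{10}\bigl(1 - P(\text{polymorphic} \mid \text{data})\bigr) = -10\,\log_{10} P(\text{locus is homozygous} \mid \text{data}),\\
P(\text{locus is homozygous} \mid \text{data}) &= 10^{-\mathrm{QUAL}/10},\\
\mathrm{QUAL} = 20 \;&\Longrightarrow\; P(\text{locus is homozygous} \mid \text{data}) = 10^{-2} = 0.01 .
\end{aligned}
\]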