About VCF Reference Filter
VCF Reference Filter is a tool that helps geneticists and bioinformaticians filter Variant Call Format (VCF) files, making it simpler to identify significant variants.
VCF (Variant Call Format) files are a standard format used in bioinformatics to store information about genetic variations. These files are generated after sequencing data is processed through variant calling pipelines, such as GATK, bcftools, or FreeBayes. VCF files are used to describe differences in DNA sequences compared to a reference genome.
Attributes of a VCF File
VCF files contain a header section and a data section:
- Header Section: Metadata lines starting with ## contain information about the file, including the reference genome, tools used, and annotations.
- Column Headers: A single line starting with #CHROM names the data fields:
  - #CHROM: Chromosome identifier.
  - POS: Position of the variant on the chromosome.
  - ID: Variant identifier (e.g., an rsID, or . meaning no ID).
  - REF: Reference allele at the position.
  - ALT: Alternate allele(s) at the position.
  - QUAL: Phred-scaled quality score for the variant.
  - FILTER: Quality filters applied (e.g., PASS, ., or a reason for exclusion).
  - INFO: Additional annotations about the variant (e.g., depth, allele frequency, functional impact).
  - FORMAT: Defines the genotype fields (e.g., GT, PL).
- Sample Columns: Contain genotype and related data for each sample (e.g., Parent1_sorted.bam and Parent2_sorted.bam).
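To make this layout concrete, here is a minimal Python sketch (not the tool's own code) that separates ## metadata lines, reads the column names from the #CHROM line, and maps each tab-delimited record to those names. The function name parse_vcf_lines is hypothetical, and the record is the example shown further down this page with its INFO field abbreviated to INDEL.

```python
from typing import Dict, List, Tuple


def parse_vcf_lines(lines: List[str]) -> Tuple[List[str], List[str], List[Dict[str, str]]]:
    """Hypothetical helper: split VCF text into meta lines, column names, and records."""
    meta: List[str] = []
    columns: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("##"):
            meta.append(line)  # metadata: reference genome, tools, annotations
        elif line.startswith("#"):
            # Column header line, e.g. "#CHROM\tPOS\t...\tFORMAT\tsample1\t..."
            columns = line[1:].split("\t")
        else:
            if not columns:
                raise ValueError("data line found before the #CHROM header line")
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
    "Parent1_sorted.bam\tParent2_sorted.bam",
    # INFO abbreviated to "INDEL" for this sketch
    "Chr0\t4027\t.\tCG\tCGG\t178\t.\tINDEL\tGT:PL\t1/1:120,11,0\t0/1:91,0,16",
]
meta, columns, records = parse_vcf_lines(example)
print(columns[0], columns[-1])  # CHROM Parent2_sorted.bam
print(records[0]["POS"], records[0]["Parent1_sorted.bam"])  # 4027 1/1:120,11,0
```

Keying each record by column name means sample columns such as Parent1_sorted.bam can be looked up directly, whatever their position in the file.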
Filtering Where Parent Genotypes Differ
To filter cases where the genotypes of the parents differ (e.g., Parent1's genotype starts with 1 and Parent2's starts with 0):
- Focus on the FORMAT field, specifically the GT (genotype) data.
- Interpret genotype values:
  - 0/0: Homozygous reference genotype.
  - 0/1: Heterozygous genotype with one reference and one alternate allele.
  - 1/1: Homozygous alternate genotype.
- Compare the GT values between Parent1_sorted.bam and Parent2_sorted.bam.
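The comparison steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the helper names (sample_gt, alleles, classify_gt, parents_differ) are hypothetical, genotypes are assumed to be diploid, and "starts with 1 / starts with 0" is read as the first allele of the GT value.

```python
from typing import Dict, Optional


def sample_gt(format_field: str, sample_field: str) -> Optional[str]:
    """Return one sample's GT value, using the FORMAT keys to locate it."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return values.get("GT")


def alleles(gt: str) -> list:
    """Split an unphased ('/') or phased ('|') genotype into allele strings."""
    return gt.replace("|", "/").split("/")


def classify_gt(gt: str) -> str:
    """Label a diploid genotype such as '0/0', '0/1' or '1/1'."""
    a = alleles(gt)
    if "." in a:
        return "missing"
    if len(set(a)) > 1:
        return "heterozygous"
    return "homozygous reference" if a[0] == "0" else "homozygous alternate"


def parents_differ(record: Dict[str, str], parent1: str, parent2: str) -> bool:
    """Hypothetical rule: Parent1's GT starts with allele 1, Parent2's with allele 0."""
    gt1 = sample_gt(record["FORMAT"], record[parent1])
    gt2 = sample_gt(record["FORMAT"], record[parent2])
    if gt1 is None or gt2 is None:
        return False  # no GT recorded for one of the parents
    return alleles(gt1)[0] == "1" and alleles(gt2)[0] == "0"


record = {
    "FORMAT": "GT:PL",
    "Parent1_sorted.bam": "1/1:120,11,0",
    "Parent2_sorted.bam": "0/1:91,0,16",
}
print(classify_gt("1/1"), "|", classify_gt("0/1"))  # homozygous alternate | heterozygous
print(parents_differ(record, "Parent1_sorted.bam", "Parent2_sorted.bam"))  # True
```

Looking up GT through the FORMAT keys, rather than assuming it is always the first subfield, keeps the check correct if a pipeline orders the FORMAT fields differently.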
Example:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Parent1_sorted.bam Parent2_sorted.bam
Chr0 4027 . CG CGG 178 . INDEL;... GT:PL 1/1:120,11,0 0/1:91,0,16
In this VCF record, Parent1_sorted.bam has 1/1 (homozygous alternate), and Parent2_sorted.bam has 0/1 (heterozygous), indicating differing genotypes.
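The example record can also be checked programmatically. The sketch below assumes a biallelic site, so the PL values follow the VCF order 0/0, 0/1, 1/1; it reads GT and PL for each parent and confirms that each reported genotype is the one with the lowest PL (PL 0 marks the most likely genotype). The helper name parent_call is hypothetical, and the INFO field is abbreviated to INDEL.

```python
# Biallelic diploid genotype order used by the VCF PL field: 0/0, 0/1, 1/1.
PL_ORDER = ["0/0", "0/1", "1/1"]

header = ("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
          "Parent1_sorted.bam\tParent2_sorted.bam")
# INFO abbreviated to "INDEL" for this sketch
line = "Chr0\t4027\t.\tCG\tCGG\t178\t.\tINDEL\tGT:PL\t1/1:120,11,0\t0/1:91,0,16"

columns = header[1:].split("\t")
record = dict(zip(columns, line.split("\t")))
format_keys = record["FORMAT"].split(":")


def parent_call(sample: str) -> dict:
    """Hypothetical helper: GT, PL list, and the genotype with the lowest PL."""
    values = dict(zip(format_keys, record[sample].split(":")))
    pl = [int(x) for x in values["PL"].split(",")]
    best = PL_ORDER[pl.index(min(pl))]
    return {"GT": values["GT"], "PL": pl, "most_likely": best}


p1 = parent_call("Parent1_sorted.bam")
p2 = parent_call("Parent2_sorted.bam")
print(p1)  # {'GT': '1/1', 'PL': [120, 11, 0], 'most_likely': '1/1'}
print(p2)  # {'GT': '0/1', 'PL': [91, 0, 16], 'most_likely': '0/1'}
print(p1["GT"] != p2["GT"])  # True
```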
Download VCF Reference Filter
(When running it on Windows for the first time, select "More Info" and then "Run Anyway" to proceed.)
Applications
- Genetic Studies: Identifying variants where parents differ can highlight potential points of recombination or informative markers for mapping traits.
- Crop Breeding: Helps identify variants segregating between parents to track inheritance patterns in offspring.
- Medical Genetics: Differentiating between parent genotypes can uncover de novo mutations in progeny or verify parent-child relationships.
- Population Studies: Segregating variants are used to study allele frequency and genetic diversity.
Benefits
- Facilitates the identification of informative markers for genetic mapping.
- Helps in understanding recombination patterns and genetic inheritance.
- Aids in filtering out non-informative variants in large datasets.
Screenshots
Input VCF File
Filter Criteria Interface
Output Filtered VCF
Input Format and Output Structure
The input file should be in .xlsx format. Specify filtering criteria such as quality scores, coverage depth, or specific genomic regions. The software outputs a filtered VCF file containing only the variants that meet the criteria.
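A criteria-based filter of the kind described above might look like the following minimal Python sketch. It is not the tool's implementation: the tool itself takes .xlsx input, whereas this sketch reads VCF text lines directly; the function and parameter names (filter_vcf, info_value, min_qual, min_dp, region) are hypothetical; coverage depth is assumed to be reported as DP in the INFO field; and the three records are invented for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def info_value(info: str, key: str) -> Optional[str]:
    """Return the value of KEY=VALUE from a semicolon-separated INFO field."""
    for item in info.split(";"):
        if item.startswith(key + "="):
            return item.split("=", 1)[1]
    return None


def filter_vcf(
    lines: Iterable[str],
    min_qual: float = 0.0,
    min_dp: int = 0,
    region: Optional[Tuple[str, int, int]] = None,
) -> Iterator[str]:
    """Hypothetical filter: keep header lines and records passing every criterion."""
    for line in lines:
        if line.startswith("#"):
            yield line  # ## metadata and the #CHROM header are kept as-is
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, qual, info = fields[0], int(fields[1]), fields[5], fields[7]
        if min_qual > 0 and (qual == "." or float(qual) < min_qual):
            continue  # missing or low Phred-scaled QUAL
        dp = info_value(info, "DP")
        if min_dp > 0 and (dp is None or int(dp) < min_dp):
            continue  # depth unreported or below the threshold
        if region is not None:
            r_chrom, r_start, r_end = region
            if chrom != r_chrom or not (r_start <= pos <= r_end):
                continue  # outside the region (1-based, inclusive)
        yield line


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Chr0\t4027\t.\tCG\tCGG\t178\t.\tINDEL;DP=12",  # invented DP value
    "Chr0\t5100\t.\tA\tG\t25\t.\tDP=40",  # invented record
    "Chr1\t300\t.\tT\tC\t90\t.\tDP=3",  # invented record
]
kept = list(filter_vcf(vcf, min_qual=30, min_dp=10, region=("Chr0", 1, 10000)))
for line in kept:
    print(line)
```

Header lines pass through unchanged so the output is still a VCF file; with the settings shown, only the Chr0 4027 record survives (the Chr0 5100 record fails the QUAL threshold and the Chr1 record fails both the depth and region checks).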