
**WhatsOEver** · 03-27-2015, 03:55 AM

What documentation are you referring to?
Although I agree that information is sometimes hard to find on their website, I personally think you'll have a hard time finding a better-documented workflow than "GATK variant calling" anywhere in bioinformatics.
Have a look at this page, which has highly detailed presentation slides from Sep 2014 on how to use GATK: https://www.broadinstitute.org/gatk/...ations?id=4768

**student-t** · 03-27-2015, 11:44 PM

I have read the documentation but am still having problems. I'm simulating variants, mutations, etc. from a reference genome, so I'd expect GATK to report something. I ran through the pipeline, but my resulting VCF file contains headers and no data. The problem is that I don't know whether I've screwed up my data or whether the way I'm using GATK is simply wrong. If there were a known protocol, I could compare my steps with it. Can anybody take a quick look at what I've done? The data are paired-end reads.

Here's my pipeline:

- silico.fa is my reference genome

- Create index for the reference FASTA

samtools faidx silico.fa

- Generate a sequence dictionary

java -jar ../../Tools/Picard/picard.jar CreateSequenceDictionary REFERENCE=silico.fa OUTPUT=silico.dict

- Alignment

bwa mem -t 20 -M -R '@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1\tPU:unit1' silico_index reads/simulated_1.fastq reads/simulated_2.fastq > aligned.sam

- Sort SAM to BAM

java -jar ../../Tools/Picard/picard.jar SortSam INPUT=aligned.sam OUTPUT=sorted.bam SORT_ORDER=coordinate

- Mark Duplicates

java -jar ../../Tools/Picard/picard.jar MarkDuplicates INPUT=sorted.bam OUTPUT=marked.bam METRICS_FILE=metrics.txt

- Build BAM Index

java -jar ../../Tools/Picard/picard.jar BuildBamIndex INPUT=marked.bam

- RealignerTargetCreator

java -jar ../../Tools/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R silico.fa -I marked.bam -o realigner.intervals

- IndelRealigner

java -jar ../../Tools/GATK/GenomeAnalysisTK.jar -T IndelRealigner -R silico.fa -I marked.bam -targetIntervals realigner.intervals -o realigned.bam

- Haplotype Analysis

java -jar ../../Tools/GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R silico.fa -I realigned.bam --genotyping_mode DISCOVERY --heterozygosity 0.01 --defaultBaseQualities 30 -o haplotype.vcf
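
Since the symptom here is a VCF with headers but no records, a quick first check is to count the data lines in the output. The following is only a minimal sketch, assuming a POSIX shell with awk; the function name `count_vcf_records` is made up for illustration and is not part of GATK or any other tool.

```shell
# Count variant records in a VCF file.
# Header lines start with '#'; any other non-empty line is a record.
count_vcf_records() {
    # 'n + 0' forces a numeric 0 to be printed when no records are found.
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Hypothetical usage on the HaplotypeCaller output named above:
# count_vcf_records haplotype.vcf
```

If this prints 0, the file really is header-only, and it may be worth checking how many reads aligned at all (for example with `samtools flagstat realigned.bam`) before looking at the calling step itself.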

**dpryan** · 03-28-2015, 06:20 AM

The question becomes how the simulated reads were made.
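
One basic sanity check on simulated paired reads, whatever simulator produced them, is that both mate files contain the same number of records. This is a rough sketch, assuming uncompressed FASTQ with the usual four lines per read; `count_fastq_reads` and `pairs_match` are illustrative names, not existing tools.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed (exit status 0) only if both mate files hold the same number of reads.
pairs_match() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}

# Hypothetical usage with the files from the alignment step:
# pairs_match reads/simulated_1.fastq reads/simulated_2.fastq && echo "mate counts agree"
```

Matching counts do not show that the simulated variants made it into the reads; they only rule out one way the input could be malformed.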


Protocol for finding variants in GATK
