Hi,
I just received an Ion Torrent dataset (sample.fastq) to test, and I would like the community's point of view on what I am doing.
1/ aligning the reads
-------------------------
I use:
$bwa aln hg19ref.fa sample.fastq > sample.sai
$bwa samse -r "@RG\tID:IDa\tSM:sample\tPL:PL" hg19ref.fa sample.sai sample.fastq > sample.sam
$java -Xmx4g -jar picard/SortSam.jar SO=coordinate INPUT=sample.sam OUTPUT=sample.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
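For reference, here is a minimal sketch of step 1 as a POSIX shell script that stops at the first failing command. It assumes bwa, samtools and picard/SortSam.jar are installed and that hg19ref.fa has already been indexed with `bwa index`; `REF`, `SAMPLE`, `DRY_RUN` and the `run`/`align_sample` helpers are hypothetical names. It calls `bwa aln` (a subcommand, not a `-aln` option) and uses bwa's `-f` option to write the .sai/.sam files instead of redirecting stdout. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of step 1. Assumptions: bwa, samtools and picard/SortSam.jar are
# installed and hg19ref.fa is bwa-indexed. DRY_RUN=1 (default) only prints
# the commands; DRY_RUN=0 actually runs them.
set -eu

REF=${REF:-hg19ref.fa}
SAMPLE=${SAMPLE:-sample}
# Literal \t sequences: bwa samse -r turns them into tabs in the @RG line
RG='@RG\tID:IDa\tSM:sample\tPL:PL'

# Print or execute a command depending on DRY_RUN
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

align_sample() {
  # 'aln' is a bwa subcommand; -f writes the output file instead of stdout
  run bwa aln -f "$SAMPLE.sai" "$REF" "$SAMPLE.fastq"
  run bwa samse -f "$SAMPLE.sam" -r "$RG" "$REF" "$SAMPLE.sai" "$SAMPLE.fastq"
  run java -Xmx4g -jar picard/SortSam.jar SO=coordinate INPUT="$SAMPLE.sam" \
    OUTPUT="$SAMPLE.bam" VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
  run samtools flagstat "$SAMPLE.bam"
}

align_sample
```

If saved as, say, align.sh, `DRY_RUN=0 sh align.sh` would run the four commands for sample.fastq.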
Looking at the flagstat output, I get:
428427 in total
0 QC failure
0 duplicates
373491 mapped (87.18%)
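As a quick sanity check, the mapping rate can be recomputed from the flagstat text. This sketch (the `mapping_rate` function name is hypothetical) takes the first field of the "in total" line and of the first "mapped (" line; with the numbers above it prints 87.18, matching flagstat.

```shell
#!/bin/sh
# Recompute the mapping rate (in %) from samtools flagstat text on stdin.
# Uses the first field of the "in total" line and of the first "mapped (" line.
mapping_rate() {
  awk '/ in total/ { total = $1 }
       / mapped \(/ && !seen { mapped = $1; seen = 1 }
       END { if (total > 0) printf "%.2f\n", 100 * mapped / total }'
}

# With a real BAM: samtools flagstat sample.bam | mapping_rate
printf '428427 in total\n0 QC failure\n0 duplicates\n373491 mapped (87.18%%)\n' | mapping_rate
```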
Remarks/questions:
* Should I use bwa bwasw or bowtie2 instead?
* Is the platform (PL) information in the SAM read-group header used anywhere?
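On the second question: the `-r` string makes bwa write an @RG header line whose PL tag is the literal value `PL`, and per-read platform lookups (for example by GATK) go through that @RG line. A small sketch (hypothetical `rg_platforms` helper) to list the PL values actually stored in a header; with the read group used above it prints `PL`, which is not a recognized platform name and may be related to the "unknown platform" error reported below for GATK 1.5.

```shell
#!/bin/sh
# Print the PL (platform) value of every @RG line in SAM header text on stdin.
# With a real file: samtools view -H sample.bam | rg_platforms
rg_platforms() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^PL:/) print substr($i, 4)
  }'
}

# Header as bwa samse -r would write it for the read group used above
printf '@HD\tVN:1.0\tSO:coordinate\n@RG\tID:IDa\tSM:sample\tPL:PL\n' | rg_platforms
```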
2/ GATK 1.3 guidelines: realigning reads, etc.
-----------------------------------------------
ooooo Marking PCR duplicates in the BAM ooooo
java -Xmx4g -jar picard/MarkDuplicates.jar INPUT=sample.bam OUTPUT=sample.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT
ooooo Indexing MARKED BAM ooooooooo
samtools index sample.marked.bam
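MarkDuplicates also records the duplication rate in the METRICS_FILE (here `metrics`). A sketch for pulling it out, assuming the usual Picard metrics layout (a `## METRICS CLASS` line followed by a tab-separated header row and one row per library); PERCENT_DUPLICATION is a fraction between 0 and 1 despite its name. The `percent_duplication` function is hypothetical.

```shell
#!/bin/sh
# Print PERCENT_DUPLICATION from a Picard MarkDuplicates metrics file ($1).
# Assumes: "## METRICS CLASS" line, then a tab-separated header row, then
# one row of values per library (only the first row is printed).
percent_duplication() {
  awk -F'\t' '
    /^## METRICS CLASS/ { in_metrics = 1; next }
    in_metrics && col == 0 && NF > 1 {
      # header row: locate the PERCENT_DUPLICATION column
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      if (col == 0) exit 1
      next
    }
    in_metrics && col > 0 && NF > 1 { print $col; exit }
  ' "$1"
}
```

Usage: `percent_duplication metrics`.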
ooooo Realignment around indels (1) create the indel target list ooooo
java -Xmx4g -jar gatk.jar -T RealignerTargetCreator -R hg19ref.fa -filterMBQ -nt 16 --known GATK_1000G_PHASE1_INDELS --known GATK_MILLS1000G_GOLD_INDELS -o indeltablelist -I sample.marked.bam
ooooo Realignment around indels (2) realign reads around those targets ooooo
java -Xmx4g -jar gatk.jar -I sample.marked.bam -R hg19ref.fa -filterMBQ -known GATK_1000G_PHASE1_INDELS -known GATK_MILLS1000G_GOLD_INDELS -T IndelRealigner -targetIntervals indeltablelist -o sample.marked.realigned.bam
ooooo Fix mate information for paired-end reads oooooooo
java -Xmx4g -jar picard/FixMateInformation.jar INPUT=sample.marked.realigned.bam OUTPUT=sample.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
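Since the reads were aligned with `bwa samse`, they are single-end, so FixMateInformation probably has no mates to fix. A hedged way to check is to count records with FLAG bit 0x1 (paired) set: `samtools view -c -f 1 sample.marked.realigned.bam` gives this directly, and the awk sketch below (hypothetical `count_paired` helper) does the same on SAM text. A result of 0 suggests this step can be skipped.

```shell
#!/bin/sh
# Count SAM records (stdin) whose FLAG has bit 0x1 set, i.e. paired reads.
# Header lines starting with @ are ignored.
# With a real file: samtools view sample.marked.realigned.bam | count_paired
count_paired() {
  awk -F'\t' '!/^@/ && $2 % 2 == 1 { n++ } END { print n + 0 }'
}
```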
ooooo Quality score recalibration (1) CountCovariates oooooooo
java -Xmx4g -jar gatk.jar -l INFO -R hg19ref.fa -nt 16 -knownSites:dbsnp,VCF dbsnp_135.b37.vcf -I sample.marked.realigned.fixed.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile sample.recal_data.csv
Note: this step failed with GATK 1.5, which complained that the platform was unknown (it expected Illumina, 454 or SOLiD). It seems to work with GATK 1.6.
ooooo Quality score recalibration (2) TableRecalibration oooooooo
java -Xmx4g -jar gatk.jar -l INFO -R hg19ref.fa -I sample.marked.realigned.fixed.bam -T TableRecalibration --out sample.marked.realigned.fixed.recal.bam -recalFile sample.recal_data.csv
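Putting step 2 together, here is a minimal dry-run sketch that chains the same commands and options as above. `KNOWN_1000G`, `KNOWN_MILLS`, `DBSNP`, `GATK` and the helper names are hypothetical; their defaults are the placeholder names from the post, so they would need to point at the actual known-sites VCFs. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/bin/sh
# Sketch of step 2 (GATK 1.x era commands, as in the post).
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
set -eu

REF=${REF:-hg19ref.fa}
S=${SAMPLE:-sample}
GATK=${GATK:-gatk.jar}
# Hypothetical variables; defaults are the placeholder names from the post
KNOWN_1000G=${KNOWN_1000G:-GATK_1000G_PHASE1_INDELS}
KNOWN_MILLS=${KNOWN_MILLS:-GATK_MILLS1000G_GOLD_INDELS}
DBSNP=${DBSNP:-dbsnp_135.b37.vcf}

# Print or execute a command depending on DRY_RUN
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

preprocess_sample() {
  run java -Xmx4g -jar picard/MarkDuplicates.jar INPUT="$S.bam" \
    OUTPUT="$S.marked.bam" METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT
  run samtools index "$S.marked.bam"
  # Indel realignment: build the target list, then realign around it
  run java -Xmx4g -jar "$GATK" -T RealignerTargetCreator -R "$REF" -filterMBQ -nt 16 \
    --known "$KNOWN_1000G" --known "$KNOWN_MILLS" -o indeltablelist -I "$S.marked.bam"
  run java -Xmx4g -jar "$GATK" -I "$S.marked.bam" -R "$REF" -filterMBQ \
    -known "$KNOWN_1000G" -known "$KNOWN_MILLS" -T IndelRealigner \
    -targetIntervals indeltablelist -o "$S.marked.realigned.bam"
  run java -Xmx4g -jar picard/FixMateInformation.jar INPUT="$S.marked.realigned.bam" \
    OUTPUT="$S.marked.realigned.fixed.bam" SO=coordinate \
    VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
  # Base quality recalibration: count covariates, then rewrite qualities
  run java -Xmx4g -jar "$GATK" -l INFO -R "$REF" -nt 16 -knownSites:dbsnp,VCF "$DBSNP" \
    -I "$S.marked.realigned.fixed.bam" -T CountCovariates \
    -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate \
    -cov DinucCovariate -recalFile "$S.recal_data.csv"
  run java -Xmx4g -jar "$GATK" -l INFO -R "$REF" -I "$S.marked.realigned.fixed.bam" \
    -T TableRecalibration --out "$S.marked.realigned.fixed.recal.bam" \
    -recalFile "$S.recal_data.csv"
}

preprocess_sample
```

Keeping every step behind one `run` function makes it easy to review the exact command lines before launching a long job.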
Does this look OK?
After that, I call variants following the rest of the pipeline guidelines.
Any comments?
Thanks,
tuka