Hi all,
I'm trying to merge WGS data for one sample that's been run across 6 lanes.
The six individual files all validate correctly with Picard ValidateSamFile; the problem comes when I try to merge them.
I originally merged them with Picard MergeSamFiles, but this corrupted the mate-pair information. I then found that most of the GATK pipeline will take multiple input files and merge them into a single output, so I tried this at the GATK IndelRealigner step, which I understand corrects mate-pair errors.
Both Picard MergeSamFiles and GATK IndelRealigner messed up the mate-pair information during the merge, so I wondered whether anyone knows of a way to merge the files whilst preserving it.
I should point out that I've tried correcting this four times with Picard FixMateInformation, and each time it has run into various problems after about 18 hours.
The commands I used were:
java -Xmx24g -Xms24g -jar ~/apps/GATK/GenomeAnalysisTK.jar \
-I nodup17_01.bam \
-I nodup17_02.bam \
-I nodup17_03.bam \
-I nodup17_04.bam \
-I nodup17_05.bam \
-I nodup17_06.bam \
-R ~/gatk/ucsc.hg19.fasta \
-T IndelRealigner \
-targetIntervals nodup17.intervals \
-o realign17.bam \
-nt 12 \
-known ~/gatk/Mills_and_1000G_gold_standard.indels.hg19.vcf \
-known ~/gatk/1000G_phase1.indels.hg19.vcf \
-known ~/gatk/dbsnp_135.hg19.vcf
java -Xmx24g -Xms24g -jar ~/apps/picard/MergeSamFiles.jar \
I=nodup17_01.bam \
I=nodup17_02.bam \
I=nodup17_03.bam \
I=nodup17_04.bam \
I=nodup17_05.bam \
I=nodup17_06.bam \
o=realign17.bam \
USE_THREADING=true \
SO=coordinate \
TMP_DIR=~/local17/ \
CREATE_INDEX=true \
CREATE_MD5_FILE=true
java -Xmx24g -Xms24g -jar ~/apps/picard/FixMateInformation.jar \
INPUT=realign17.bam \
OUTPUT=realign17_FM.bam \
SO=coordinate \
TMP_DIR=local17/ \
CREATE_INDEX=true \
CREATE_MD5_FILE=true
Thanks
Richard