I am pretty new to manipulating NextGenSeq datasets, so my apologies if this question is trivial!
I want to format my .bam files so I can use them in GATK for SNP calling. Although my .bam files have all the necessary 'read group' information, they are sorted in lexicographical order - which will not work with GATK.
However, if I re-sort my .bam so that the header is replaced by the one from the reference, using:
samtools import <your reference>.fai <your file>.sam <your file>.sorted_header.bam
then my read group header disappears - and GATK won't work! Most annoying! Does anyone know of a way either to avoid this when re-sorting, or to add the '@RG' header afterwards? Thanks