Dear all,
I am trying to use GATK for multi-sample SNP calling. GATK requires the input reads file to be 1) in BAM format, 2) indexed, 3) coordinate-sorted, and 4) tagged with a read group that includes the PL (platform) and SM (sample) tags.
Is it okay to use the BAM file output by TopHat directly as the input file for GATK?
I used samtools to view TopHat's output BAM file. The first several lines are as follows:
HS5_204:6:1206:11794:144840 99 scaffold_1 12632 3 19M117N81M = 12775 243 GTCTGTGATGACCAAAGAGGGGAAGTGGCAAATGTCCTTTGTCATGCCATCCAAGTACGGCGCTAATCTGCCTTTGCCCAAGGATCCAACTGTGAGGGTT @<@DBDDDHFDFBHHIIIICBA@8?:C@DADHIE?:?BE*??DFFBGD>D?FDGICF<@;8:9=AB:@CDCC>CCCCCC@BB?5AC?@@CCA:AC@BBB8 AS:i:-9 XM:i:2 XO:i:0 XG:i:0 MD:Z:1C43A54 NM:i:2 XS:A:+ NH:i:2 CC:Z:scaffold_16 CP:i:9083845 HI:i:0
Does anyone know whether I still need to sort and index the BAM file with samtools or Picard? If so, that would be a lot of work.
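The alignment records alone do not show whether requirements 3 and 4 are met; that information sits in the BAM header, which `samtools view -H` prints. Below is a minimal Python sketch of how the header text could be checked. The function name `check_sam_header` and the example header (including the `sample1` read group and the scaffold length) are made up for illustration and are not taken from my actual file. Requirement 2 (the index) is not covered here, since the index is a separate file next to the BAM.

```python
def parse_tags(fields):
    """Turn ['VN:1.0', 'SO:coordinate'] into {'VN': '1.0', 'SO': 'coordinate'}."""
    return dict(f.split(":", 1) for f in fields if ":" in f)


def check_sam_header(header_text):
    """Check a SAM/BAM header (as text) for coordinate sorting and read groups.

    header_text is assumed to be the output of `samtools view -H file.bam`.
    """
    sort_order = None
    read_groups = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@HD":
            # The SO tag on the @HD line records the declared sort order.
            sort_order = parse_tags(fields[1:]).get("SO")
        elif fields[0] == "@RG":
            # Each @RG line describes one read group.
            read_groups.append(parse_tags(fields[1:]))
    return {
        "coordinate_sorted": sort_order == "coordinate",
        "has_read_group": bool(read_groups),
        "all_rg_have_sm_pl": bool(read_groups)
        and all("SM" in rg and "PL" in rg for rg in read_groups),
    }


# Hypothetical header, for illustration only.
EXAMPLE_HEADER = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:scaffold_1\tLN:1000\n"
    "@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\n"
)

if __name__ == "__main__":
    for name, ok in check_sam_header(EXAMPLE_HEADER).items():
        print(name, ok)
```

Note that this only reads what the header declares; it does not verify that the records really are in coordinate order.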
Thanks in advance!
Li